AMDGPU Linux Driver No Longer Lets You Have Unlimited Control To Lower Your Power Limit
Originally posted by DumbFsck View Post
Look again in the mailing list at how many people want to be able to "write" the values in the driver, instead of it being read-only.
And yes, people want to be able to write to or ignore min_power_cap in order to get back the old functionality.
This adds more VID and FID pairs to what would fall in like a "max" class.
Let's extend my example with the RX 480 and its 7 predefined FV-pairs (with perfect cooling, so we never hit a temp limit):
I'm running on the standard power limit (175 W). At idle the card draws 10 W and stays at the lowest FV-pair. At full load with some super simple load it stays at its highest FV-pair and draws ~150 W (still below the firmware's standard power limit). If I run furmark, my card switches between FV states 3 to 5 while staying at 175 W (it is now power limited).
If I were to reduce the power limit to, let's say, 100 W, my idle case doesn't change. The light load scenario would now be power limited at 100 W and switch between FV-pairs 5 to 7. With furmark it would still be at 100 W but use FV-pairs 1 to 3.
No new pairs are created; the card just uses different predefined pairs to enforce my intended power limit. No undervolting either.
If I could ignore max_power_cap, furmark would run at FV-7 and the card would draw maybe 400 W and something would die for sure (before TDP limits were a thing, people actually fried their cards with furmark). But still no under/overvolting or new FV-pairs.
But no one requested a change to max_power_cap in that bug report.
The point also becomes clear if you look at laptops: they use the same chips with much lower power limits and everything works fine, with no damage to those mobile GPUs or even any errors.
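For anyone who wants to check which bounds their own card reports, here is a minimal Python sketch. It assumes the amdgpu hwmon directory exposes `power1_cap`, `power1_cap_min` and `power1_cap_max` in microwatts; the directory path is something you have to look up yourself, and the helper names are made up for illustration.

```python
from pathlib import Path

MICROWATTS_PER_WATT = 1_000_000


def read_power_caps(hwmon_dir):
    """Return the current, minimum and maximum power cap in watts.

    Assumes hwmon_dir contains power1_cap, power1_cap_min and
    power1_cap_max, each holding one integer in microwatts.
    """
    hwmon = Path(hwmon_dir)
    files = (("cap", "power1_cap"),
             ("min", "power1_cap_min"),
             ("max", "power1_cap_max"))
    return {key: int((hwmon / name).read_text().strip()) / MICROWATTS_PER_WATT
            for key, name in files}


def is_within_caps(requested_watts, caps):
    """True if the requested limit lies inside the reported min/max range."""
    return caps["min"] <= requested_watts <= caps["max"]
```

Calling `read_power_caps()` on the hwmon directory of your card and then `is_within_caps(100, caps)` tells you whether a 100 W request would fall inside the range that is reported. It only reads values and does not change anything.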
Comment
-
-
Originally posted by Anux View Post
No, how could this add new FID/VID pairs? Where would those values come from?
Let's extend my example with the RX 480 and its 7 predefined FV-pairs (with perfect cooling, so we never hit a temp limit):
[...]
If I were to reduce the power limit to, let's say, 100 W, my idle case doesn't change. The light load scenario would now be power limited at 100 W and switch between FV-pairs 5 to 7. With furmark it would still be at 100 W but use FV-pairs 1 to 3.
No new pairs are created; the card just uses different predefined pairs to enforce my intended power limit. No undervolting either.
I didn't say a new FV pair, I said it adds more FV pairs to what the card lists as max TDP.
So if you have a flag of maxtdp = FV-6 and FV-7, and you change the lower max bound to FV-5, now there is a "new" pair that "falls in like a max class", as per my wording.
So it is not a new pair overall. The group of pairs tagged as "max" now includes one more; it already exists, but it is new to this group.
Now, as for what issues might arise from this: as I alluded to, the only way to know is to dig through the codebase to see where they need the lower and upper bounds of "max"; otherwise we can only make assumptions. Hypothetically (as in, taken out of my ass), they could be keeping the card in some state by turning power distribution on and off. Say your VRM has operating voltages of 1 and 2 and you have 2 VRMs, so the only way to get 1 is to turn one of the VRMs off. For an FV-n that needs 1, they have one VRM off, and for FV-max they assume that all VRMs are always on, so the lower bound is 2. Somewhere in the firmware they read "max" and make everything else assume both VRMs are on. Because we added FV-n to the "max" category, that isn't true anymore, so undefined behaviour might arise. (The numbers are obviously arbitrary, chosen for ease of illustration.) (We can posit that wherever else they want the FV they should read the FV instead of inferring it from being in "max", but then in this example it would probably be an issue of architecture.) (Again, this is a hypothetical; any resemblance to reality would be accidental, so don't quote me as fact. It is all just an autistic work of fiction.)
Comment
-
-
Originally posted by DumbFsck View Post
I didn't say a new FV pair, I said it adds more FV pairs to what the card lists as max TDP.
So if you have a flag of maxtdp = FV-6 and FV-7, and you change the lower max bound to FV-5, now there is a "new" pair that "falls in like a max class" as per my wording.
Maybe it becomes clearer if we look at it from another angle. Let's assume we have no power limit and fix our card to FV-5 (1 GHz at 1 V). At idle we would draw 20 W, with a simple load we would draw 80 to 100 W, and with furmark we might draw 150 to 200 W.
You could build a table for each FV value and load scenario:

        idle    light load    furmark
FV-1    10 W    30 W          50 W
...     ...     ...           ...
FV-6    35 W    130 W         350 W
FV-7    40 W    150 W         400 W

Now whenever there is a TDP limit, you only select the highest possible FV value depending on your current workload. The firmware does this with measurements instead of tables, and therefore you get overshoots above your requested TDP (that's why benchmarks show such high peak power spikes), but it averages out to your requested TDP.
Edit: each workload varies heavily over time in how much power it takes for a fixed FV-pair
I don't know how to explain it better and also know no good resource that talks about this. Maybe someone else does?
Last edited by Anux; 05 March 2024, 07:07 PM.
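The table lookup described in this post can be sketched in a few lines of Python. This is only an illustration of the idea, not how the firmware works (as said above, it measures instead of using tables). Only the rows that are actually given (FV-1, FV-6, FV-7) are used, since FV-2 to FV-5 are elided; the function name is made up.

```python
# Power draw in watts per FV state and workload, copied from the post.
# Rows FV-2 to FV-5 are elided there, so they are left out here as well.
POWER_TABLE = {
    "FV-1": {"idle": 10, "light load": 30, "furmark": 50},
    "FV-6": {"idle": 35, "light load": 130, "furmark": 350},
    "FV-7": {"idle": 40, "light load": 150, "furmark": 400},
}


def highest_state_within(limit_watts, workload, table=POWER_TABLE):
    """Return the highest FV state whose draw fits the limit, or None."""
    fitting = [state for state, draw in table.items()
               if draw[workload] <= limit_watts]
    # Rank states by their numeric suffix, so FV-7 ranks above FV-6.
    return max(fitting, key=lambda s: int(s.split("-")[1]), default=None)


if __name__ == "__main__":
    for workload in ("idle", "light load", "furmark"):
        print(workload, highest_state_within(175, workload))
```

With a 175 W limit this picks FV-7 for idle and light load. For furmark it falls back to FV-1, but only because the rows in between are missing from the table. No state is ever created; the limit only changes which existing row gets picked.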
Comment
-
-
Originally posted by kiffmet View Post
AMD's argumentation is nonsensical given that identical chips used in notebooks can run at much lower power without entering self-destruct mode.
Comment
-
-
Originally posted by Rabiator View Post
Good point. Also, AMD has an official Eco-Mode for some Zen4 CPUs. About 30% less energy consumption at the price of sacrificing 10% of performance.
Having this cap at 6% is just a bad joke, making the feature essentially useless.
Seems like I have to take a look at Intel for my next purchase. Does anyone know how the situation is with Arc cards?
Comment
-
-
Originally posted by Nille View Post
You could maybe exploit the low power states. For example, you can bypass security processors by underpowering them, as with the Switch or the exploit that was presented for the Tesla board computers. The low power state could let the DMA engine do funky stuff like writing to memory regions that it should normally never write to. Compare the issue where the kernel allowed writes to the EFI store and bricked some laptops in the past.
There is a whole lot that could go wrong in that case. But I would also like AMD and the AIBs to check how low you can go without issues and allow that. It annoys me to see the GPU core and memory clocks ramp up just for watching an H.264 720p video clip.
Comment
-
-
Originally posted by stormcrow View Post
...some commenters apparently fell asleep in high school science classes...
Basic high school electricity. Under voltage beyond tolerance will kill electronics nearly as fast as an over voltage (where you get arcing). Under voltage increases your amperage to meet the basic power levels required by the electronics. Electronics are rated to a certain voltage, but more importantly, to a certain amperage. When that amperage is exceeded Bad Things happen. Ever wondered why high amperage extension cords are much larger and more expensive than low amperage for the same voltage? (Look it up.) Additionally left to your education, find out why weak(ening) PSUs often scorch power traces on connected boards. (Hint: Power (Watts) = V (voltage) x I (current or amperage) )
Don't expect this to ever be reverted. It was an oversight/bug to begin with.
That seems like an overly simplified and incomplete thought. Here is my thinking; I'm not saying I'm 100% right, but I'm pretty sure that my reasoning is closer to the actual behavior:
You are right, P = V * I
But first and foremost, I = V / R
In a fixed-resistance scenario, lowering the voltage always lowers the power. This is what is applied and seen when people undervolt their graphics cards: the power draw is always lower at matching frequency, and that's how physics works.
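To put numbers on the fixed-resistance case, here is a tiny Python sketch that combines the two formulas. The resistance and voltages used in the note below it are arbitrary, picked only to make the arithmetic easy to follow.

```python
def power_at_fixed_resistance(voltage, resistance):
    """Combine I = V / R with P = V * I, which gives P = V**2 / R."""
    current = voltage / resistance
    return voltage * current
```

With a made-up resistance of 0.01 ohm, going from 1.0 V to 0.9 V takes the result from roughly 100 W to roughly 81 W: both the voltage and the current go down, so the current does not rise to compensate.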
But here we are talking about power limiting, and the behavior is much, much different from undervolting.
In power-limited scenarios (even at the GPU's specified power), the GPU is always adjusting its point on the V/F curve so that the power budget is respected. The power draw is then controlled more by the driver managing the physics than by the physics acting as the primary limiter.
What I mean here is that in a non power/temp limited scenario, the GPU's power consumption is limited by its apparent resistance at its working frequency, but that's not the case in a power/temp limited scenario.
But apart from crashes and instability, I don't see how running at a lower power budget would physically kill a GPU.
And weak PSUs are a completely different matter. A weak PSU will have its voltage drop, but from my point of view it still behaves more like an "infinite current" source (not infinite per se, but it can supply a lot of amps). If "9 V instead of 12 V" is fed into the board's converter, the board still steps it down to 1.5 V or less, and that is where the problem is: 150 W at 12 V is 12.5 A, while 150 W at 9 V is about 16.7 A. Here the lower voltage does cause an increase in current, because the load is the same and the source is controlled and can supply more current (the "source equivalent resistance" can vary). On the other side of the regulator (the 1.5 V side), the problem doesn't apply, because the load doesn't change (the sink equivalent resistance doesn't vary).
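The PSU-side arithmetic can be checked with a short Python snippet. It assumes a constant 150 W load on the input side of the board's converter and ignores converter losses; the function name is made up.

```python
def input_current(load_watts, input_volts):
    """Current drawn from the supply by a constant-power load: I = P / V."""
    return load_watts / input_volts


if __name__ == "__main__":
    for volts in (12, 9):
        print(f"{volts} V -> {input_current(150, volts):.1f} A")
```

This prints 12.5 A for 12 V and 16.7 A for 9 V, so the sagging input voltage really does raise the current on the 12 V traces, which is the scenario with the scorched power traces.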
Comment