AMDGPU Linux Driver No Longer Lets You Have Unlimited Control To Lower Your Power Limit
Originally posted by yump View Post
You apparently fell asleep in, "last 10 years of DVFS literature review". The power limit doesn't change the voltage applied to the chip or increase the current through it for any particular clock frequency. It only controls which frequency/voltage pairs the chip is allowed to choose for a given level of utilization.
Originally posted by Anux View Post
I don't know how to explain it better and also know no good resource that talks about this. Maybe someone else does?
So if I now understood correctly, the driver sets F/V greedily to have the lowest capacitive load at any given TDP.
Is that it?
If so, all of this seems to me like an XY problem (https://en.wikipedia.org/wiki/XY_problem).
People actually want to constrain their GPUs not to go above an F/V pair n, not to limit the TDP of F/V pair n+1.
The patch, if I understood correctly from what you showed, limits how low the TDP can be for (in your example) FV7. People want their boards to consume less, so constraining them to FV6 would seem to me (the layman) like a better approach.
Or where have I gone wrong?
Originally posted by DumbFsck View Post
OK, I'm sorry then; as I said, I'm very unfamiliar with these systems.
So if I now understood correctly, the driver sets F/V greedily to have the lowest capacitive load at any given TDP.
Is that it?
If so, all of this seems to me like an XY problem (https://en.wikipedia.org/wiki/XY_problem).
People actually want to constrain their GPUs not to go above an F/V pair n, not to limit the TDP of F/V pair n+1.
The patch, if I understood correctly from what you showed, limits how low the TDP can be for (in your example) FV7. People want their boards to consume less, so constraining them to FV6 would seem to me (the layman) like a better approach.
Or where have I gone wrong?
People want their boards not to consume above a given TDP, but to maximize performance within that envelope. A higher frequency gives better performance but requires a higher voltage, and the actual power consumed depends on what the GPU is doing (the workload). If the power consumed (or current) at a certain frequency and voltage is too high (exceeds the TDP), a lower frequency/voltage pair must be used; if there is headroom, a higher F/V pair can be used. The driver typically monitors the GPU load and sets the F/V pair to a value where it can perform the workload while minimizing power consumption. This is the DPM function in the amdgpu driver, but it is constrained by the TDP.
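To make that concrete, here is a minimal toy sketch of the idea, not the actual amdgpu DPM code: the F/V table, effective capacitance and activity figures are made-up illustrative numbers. It just picks the highest F/V pair whose estimated dynamic power (roughly activity * C * f * V^2) still fits under the cap.

```python
# Illustrative toy only, not the real amdgpu DPM algorithm.
# The F/V table, effective capacitance and activity factors are invented.

FV_TABLE = [          # (frequency in MHz, voltage in V), lowest to highest
    (500, 0.70),
    (1000, 0.80),
    (1500, 0.90),
    (2000, 1.00),
    (2500, 1.10),
]
C_EFF = 1.0e-7        # effective switched capacitance in farads (made up)

def estimated_power(freq_mhz, volt, activity):
    """Rough dynamic power: activity * C * f * V^2."""
    return activity * C_EFF * (freq_mhz * 1e6) * volt ** 2

def pick_state(power_cap_w, activity):
    """Pick the highest F/V pair whose estimated power fits under the cap."""
    best = FV_TABLE[0]                      # the lowest state is always allowed
    for freq, volt in FV_TABLE:
        if estimated_power(freq, volt, activity) <= power_cap_w:
            best = (freq, volt)
    return best

if __name__ == "__main__":
    for load in (0.2, 0.5, 1.0):            # light load ... Furmark-like load
        freq, volt = pick_state(power_cap_w=150.0, activity=load)
        print(f"load {load:.1f} -> {freq} MHz @ {volt:.2f} V")
```

The same cap maps to different F/V pairs depending on the workload, which is why the cap is the knob exposed to the user rather than a single pinned F/V state.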
Originally posted by DumbFsck View Post
OK, I'm sorry then
Originally posted by s_j_newbury View Post
People want their boards not to consume above a given TDP, but to maximize performance within that envelope. A higher frequency gives better performance but requires a higher voltage, and the actual power consumed depends on what the GPU is doing (the workload). If the power consumed (or current) at a certain frequency and voltage is too high (exceeds the TDP), a lower frequency/voltage pair must be used; if there is headroom, a higher F/V pair can be used. The driver typically monitors the GPU load and sets the F/V pair to a value where it can perform the workload while minimizing power consumption. This is the DPM function in the amdgpu driver, but it is constrained by the TDP.
Working with F/V pairs directly would give vastly different results, from low performance to high power, as my table hopefully visualizes.
Undervolting can be used in conjunction, but it is not needed and has the potential for instabilities.
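For the same reason, a single pinned F/V state spans a wide power range depending on the workload; a rough illustration with the same made-up numbers as the sketch above (not measurements from any real card):

```python
# Illustrative only: power spread of one fixed F/V state across workloads.
C_EFF = 1.0e-7      # effective switched capacitance in farads (made up)
FREQ_HZ = 2.0e9     # a fixed 2000 MHz state
VOLT = 1.0          # with its (made-up) voltage

for activity in (0.1, 0.5, 1.0):    # light desktop load ... Furmark-like load
    power = activity * C_EFF * FREQ_HZ * VOLT ** 2
    print(f"activity {activity:.1f} -> ~{power:.0f} W at the same F/V pair")
```

So capping the F/V state alone either wastes headroom under light loads or overshoots the intended budget under heavy ones.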
It's not impossible that the claim about potential hardware damage is true. Newer chips (RDNA1+, I think) use a parametric curve for frequency-voltage instead of a handful of defined pairs. It's conceivable that a parametric curve might produce wild values outside its intended domain of application, which work for low utilization but could be dangerous otherwise. Or a VRM design might assume that low voltages would only correspond to low utilization, and use that to select the number of enabled phases. (But such a design might blow up in Furmark too.) And of course AIB partners are probably only testing the stock values and maybe overclocking.
But if AMD is going to enforce a minimum power limit for some reason, that reason needs to be publicly stated on the mailing list and in the commit messages, in a way that is understandable to someone educated in electrical engineering and explains the nature of the legitimate problem with a too-low power limit. It must be clear that 1) this constraint prevents a legitimate problem, and 2) the necessity has been reviewed by the hardware team with awareness that users don't like it.
What has been posted to the list so far is compatible with them just following documentation and what Windows does. Christian König doesn't seem to have cottoned on that it's about power and not voltage.
The concern about bugs reported by users running in undefined configurations is legitimate, but there is an established solution to that which should be applied to all overclocking/undervolting interfaces the kernel exposes: taint the kernel. I'd even favor going further and tainting the kernel if XMP is enabled, if it's easy enough to detect that.
Anux, intelfx and DumbFsck:
Undervolting does increase amperage, actually, for chips that typically run against the power limit. Undervolting reduces the amount of current at any given frequency, but the firmware will then choose a higher frequency to get back to the limit. And since P=I*V, I=P/V. Firmware holds power constant, so reducing voltage increases current.
P=I*V is first-principles physics and applies to all devices, semiconductor or otherwise. The f*C*V^2 power formula is just P=IV, with a substitution. The charge on a capacitor is C*V, and that amount of charge is drawn through the voltage Vdd every time a gate turns on. Current is charge per second, and f is the number of cycles per second, so I=f*C*V. Some of the energy is dissipated in the on-cycle, and some in the off-cycle, but that doesn't affect the result. (In fact, from outside the chip, capacitors in the PDN filter out clock-frequency current variation almost completely, so a big ASIC just looks like a resistor that changes value very quickly with what the workload is doing.)
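Plugging made-up numbers into I = P/V makes the point concrete (illustrative values, not any particular card):

```python
# Illustrative arithmetic only: at a fixed power limit, lower voltage means
# higher current, since I = P / V. The numbers are invented.
POWER_LIMIT_W = 200.0

for label, volt in (("stock", 1.10), ("undervolted", 1.00)):
    amps = POWER_LIMIT_W / volt
    print(f"{label:>12}: {volt:.2f} V -> {amps:.0f} A at {POWER_LIMIT_W:.0f} W")
```

The undervolted card draws roughly 18 A more in this example, because the firmware spends the freed-up headroom on a higher clock until it is back against the same power limit.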
Originally posted by yump View Post
It's not impossible that the claim about potential hardware damage is true. Newer chips (RDNA1+, I think) use a parametric curve for frequency-voltage instead of a handful of defined pairs. It's conceivable that a parametric curve might produce wild values outside its intended domain of application, which work for low utilization but could be dangerous otherwise.
Or a VRM design might assume that low voltages would only correspond to low utilization, and use that to select the number of enabled phases
(But such a design might blow up in Furmark too.)
And of course AIB partners are probably only testing the stock values and maybe overclocking.
And what about booting with amdgpu.ppfeaturemask=0xffffffff or any other combination? If that is allowed without a kernel recompile, there is no argument for limiting the minimum TDP.
But if AMD is going to enforce a minimum power limit for some reason, that reason needs to be publicly stated on the mailing list and in the commit messages in a way that is understandable to someone educated in electrical engineering and explains the nature of the legitimate problem with too-low power limit.
The concern about bugs reported by users running in undefined configurations is legitimate
but there is an established solution ... taint the kernel
Undervolting does increase amperage
Sorry for being a bit too sarcastic. :/ Until recently there was no doubt that my next GPU would be an AMD card; now that I know I couldn't use it in Windows (there's no way I'm running a card above 150 W for extended periods), I have lost faith in them and am seriously considering Intel GPUs.
I think I got their memo: they hate power efficiency and custom builds and want us to buy competitor cards. Intel seems to have no problem going down to 95 W in Windows: https://www.kitguru.net/wp-content/u...-oc-scaled.jpg and I noticed they are really working hard to make their drivers good/usable. Maybe 2 more generations and we won't ever have to think about Radeon again.
I can only hope this doesn't spread to their CPU lineup.
Originally posted by yump View Post
P=I*V is first principles physics and applies to all devices, semiconductor or otherwise. The f*C*V^2 power formula is just P=IV, with a substitution.
But as I said, I'm a layman, and the last time I read anything about it was more than a decade ago.
But I do have a nagging question that is really not clear to me. So stop me at the paragraph where my misunderstanding is found.
There is a point in a video card's power consumption where it becomes too inefficient (it requires a greater increase in power for less performance gain than the user wants).
People want to set power1_cap at the point of the efficiency curve at which they are satisfied with the returns, therefore avoiding further diminishing returns.
The way power1_cap_min used to work was to let the GPU do whatever it wanted up to the set wattage, so the user did not have to know the clock, power mode, voltage, etc.
The question I have now is: is it still possible to limit the TDP, just in a less transparent way?
People in the bug report mentioned you can set SCLK; after talking to you guys I looked at the documentation, and there you can apparently set the VDD curve and pp tables, activate/deactivate performance levels, and set manual performance levels.
A lot of pp_dpm*.
So isn't the end result the same, just using a different interface?
In the bug reports I also read about people using tuxclocker and corectl; for people using these "front ends", would it be possible for the presentation to stay the same, with the software then using these other methods to set the desired TDP?
BTW, are the *od_* API calls "od" because of OverDrive, OverDrive being some marketing name AMD gave to overclocking? And if you use OD, is your kernel then already tainted?
'Cause I see what you're saying about XMP/DOCP or whatever it's called.
This whole situation does not add up. Sure, they are setting the power cap to respect what their AIB partners write in their BIOS. But if you can get the exact same state through other means, this seems like change for the sake of change. And if OD does not taint the kernel, it is much easier for people to get undefined behaviour by using it than by using power1_cap_min.
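For what it's worth, here is a rough sketch of what such a front end could do instead of writing a low power1_cap: force the manual performance level and restrict which SCLK states are allowed via pp_dpm_sclk. The paths assume card0, it needs root, skips all error handling, and which levels are actually writable varies by GPU generation; it is only meant to illustrate the documented interface, not to be a recommendation.

```python
# Hedged sketch: restrict allowed SCLK states instead of lowering power1_cap.
# Assumes card0, requires root, no error handling. Illustrative only.
import glob

DEV = "/sys/class/drm/card0/device"

def write(path, value):
    with open(path, "w") as f:
        f.write(value)

def list_sclk_states():
    """Parse pp_dpm_sclk lines like '1: 1850Mhz *' into (index, clock, active)."""
    states = []
    with open(f"{DEV}/pp_dpm_sclk") as f:
        for line in f:
            idx, rest = line.split(":", 1)
            states.append((int(idx), rest.strip().rstrip(" *"), line.strip().endswith("*")))
    return states

def cap_top_state(max_index):
    """Allow only SCLK states 0..max_index (requires the 'manual' level)."""
    write(f"{DEV}/power_dpm_force_performance_level", "manual")
    write(f"{DEV}/pp_dpm_sclk", " ".join(str(i) for i in range(max_index + 1)))

def set_power_cap(watts):
    """The old, simpler way: write the cap in microwatts to hwmon."""
    hwmon_cap = glob.glob(f"{DEV}/hwmon/hwmon*/power1_cap")[0]
    write(hwmon_cap, str(int(watts * 1_000_000)))

if __name__ == "__main__":
    for state in list_sclk_states():
        print(state)
    # cap_top_state(1)     # e.g. forbid everything above SCLK state 1
    # set_power_cap(150)   # may now be rejected if 150 W is below power1_cap_min
```

The end state is indeed similar to a power cap in effect, but it is a clock ceiling rather than a power ceiling, so the actual draw still depends on the workload.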