AMD Posts "P-State EPP" Driver As New Attempt To Improve Performance-Per-Watt On Linux
Originally posted by Zeioth
It would be exciting to test this patch. The current P-state in '5.19.7-zen2-1-zen' stutters really badly while gaming (corectrl ondemand). GPU usage sometimes drops from 99% to 60% (which doesn't happen with ACPI), most likely because of the occasional CPU performance drops. It's all over the place: sometimes you get massive performance peaks, and sometimes performance drops at the wrong times (tested with Apex Legends).
Originally posted by Linuxxx
Since you are experiencing stutters with ondemand, why not give the performance governor with amd-pstate a try and see if that fixes the stuttering problem in your case?
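For anyone who wants to try this without corectrl, here is a minimal sketch of switching governors through the standard cpufreq sysfs files. It assumes each CPU exposes `cpufreq/scaling_governor` under `/sys/devices/system/cpu`; the function names are made up for illustration, and writing the files on a real system needs root.

```python
from pathlib import Path

# Standard cpufreq sysfs location; pass another root to experiment safely.
CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_files(root=CPU_ROOT):
    """Return the per-CPU scaling_governor files, sorted by path."""
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    return sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor"))


def read_governors(root=CPU_ROOT):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    return {f.parent.parent.name: f.read_text().strip()
            for f in governor_files(root)}


def set_governor(name, root=CPU_ROOT):
    """Write `name` (e.g. 'performance') to every scaling_governor file."""
    for f in governor_files(root):
        f.write_text(name + "\n")
```

Calling `read_governors()` first shows what is currently active; `set_governor("performance")` is the change suggested above, and `set_governor("ondemand")` puts it back.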
Originally posted by Zeioth
That actually fixed the issue! It makes total sense. I guess both corectrl/gamemode were designed for ACPI. Thank you so much for taking the time to help me. Still, excited to discover the new changes on the patch. I'll keep an eye on it.
Originally posted by Linuxxx
And the performance governor would certainly improve the performance even further!
Or at the very least, there wouldn't be any degradation...
Look at the 2nd table from the mailing list post. It's screenshotted in the article.
It says the highest performance is delivered by amd-pstate+ondemand, beating the performance governor by 13.5%. No idea what could cause that.
But I don't trust any of this. The units in the table are wrong -- there should be no time units in performance / watt, which cancels to computations per joule. The goal they are optimizing for is unclear -- you can always increase perf/W by reducing frequency, but for a batch job that runs to completion, reducing frequency will reduce performance too. The mission of a cpufreq governor is to identify workloads that do not run to completion but instead are throttled by external causes like timers, network throughput, or display refreshes, and reduce the frequency as much as possible without causing those workloads to miss their latency targets. Spend more time computing, and less time waiting.
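Spelled out, the unit argument goes roughly like this. The symbols are illustrative and not taken from the patch: a job of N operations finishes in time t at average power P, consuming energy E = P t.

```latex
\[
\frac{\text{performance}}{\text{power}}
  = \frac{N / t}{P}
  = \frac{N}{P\,t}
  = \frac{N}{E}
\qquad
\left[\frac{\text{ops}/\text{s}}{\text{J}/\text{s}} = \frac{\text{ops}}{\text{J}}\right]
\]
```

The seconds cancel, so a perf/W figure that still carries a time unit has gone wrong somewhere. For a fixed N, raising perf/W just means lowering E, which says nothing about how long the job took.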
I found this patch series that adds runners for "tbench" and "gitsource". It looks like "tbench" runs tbench with 128 client processes and the server on the same host, and "gitsource" runs "make test" on the git source code. The examples in the documentation have the units fixed, but these both seem like "batch job that runs to completion" type workloads. There is no source of external throttling. So, IMO, any difference between governors here indicates at least one of the governors is wrong. But tbench sounds pathologically difficult for cpufreq governors -- a TCP connection might normally be an external throttle, but if the other end is localhost you have a serial multithreaded load where the serialization is substantially hidden from the scheduler.
Honestly it feels like the team AMD has on cpufreq are throwing things at the wall to see what sticks. They need to go to lunch with the people who designed the algorithms in the PMU that the EPP register is affecting, and work out a sound theoretical foundation for what CPU frequency scaling is supposed to do and what their figure of merit is.
And the proper solution probably requires teaching schedutil about serial multithreaded load, somehow.
Last edited by yump; 14 September 2022, 12:51 PM.
Originally posted by yump
Look at the 2nd table from the mailing list post. It's screenshotted in the article.
It says the highest performance is delivered by amd-pstate+ondemand, beating the performance governor by 13.5%. No idea what could cause that.
But I don't trust any of this. The units in the table are wrong -- there should be no time units in performance / watt, which cancels to computations per joule. The goal they are optimizing for is unclear -- you can always increase perf/W by reducing frequency, but for a batch job that runs to completion, reducing frequency will reduce performance too. The mission of a cpufreq governor is to identify workloads that do not run to completion but instead are throttled by external causes like timers, network throughput, or display refreshes, and reduce the frequency as much as possible without causing those workloads to miss their latency targets. Spend more time computing, and less time waiting.
I found this patch series that adds runners for "tbench" and "gitsource". It looks like "tbench" runs tbench with 128 client processes and the server on the same host, and "gitsource" runs "make test" on the git source code. The examples in the documentation have the units fixed, but these both seem like "batch job that runs to completion" type workloads. There is no source of external throttling. So, IMO, any difference between governors here indicates at least one of the governors is wrong. But tbench sounds pathologically difficult for cpufreq governors -- a TCP connection might normally be an external throttle, but if the other end is localhost you have a serial multithreaded load where the serialization is substantially hidden from the scheduler.
Honestly it feels like the team AMD has on cpufreq are throwing things at the wall to see what sticks. They need to go to lunch with the people who designed the algorithms in the PMU that the EPP register is affecting, and work out a sound theoretical foundation for what CPU frequency scaling is supposed to do and what their figure of merit is.
And the proper solution probably requires teaching schedutil about serial multithreaded load, somehow.
But for schedutil to advance, Linux kernel hackers would need to actually spend time working on that cursed governor!
At least from the outside, it looks like it's anything but alive...
Originally posted by Linuxxx
I wonder what this says about the quality of schedutil...
Benchmarks (on AT, I think?) show the 7000 series at *65W* generally running at close to 100% of "full" performance in ST loads, and 80%+ in MT ones; and at 105W those margins shrink even more - but it's a BIOS-level change, so it needs a reboot. That's what's most interesting to me about this: that we might, potentially, someday, get a governor that can be told to idle *hard* most of the time, but then be told to favor performance via a trivial sysctl when doing "real" work or playing a game, etc.
Essentially, ondemand, except with *explicit* control rather than driver guesswork. If it happens to transition better than ondemand, great, but if not then I still wouldn't really care much, because I *know* when I need peak performance and when I don't, and those transitions are for extended periods of time, not just a fraction of a second to parse a web page, etc., so even 500ms each time would still be perfectly acceptable.
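As a rough idea of what that kind of explicit switch could look like from userspace, here is a hedged sketch. It assumes the driver exposes the per-CPU `energy_performance_preference` and `energy_performance_available_preferences` files the way intel_pstate does; whether the AMD driver ends up with the same interface is an assumption here, and `set_epp` is a made-up helper name.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def set_epp(preference, root=CPU_ROOT):
    """Write an EPP hint to every CPU that exposes one; return the CPUs touched."""
    touched = []
    pattern = "cpu[0-9]*/cpufreq/energy_performance_preference"
    for f in sorted(Path(root).glob(pattern)):
        cpu = f.parent.parent.name
        # If the driver lists the hints it accepts, refuse anything else.
        allowed = f.with_name("energy_performance_available_preferences")
        if allowed.exists() and preference not in allowed.read_text().split():
            raise ValueError(f"{preference!r} is not offered for {cpu}")
        f.write_text(preference + "\n")  # needs root on a real system
        touched.append(cpu)
    return touched
```

The idea would be `set_epp("power")` for everyday idling and `set_epp("performance")` before a game or a long build, with the hardware doing the fine-grained decisions in between.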
Originally posted by arQon
Benchmarks (on AT, I think?) show the 7000 series at *65W* generally running at close to 100% of "full" performance in ST loads, and 80%+ in MT ones; and at 105W those margins shrink even more - but it's a BIOS-level change, so it needs a reboot.
And it doesn't have to be a BIOS-level change. I'm pretty sure you can tweak it under Windows with AMD's whiz-bang GUI, and the equivalent Intel feature is exposed in /sys/class/powercap/. AMD just needs to hook theirs up.
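For reference, this is roughly how the Intel side can be read today. It is a sketch that assumes the usual powercap layout, with zone directories such as `intel-rapl:0` holding `constraint_N_name` and `constraint_N_power_limit_uw` files; `read_power_limits` is a name invented for this example.

```python
from pathlib import Path

POWERCAP_ROOT = Path("/sys/class/powercap")


def read_power_limits(root=POWERCAP_ROOT):
    """Return {zone: {constraint_name: limit_in_watts}} for every powercap zone."""
    limits = {}
    for f in sorted(Path(root).glob("*/constraint_*_power_limit_uw")):
        zone = f.parent
        index = f.name.split("_")[1]  # the N in constraint_N_power_limit_uw
        name_file = zone / f"constraint_{index}_name"
        # Fall back to the bare index if the constraint has no name file.
        label = name_file.read_text().strip() if name_file.exists() else index
        # The kernel reports microwatts; convert to watts for readability.
        limits.setdefault(zone.name, {})[label] = int(f.read_text()) / 1_000_000
    return limits
```

Writing a new value to the same `constraint_N_power_limit_uw` file (as root) changes the limit at runtime, no reboot involved, which is the part an AMD equivalent would need to offer.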