
**Zeioth** · 10 September 2022, 02:03 PM

It would be exciting to test this patch. The current P-state driver in '5.19.7-zen2-1-zen' stutters really badly while gaming (corectrl, ondemand governor). GPU usage sometimes drops from 99% to 60% (which doesn't happen with ACPI), most likely because of occasional CPU performance drops. Performance is all over the place: sometimes you get massive peaks, and sometimes it drops at the wrong moment (tested with Apex Legends).

**Linuxxx** · 10 September 2022, 02:11 PM

Originally posted by Zeioth

It would be exciting to test this patch. The current P-state driver in '5.19.7-zen2-1-zen' stutters really badly while gaming (corectrl, ondemand governor). GPU usage sometimes drops from 99% to 60% (which doesn't happen with ACPI), most likely because of occasional CPU performance drops. Performance is all over the place: sometimes you get massive peaks, and sometimes it drops at the wrong moment (tested with Apex Legends).

Since you are experiencing stutters with ondemand, why not give the performance governor with amd-pstate a try and see if that fixes the stuttering problem in your case?
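For reference, switching governors comes down to writing to the standard cpufreq sysfs files; `cpupower frequency-set -g performance` does the same from a shell. Below is a minimal Python sketch of that, assuming the usual `/sys/devices/system/cpu/cpuN/cpufreq/` layout and root privileges. The helper name `set_governor` is hypothetical, and the root directory is a parameter only so the logic can be tried against a fake tree.

```python
from pathlib import Path

# Standard cpufreq sysfs location (assumption: a typical Linux system).
CPU_ROOT = Path("/sys/devices/system/cpu")


def set_governor(governor: str, root: Path = CPU_ROOT) -> list[str]:
    """Write `governor` to every CPU's scaling_governor file.

    Hypothetical helper; needs root on a real system. Returns the names of
    the CPU directories that were updated, e.g. ["cpu0", "cpu1", ...].
    """
    updated = []
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpufreq_dir = gov_file.parent
        # Refuse governors the active driver does not offer for this CPU.
        available = (cpufreq_dir / "scaling_available_governors").read_text().split()
        if governor not in available:
            raise ValueError(f"{governor!r} not available for {cpufreq_dir}: {available}")
        gov_file.write_text(governor)
        updated.append(cpufreq_dir.parent.name)
    return updated
```

Usage, as root: `set_governor("performance")`. Checking `scaling_available_governors` first gives a clear error instead of a bare write failure when the driver doesn't offer the requested governor.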

**Zeioth** · 10 September 2022, 05:46 PM

Originally posted by Linuxxx

Since you are experiencing stutters with ondemand, why not give the performance governor with amd-pstate a try and see if that fixes the stuttering problem in your case?

That actually fixed the issue! It makes total sense. I guess both corectrl and gamemode were designed with ACPI in mind. Thank you so much for taking the time to help me. Still, I'm excited to see the changes in the patch. I'll keep an eye on it.

**gfunk** · 11 September 2022, 04:02 AM

Originally posted by Zeioth

That actually fixed the issue! It makes total sense. I guess both corectrl and gamemode were designed with ACPI in mind. Thank you so much for taking the time to help me. Still, I'm excited to see the changes in the patch. I'll keep an eye on it.

Do you launch the game with the gamemoderun command? I thought gamemode sets the governor to performance.

**Zeioth** · 11 September 2022, 08:06 AM

Originally posted by gfunk

Do you launch the game with the gamemoderun command? I thought gamemode sets the governor to performance.

No, no, I don't! I was just wondering. But that's another very good point.

**cj.wijtmans** · 13 September 2022, 07:06 PM

Performance per watt is more important to me. CPUs are so good today that I don't need performance at all costs, even though I compile a lot. Most compile-time improvements come from software rather than hardware, and secondly, I am patient.

**yump** · 14 September 2022, 12:49 PM

Originally posted by Linuxxx

And the performance governor would certainly improve the performance even further!

Or at the very least, there wouldn't be any degradation...

Look at the 2nd table from the mailing list post. It's screenshotted in the article.

It says the highest performance is delivered by amd-pstate+ondemand, beating the performance governor by 13.5%. No idea what could cause that.

But I don't trust any of this. The units in the table are wrong -- there should be no time units in performance / watt, which cancels to computations per joule. The goal they are optimizing for is unclear -- you can always increase perf/W by reducing frequency, but for a batch job that runs to completion, reducing frequency will reduce performance too. The mission of a cpufreq governor is to identify workloads that do not run to completion but instead are throttled by external causes like timers, network throughput, or display refreshes, and reduce the frequency as much as possible without causing those workloads to miss their latency targets. Spend more time computing, and less time waiting.
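Written out, the unit cancellation yump describes follows from the definition of the watt (1 W = 1 J/s). The numbers in the second line are purely hypothetical, chosen only to show that "performance per watt" and "work per unit of energy" are the same quantity:

$$
\frac{\text{performance}}{\text{power}}
= \frac{\text{operations}/\text{s}}{\text{W}}
= \frac{\text{operations}/\text{s}}{\text{J}/\text{s}}
= \frac{\text{operations}}{\text{J}}
$$

$$
\frac{1000\ \text{ops} / 10\ \text{s}}{50\ \text{W}}
= \frac{100\ \text{ops/s}}{50\ \text{J/s}}
= 2\ \text{ops/J}
= \frac{1000\ \text{ops}}{50\ \text{W} \times 10\ \text{s}}
$$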

I found this patch series that adds runners for "tbench" and "gitsource". It looks like "tbench" runs tbench with 128 client processes and the server on the same host, and "gitsource" runs "make test" on the git source code. The examples in the documentation have the units fixed, but these both seem like "batch job that runs to completion" type workloads. There is no source of external throttling. So, IMO, any difference between governors here indicates at least one of the governors is wrong. But tbench sounds pathologically difficult for cpufreq governors -- a TCP connection might normally be an external throttle, but if the other end is localhost you have a serial multithreaded load where the serialization is substantially hidden from the scheduler.

Honestly it feels like the team AMD has on cpufreq are throwing things at the wall to see what sticks. They need to go to lunch with the people who designed the algorithms in the PMU that the EPP register is affecting, and work out a sound theoretical foundation for what CPU frequency scaling is supposed to do and what their figure of merit is.

And the proper solution probably requires teaching schedutil about serial multithreaded load, somehow.
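To make the "serialization hidden from the scheduler" point concrete, here is a deliberately crude Python toy model. It uses the frequency rule described in the schedutil source comments, next_freq = 1.25 × max_freq × util / max, and assumes a strictly serial chain of work handed between threads on different CPUs, so each CPU looks only 1/n busy. Everything else is an assumption: the frequencies are made up, and real kernels have mechanisms this model ignores (PELT ramp-up, util_est, uclamp, iowait boosting, wake-affine task placement) that can keep real results far from this worst case.

```python
# Toy model, not kernel code: a strictly serial chain of work handed back and
# forth between n_cpus threads, each on its own CPU, so exactly one CPU is busy
# at any instant. Assumed simplifications: utilization is the steady-state busy
# fraction, frequency-invariant, with no PELT ramp, util_est, uclamp or boosts.

HEADROOM = 1.25  # schedutil's "C": tipping point at util/max = 0.8


def schedutil_next_freq(f_cur: float, f_min: float, f_max: float,
                        busy_fraction: float) -> float:
    # Frequency-invariant utilization: busy time scaled by how fast the CPU ran.
    util = busy_fraction * (f_cur / f_max)
    f_next = HEADROOM * f_max * util
    return min(max(f_next, f_min), f_max)


def settle(n_cpus: int, f_min: float = 400.0, f_max: float = 5000.0,
           steps: int = 50) -> float:
    """Iterate the rule from f_max and return the frequency it settles at (MHz)."""
    f = f_max
    for _ in range(steps):
        # The chain is always running somewhere, so each CPU is busy 1/n of the time.
        f = schedutil_next_freq(f, f_min, f_max, busy_fraction=1.0 / n_cpus)
    return f
```

With one CPU the rule holds f_max, but with the same serial work split across two CPUs each step multiplies the frequency by 1.25 / 2 = 0.625, so the model slides down to f_min. Because the chain is serial, its throughput scales with clock speed, which is the kind of load the frequency rule alone cannot see.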

**Linuxxx** · 14 September 2022, 02:44 PM

Originally posted by yump

Look at the 2nd table from the mailing list post. It's screenshotted in the article.

It says the highest performance is delivered by amd-pstate+ondemand, beating the performance governor by 13.5%. No idea what could cause that.

But I don't trust any of this. The units in the table are wrong -- there should be no time units in performance / watt, which cancels to computations per joule. The goal they are optimizing for is unclear -- you can always increase perf/W by reducing frequency, but for a batch job that runs to completion, reducing frequency will reduce performance too. The mission of a cpufreq governor is to identify workloads that do not run to completion but instead are throttled by external causes like timers, network throughput, or display refreshes, and reduce the frequency as much as possible without causing those workloads to miss their latency targets. Spend more time computing, and less time waiting.

I found this patch series that adds runners for "tbench" and "gitsource". It looks like "tbench" runs tbench with 128 client processes and the server on the same host, and "gitsource" runs "make test" on the git source code. The examples in the documentation have the units fixed, but these both seem like "batch job that runs to completion" type workloads. There is no source of external throttling. So, IMO, any difference between governors here indicates at least one of the governors is wrong. But tbench sounds pathologically difficult for cpufreq governors -- a TCP connection might normally be an external throttle, but if the other end is localhost you have a serial multithreaded load where the serialization is substantially hidden from the scheduler.

Honestly it feels like the team AMD has on cpufreq are throwing things at the wall to see what sticks. They need to go to lunch with the people who designed the algorithms in the PMU that the EPP register is affecting, and work out a sound theoretical foundation for what CPU frequency scaling is supposed to do and what their figure of merit is.

And the proper solution probably requires teaching schedutil about serial multithreaded load, somehow.

Thanks, these are really great insights!

But for schedutil to advance, Linux kernel hackers would need to actually spend time working on that cursed governor!

At least from the outside, it looks like it's anything but alive...

**arQon** · 01 October 2022, 07:00 PM

Originally posted by Linuxxx

I wonder what this says about the quality of schedutil...

I think "quality" isn't really the right word here, since it's possible to have "quality" software that still has poor outcomes, but that aside the *viability* of schedutil just isn't really in question any more, after several years of continually underperforming in every possible scenario.

Benchmarks on AT (I think) show the 7000 series at *65W* generally running at close to 100% of "full" performance in ST (single-threaded) loads, and 80%+ in MT (multi-threaded) ones; and at 105W those margins shrink even more - but it's a BIOS-level change, so it needs a reboot. That's what's most interesting to me about this: that we might, potentially, someday, get a governor that can be told to idle *hard* most of the time, but then be told to favor performance via a trivial sysctl when doing "real" work or playing a game etc.
Essentially, ondemand, except with *explicit* control rather than driver guesswork. If it happens to transition better than ondemand, great, but if not then I still wouldn't really care much, because I *know* when I need peak performance and when I don't, and those transitions are for extended periods of time, not just a fraction of a second to parse a web page etc, so even 500ms each time would still be perfectly acceptable.
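The kind of explicit hint arQon describes is roughly what the Energy Performance Preference (EPP) knob is for. Drivers that support it (intel_pstate in active mode, and the amd-pstate EPP mode proposed here, which landed in later kernels) expose it per cpufreq policy as `energy_performance_preference`, alongside `energy_performance_available_preferences`. Below is a minimal Python sketch of flipping it, assuming that sysfs layout and root access. `set_epp` is a hypothetical helper name, and whether a given value is accepted depends on the driver and its current governor.

```python
from pathlib import Path

# Per-policy cpufreq directories (assumption: standard sysfs layout).
POLICY_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def set_epp(preference: str, root: Path = POLICY_ROOT) -> int:
    """Write an EPP hint (e.g. "power" or "performance") to every policy.

    Hypothetical helper; needs root. Returns how many policies were updated.
    Policies whose driver exposes no EPP file are simply skipped.
    """
    count = 0
    for epp_file in sorted(root.glob("policy*/energy_performance_preference")):
        choices_file = epp_file.parent / "energy_performance_available_preferences"
        choices = choices_file.read_text().split()
        if preference not in choices:
            raise ValueError(f"{preference!r} not in {choices} for {epp_file.parent}")
        epp_file.write_text(preference)
        count += 1
    return count
```

In this scheme you would idle with `set_epp("power")` and call `set_epp("performance")` before starting a game, leaving the moment-to-moment frequency choice to the hardware.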

**yump** · 02 October 2022, 01:00 AM

Originally posted by arQon

Benchmarks on AT (I think) show the 7000 series at *65W* generally running at close to 100% of "full" performance in ST (single-threaded) loads, and 80%+ in MT (multi-threaded) ones; and at 105W those margins shrink even more - but it's a BIOS-level change, so it needs a reboot.

The reason it runs at 100% performance in ST loads is that it runs at 100% frequency in ST loads. It's a total package power limit, and even Zen4, with its 230 W, 5.7 GHz insanity, is unable to dump 65 W into a single core. This throttling mechanism is mostly useless for saving energy -- it does nothing unless your CPU is under quite heavy load.

And it doesn't have to be a BIOS-level change. I'm pretty sure you can tweak it under Windows with AMD's whiz-bang GUI, and the equivalent Intel feature is exposed in /sys/class/powercap/. AMD just needs to hook theirs up.
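On the Intel side, the powercap framework exposes RAPL zones as `/sys/class/powercap/intel-rapl:N` directories, each with a `name` file and numbered constraints (`constraint_K_name`, `constraint_K_power_limit_uw`, in microwatts). Below is a minimal Python sketch that reads the top-level (package) limits, assuming that layout. `read_power_limits` is a hypothetical helper; on a real machine, writing a new microwatt value to a `constraint_K_power_limit_uw` file as root is how such a limit is changed at runtime, where the platform permits it.

```python
from pathlib import Path

POWERCAP_ROOT = Path("/sys/class/powercap")


def read_power_limits(root: Path = POWERCAP_ROOT) -> dict[str, dict[str, float]]:
    """Return {zone name: {constraint name: limit in watts}} for top-level RAPL zones."""
    limits: dict[str, dict[str, float]] = {}
    for zone in sorted(root.glob("intel-rapl:*")):
        if zone.name.count(":") != 1:
            continue  # skip subzones such as intel-rapl:0:0 (cores, uncore, DRAM)
        zone_limits: dict[str, float] = {}
        for name_file in sorted(zone.glob("constraint_*_name")):
            index = name_file.name.split("_")[1]  # "constraint_0_name" -> "0"
            limit_uw = int((zone / f"constraint_{index}_power_limit_uw").read_text())
            zone_limits[name_file.read_text().strip()] = limit_uw / 1_000_000
        limits[(zone / "name").read_text().strip()] = zone_limits
    return limits
```

On a typical Intel desktop this would return something like a `package-0` entry with `long_term` and `short_term` limits, which is the runtime equivalent of the BIOS power setting discussed above.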

Originally posted by arQon

That's what's most interesting to me about this: that we might, potentially, someday, get a governor that can be told to idle *hard* most of the time, but then be told to favor performance via a trivial sysctl when doing "real" work or playing a game etc. Essentially, ondemand, except with *explicit* control rather than driver guesswork. If it happens to transition better than ondemand, great, but if not then I still wouldn't really care much, because I *know* when I need peak performance and when I don't, and those transitions are for extended periods of time, not just a fraction of a second to parse a web page etc, so even 500ms each time would still be perfectly acceptable.

Have you actually tried browsing the web with your CPU locked to its minimum frequency (cpupower frequency-set -g performance -u 800MHz)? It kinda sucks. Sure, you get almost all of the energy savings at 1600 MHz and performance is a lot better, but that's not always enough for video playback. You could tell it you need peak performance, or the player could with some API, but the complexity of video varies a lot, so you do actually want some kind of adaptive frequency control.
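A rough sketch of what "adaptive" means here, loosely modelled on the ondemand governor's rule: jump to the maximum when the last sampling period's load crosses a threshold, otherwise scale linearly between the minimum and maximum. The 80% threshold and the MHz values are assumptions for illustration; this is not the kernel's code.

```python
def ondemand_like_next_freq(load_pct: float, f_min: float, f_max: float,
                            up_threshold: float = 80.0) -> float:
    """Choose the next frequency from the previous sampling period's load (0-100).

    Above `up_threshold`, go straight to f_max so a demanding burst (say, a
    complex stretch of video) is not starved; below it, scale linearly.
    """
    if load_pct > up_threshold:
        return f_max
    return f_min + load_pct * (f_max - f_min) / 100.0


# Hypothetical per-period loads while playing a video of varying complexity.
for load in (10, 35, 90, 50):
    print(load, ondemand_like_next_freq(load, f_min=800.0, f_max=4800.0))
```

Unlike a fixed 800 MHz cap, this follows the load, which is what variable-complexity video needs; the trade-off is that it only reacts after a sampling period has already elapsed.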


AMD Posts "P-State EPP" Driver As New Attempt To Improve Performance-Per-Watt On Linux
