Intel P-State Driver Preparing To Migrate From "Powersave" To Passive Schedutil Default


  • microcode
    replied
    Originally posted by Linuxxx

    Thanks for checking it out!

    That's really unfortunate, but it seems as though this is yet another shortcoming of AMD's Linux support, since my Intel E8400 (yes, Wolfdale), which also is only supported by the acpi-cpufreq driver, reports the maximum transition latency as 160 us.

    Also, if AMD really achieved a maximum transition latency of just 1 us with their Zen 2 architecture, shouldn't this put them well ahead of Intel in the latency department, too?

    Anyway, I really hope (and yes, here I go again) Michael will report the maximum transition latency of whichever Intel CPU He is going to run the upcoming schedutil benchmarks on, so that we all can get an idea of just how excessive the chosen value for rate_limit_us really is!

    I think it's not as big a deal as you might think. I just disable the lowest power state on my workstation. The difference between the second and peak power states is "ludicrous" vs. "unbelievable", as far as my experience with CPU performance is concerned.

    With Ryzen mobile coming around, it'll get more important, but that'll work itself out.
    Last edited by microcode; 04-15-2020, 05:30 PM.


  • Linuxxx
    replied
    Originally posted by microcode

    Seems that value, maximum transition latency, is not exposed by acpi-cpufreq, at least not on any of my machines. I've seen it quoted as 1, whereas the Zen 1 had a power state transition latency of something like 30.

    Thanks for checking it out!

    That's really unfortunate, but it seems as though this is yet another shortcoming of AMD's Linux support, since my Intel E8400 (yes, Wolfdale), which also is only supported by the acpi-cpufreq driver, reports the maximum transition latency as 160 us.

    Also, if AMD really achieved a maximum transition latency of just 1 us with their Zen 2 architecture, shouldn't this put them well ahead of Intel in the latency department, too?

    Anyway, I really hope (and yes, here I go again) Michael will report the maximum transition latency of whichever Intel CPU He is going to run the upcoming schedutil benchmarks on, so that we all can get an idea of just how excessive the chosen value for rate_limit_us really is!


  • microcode
    replied
    Originally posted by Linuxxx

    First of all, I'm really sorry for the late reply, but I've seen Your post just now!

    Thanks for checking out the value of rate_limit_us on Arch Linux, but Your assumption of 1000x the scaling driver's transition latency does not seem to be correct, at least in my case on elementaryOS (with Ubuntu 18.04 as its basis):

    The output of
    Code:
    sudo cpupower frequency-info
    tells me the following on my Intel IvyBridge processor with the intel_cpufreq driver:
    Code:
    maximum transition latency: 20.0 us
    So in my case, it is only 25x with the default value of 500 that Intel is setting for rate_limit_us (which seems rather arbitrary, if You ask me).

    Would it be possible for You to also provide this value for Your ZEN 2 processors?

    And I already know this must be getting on his nerves, but I just can't pass up this opportunity to once again ask Michael if he could also include a separate run when benchmarking schedutil with the Android value of 0 for rate_limit_us, just to see what difference it could make and whether the Intel devs should reconsider their decision for this value (which, by the way, is the only tunable for the schedutil governor).

    Now there might be some concern that this would lead to unnecessary overhead, but at least in my case, I haven't noticed any!

    For example, here's the output of
    Code:
    sudo cpupower monitor
    while my system is under light load (browser with multiple tabs open, music playing, photo slideshow transitioning):
    Code:
    Mperf                  || Idle_Stats
    C0  | Cx    |  Freq    ||  POLL  |  C1   |  C1E  |  C3   | C6
    4,83| 95,17 |  1870    ||  0,00  |  0,02 |  0,49 |  1,09 | 93,46
    8,63| 91,37 |  2266    ||  0,00  |  0,00 |  0,56 |  1,37 | 89,04
    4,92| 95,08 |  1896    ||  0,00  |  0,15 |  1,62 |  2,52 | 90,69
    8,24| 91,76 |  1864    ||  0,00  |  0,00 |  0,86 |  1,00 | 89,73
    As anyone can clearly see, my Intel Core i5-3350P (4 Cores & 4 Threads) still enters its deepest sleep-state (C6) the majority of the time while also ramping up the frequencies only moderately, when bearing in mind that these are the hardware clock limits as reported by >>cpupower<<:
    Code:
    hardware limits: 1.60 GHz - 3.30 GHz
    Hopefully Michael's benchmark results will help move forward the discussion of what the proper value of rate_limit_us should ideally be!

    Seems that value, maximum transition latency, is not exposed by acpi-cpufreq, at least not on any of my machines. I've seen it quoted as 1, whereas the Zen 1 had a power state transition latency of something like 30.
    Last edited by microcode; 04-12-2020, 06:30 PM.


  • Linuxxx
    replied
    Originally posted by microcode

    I just shelled into my remote Ryzen 3950X machine, and on Arch Linux it seems rate_limit_us is set to 1000, not 10,000; on my Threadripper 2950X it is also 1000.

    I think the default is 1000x the scaling driver's transition latency, so for that Zen 2 chip I guess that latency is 1μs; on whatever AMD chip you were looking at before, it is about 10μs; and on the Intel chip you were looking at, maybe it is 0.5μs.

    First of all, I'm really sorry for the late reply, but I've seen Your post just now!

    Thanks for checking out the value of rate_limit_us on Arch Linux, but Your assumption of 1000x the scaling driver's transition latency does not seem to be correct, at least in my case on elementaryOS (with Ubuntu 18.04 as its basis):

    The output of
    Code:
    sudo cpupower frequency-info
    tells me the following on my Intel IvyBridge processor with the intel_cpufreq driver:
    Code:
    maximum transition latency: 20.0 us
    So in my case, it is only 25x with the default value of 500 that Intel is setting for rate_limit_us (which seems rather arbitrary, if You ask me).
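
    The 25x figure is simply the ratio of the two numbers quoted in this post. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration and is not part of cpupower or the kernel.

```python
def rate_limit_multiple(rate_limit_us: float, transition_latency_us: float) -> float:
    """Return how many transition latencies fit into one rate_limit_us window."""
    if transition_latency_us <= 0:
        raise ValueError("transition latency must be positive")
    return rate_limit_us / transition_latency_us


if __name__ == "__main__":
    # Values quoted in the post: rate_limit_us = 500 and a maximum
    # transition latency of 20.0 us as reported by cpupower.
    print(rate_limit_multiple(500, 20.0))
```

    With the values from the post, the result is 25, far below the 1000x multiple suggested earlier in the thread.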

    Would it be possible for You to also provide this value for Your ZEN 2 processors?

    And I already know this must be getting on his nerves, but I just can't pass up this opportunity to once again ask Michael if he could also include a separate run when benchmarking schedutil with the Android value of 0 for rate_limit_us, just to see what difference it could make and whether the Intel devs should reconsider their decision for this value (which, by the way, is the only tunable for the schedutil governor).

    Now there might be some concern that this would lead to unnecessary overhead, but at least in my case, I haven't noticed any!

    For example, here's the output of
    Code:
    sudo cpupower monitor
    while my system is under light load (browser with multiple tabs open, music playing, photo slideshow transitioning):
    Code:
    Mperf                  || Idle_Stats                 
    C0  | Cx    |  Freq    ||  POLL  |  C1   |  C1E  |  C3   | C6
    4,83| 95,17 |  1870    ||  0,00  |  0,02 |  0,49 |  1,09 | 93,46
    8,63| 91,37 |  2266    ||  0,00  |  0,00 |  0,56 |  1,37 | 89,04
    4,92| 95,08 |  1896    ||  0,00  |  0,15 |  1,62 |  2,52 | 90,69
    8,24| 91,76 |  1864    ||  0,00  |  0,00 |  0,86 |  1,00 | 89,73
    As anyone can clearly see, my Intel Core i5-3350P (4 Cores & 4 Threads) still enters its deepest sleep-state (C6) the majority of the time while also ramping up the frequencies only moderately, when bearing in mind that these are the hardware clock limits as reported by >>cpupower<<:
    Code:
    hardware limits: 1.60 GHz - 3.30 GHz
    Hopefully Michael's benchmark results will help move forward the discussion of what the proper value of rate_limit_us should ideally be!
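
    To put numbers on the "majority of the time in C6" and "only moderately ramped up" observations, here is a small Python sketch that parses the four data rows from the cpupower monitor output in this post. It assumes the Freq column is in MHz and uses the 1.60 GHz - 3.30 GHz hardware limits quoted in the post; the helper names are invented for this example.

```python
# Data rows copied from the cpupower monitor output in the post
# (comma used as the decimal separator).
SAMPLE = """\
4,83| 95,17 |  1870    ||  0,00  |  0,02 |  0,49 |  1,09 | 93,46
8,63| 91,37 |  2266    ||  0,00  |  0,00 |  0,56 |  1,37 | 89,04
4,92| 95,08 |  1896    ||  0,00  |  0,15 |  1,62 |  2,52 | 90,69
8,24| 91,76 |  1864    ||  0,00  |  0,00 |  0,86 |  1,00 | 89,73
"""

COLUMNS = ["C0", "Cx", "Freq", "POLL", "C1", "C1E", "C3", "C6"]


def parse_rows(text):
    """Turn each data row into a dict keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        # "||" separates the Mperf and Idle_Stats groups; treat it like "|".
        cells = line.replace("||", "|").split("|")
        values = [float(cell.strip().replace(",", ".")) for cell in cells]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def column_mean(rows, name):
    """Average one column over all CPUs."""
    return sum(row[name] for row in rows) / len(rows)


def position_in_range(freq_mhz, low_mhz=1600.0, high_mhz=3300.0):
    """Where a frequency sits between the hardware limits (0.0 = min, 1.0 = max)."""
    return (freq_mhz - low_mhz) / (high_mhz - low_mhz)


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    mean_freq = column_mean(rows, "Freq")
    print("mean C6 residency (%):", round(column_mean(rows, "C6"), 2))
    print("mean frequency (MHz):", mean_freq)
    print("position in hardware range:", round(position_in_range(mean_freq), 2))
```

    On these four samples the mean C6 residency is about 90.7% and the mean clock about 1974 MHz, roughly 22% of the way from the lower to the upper hardware limit.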


  • microcode
    replied
    Originally posted by Linuxxx
    Let me take this opportunity to quote myself here (https://www.phoronix.com/forums/foru...66#post1123766):


    Looks like I was already too pessimistic back then!

    Anyhow, what I would like to see is more experimentation by the INTEL developers around the value of:
    Code:
    /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us
    Currently, INTEL sets the number to 500, which provides huge latency benefits over the (insane) ACPI default of 10000! [AMD users are "enjoying" this default setting right now with Arch Linux & Manjaro!]

    However, what ANDROID does seems to be the most sensible approach -> simply setting this knob to 0!
    This way, your Linux-based smartphone is able to reach the lowest latency possible & be able to rival APPLE's so-highly-regarded iOS!

    I am already successfully using schedutil with this configuration on my own systems, so maybe, just maybe You & the INTEL developers might want to try it out, too!

    I just shelled into my remote Ryzen 3950X machine, and on Arch Linux it seems rate_limit_us is set to 1000, not 10,000; on my Threadripper 2950X it is also 1000.

    I think the default is 1000x the scaling driver's transition latency, so for that Zen 2 chip I guess that latency is 1μs; on whatever AMD chip you were looking at before, it is about 10μs; and on the Intel chip you were looking at, maybe it is 0.5μs.
    Last edited by microcode; 03-20-2020, 03:29 PM.
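
    The "1000x the transition latency" guess in this post can be written down as a small model. The Python sketch below is only that: a model of the hypothesis, not kernel code. The 10000 ceiling and the optional driver-supplied override are assumptions chosen to be consistent with the values reported in this thread (10000 as the ACPI default, 500 on Intel), and all names are illustrative.

```python
LATENCY_MULTIPLIER = 1000  # the "1000x" factor guessed in the post
DEFAULT_CAP_US = 10_000    # assumed ceiling, matching the 10000 default quoted in this thread


def default_rate_limit_us(transition_latency_us, driver_delay_us=None):
    """Model the guessed default for rate_limit_us.

    driver_delay_us is a hypothetical driver-supplied override
    (for example the 500 reported for Intel in this thread).
    """
    if driver_delay_us:
        return driver_delay_us
    whole_us = int(transition_latency_us)  # sub-microsecond latencies truncate to 0
    if whole_us:
        return min(whole_us * LATENCY_MULTIPLIER, DEFAULT_CAP_US)
    return LATENCY_MULTIPLIER


if __name__ == "__main__":
    for latency in (1, 10, 160):
        print(latency, "us ->", default_rate_limit_us(latency))
    print("20 us with override ->", default_rate_limit_us(20, driver_delay_us=500))
```

    Under these assumptions a 1 μs latency gives 1000 and a 10 μs latency gives 10000, matching the two AMD values mentioned, while a 500 default would have to come from a driver override rather than from a 0.5 μs latency.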


  • Linuxxx
    replied
    Originally posted by bridgman

    Sorry, don't think I saw your original question.

    Which driver are you talking about ? AFAIK we do all our CPU kernel work in upstream, so I'm not aware of an out-of-tree Zen CPU driver that we could "finally mainline".

    Just FYI I work on the GPU side, not CPU.

    First of all, I'm sorry for sounding harsh & unfair to You AMD guys; I really do appreciate all You do on the Linux side of things and wouldn't want to miss You!

    I was talking about these patches to the Linux kernel:

    From "Natarajan, Janakarajan" <>
    Subject [PATCHv3 0/6] CPPC optional registers AMD support
    Date Wed, 10 Jul 2019 18:37:09 +0000

    CPPC (Collaborative Processor Performance Control) offers optional registers which can be used to tune the system based on energy and/or performance requirements. Newer AMD processors (>= Family 17h) add support for a subset of these optional CPPC registers, based on ACPI v6.1
    https://lkml.org/lkml/2019/7/10/682

    And this reply to it from Peter Zijlstra:

    We're trying to move all cpufreq into the scheduler and have only a single governor, namely schedutil -- yes, we're still stuck with legacy, and we're still working on performance parity in some cases, but I really hope to get rid of all other cpufreq governors eventually.

    So AMD will have to go the route that Intel is already taking by focusing upon schedutil.
    Therefore, to achieve this goal, I was under the impression that AMD would need to develop a special Linux kernel CPU driver for Your own Zen CPU architecture, similar to Intel's >>intel_cpufreq<< driver used in conjunction with schedutil.
    But, for more than half a year now, progress on that front seems to have stalled...

    That's why I was wondering if You had any more insight into this matter that You could share or at least hint at.
    And even if You don't, would it be possible for You to ask around AMD HQ?


  • bridgman
    replied
    Originally posted by Linuxxx
    Speaking of AMD, it would be cool if they could mention an ETA on when they are planning to finally mainline their Zen CPU driver into Linux!
    I had already asked bridgman about it, but he doesn't seem to care (just like the rest of AMD).

    Sorry, don't think I saw your original question.

    Which driver are you talking about ? AFAIK we do all our CPU kernel work in upstream, so I'm not aware of an out-of-tree Zen CPU driver that we could "finally mainline".

    Just FYI I work on the GPU side, not CPU.


  • PCJohn
    replied
    Originally posted by stormcrow
    Not just Li-ion facts of life. Most batteries don't last more than 2-3 years, even long life automotive lead-acid batteries guaranteed for 5 years often start failing after 3 years. That's usually why they're prorated between 3-5 years instead of full replacement.

    My laptop battery has been in use since 2013 and still works perfectly: its capacity has dropped from 75 Wh to 59 Wh. Retaining 79% after 7 years is probably not so bad for a Li-ion battery (HP VH08XL). It largely depends on how you treat the battery. The worst thing is probably to keep it discharged for a long time. My strategy is to avoid running on battery whenever possible and to keep it charged. Some people might argue that Li-ion batteries last longest when kept at 50% charge, but I have had very good experience without such treatment. I use a similar approach for my tablet, charging it each evening, and the battery is in perfect condition after 6 years. I am guessing it has 80% (?) of its original capacity; it is still able to power the tablet for ~7 days.


  • Linuxxx
    replied
    Originally posted by cybertraveler

    Is that latency setting one that governs how long it takes for the kernel to wake up after there is no more work to do? I.e., the kind of setting that benefits sound engineers and potentially some gamers.

    The way I understand it, this knob of schedutil controls the time that has to pass until the frequency of the CPU will be changed to react to a change in system load (e.g. Your game suddenly starts to compile some shaders).
    Naturally, this means the higher the number, the higher the latency times! (e.g. Your CPU is downclocked because You are in a menu inside a game, then You quit that menu and suddenly encounter a new type of enemy with a new type of shader that needs to be compiled, but schedutil doesn't ramp up the clocks just yet, since, You know, the necessary time between frequency changes as defined with >>rate_limit_us<< hasn't passed just yet.)
    Therefore, the lower the number, the lower the latency times!
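
    The rate-limiting behaviour described here can be illustrated with a toy Python simulation. It is a simplified sketch of the idea (a frequency change is only let through once rate_limit_us has elapsed since the previous one), not the governor's actual code; the function name and the request timestamps are invented for the example.

```python
def applied_updates(request_times_us, rate_limit_us):
    """Return the request times at which a frequency change would go through.

    A request is applied only if at least rate_limit_us microseconds have
    passed since the last applied change; otherwise it is skipped.
    """
    applied = []
    last = None
    for t in sorted(request_times_us):
        if last is None or t - last >= rate_limit_us:
            applied.append(t)
            last = t
    return applied


if __name__ == "__main__":
    # Hypothetical load changes at these timestamps (in microseconds).
    requests = [0, 100, 600, 700, 1200]
    for limit in (0, 500, 10000):
        print(f"rate_limit_us={limit}: applied at {applied_updates(requests, limit)}")
```

    In this toy model a value of 0 lets every request through, 500 drops the requests at 100 and 700, and 10000 lets only the first one through within the sampled window.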

    And as I already mentioned, Android with schedutil uses the value of 0 for this setting, which makes absolute sense:
    They are competing with Apple's iOS, which for the majority of people seems like a super-optimized OS, because it can draw its UI animations so smoothly.
    And then remember older versions of Android, where the UI was lagging because the CPU was not reacting fast enough to a user tapping the screen.

    Schedutil was developed because of exactly this reason - namely to provide the lowest latency possible, which simply isn't possible with any other governor in Linux.
    But to then cripple its functionality by setting an arbitrary number that forces it to wait for an arbitrary amount of time until it can change the CPU frequency simply makes no sense to me! (Like Arch Linux & Manjaro which do make use of schedutil by default on AMD systems, but leave the default value of 10000(!) untouched. At least Intel is wise enough to reduce the number considerably to 500.)

    Speaking of AMD, it would be cool if they could mention an ETA on when they are planning to finally mainline their Zen CPU driver into Linux!
    I had already asked bridgman about it, but he doesn't seem to care (just like the rest of AMD).

    Anyway, what is definitely clear though is that schedutil is the future of Linux, with all of the other governors getting ripped out eventually...

    Low-latency for the masses FTW!


  • cybertraveler
    replied
    Originally posted by Linuxxx
    Let me take this opportunity to quote myself here (https://www.phoronix.com/forums/foru...66#post1123766):


    Looks like I was already too pessimistic back then!

    Anyhow, what I would like to see is more experimentation by the INTEL developers around the value of:
    Code:
    /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us
    Currently, INTEL sets the number to 500, which provides huge latency benefits over the (insane) ACPI default of 10000! [AMD users are "enjoying" this default setting right now with Arch Linux & Manjaro!]

    However, what ANDROID does seems to be the most sensible approach -> simply setting this knob to 0!
    This way, your Linux-based smartphone is able to reach the lowest latency possible & be able to rival APPLE's so-highly-regarded iOS!

    I am already successfully using schedutil with this configuration on my own systems, so maybe, just maybe You & the INTEL developers might want to try it out, too!

    Is that latency setting one that governs how long it takes for the kernel to wake up after there is no more work to do? I.e., the kind of setting that benefits sound engineers and potentially some gamers.
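
    For anyone who wants to check the knob discussed in the quoted post on their own machine, here is a minimal Python sketch that reads it. It uses the sysfs path given in the quote; whether the file exists at that location depends on the driver and on schedutil being the active governor (on some systems it may sit in a per-policy directory instead, which is an assumption not covered in this thread). The function names are illustrative.

```python
from pathlib import Path

# Path as quoted in this thread; it may be absent if schedutil is not in use.
RATE_LIMIT_PATH = Path("/sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us")


def parse_rate_limit(text):
    """Convert the file contents (e.g. "500\\n") to an integer number of microseconds."""
    return int(text.strip())


def read_rate_limit_us(path=RATE_LIMIT_PATH):
    """Return the current rate_limit_us, or None if it cannot be read."""
    try:
        return parse_rate_limit(Path(path).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_rate_limit_us()
    if value is None:
        print("rate_limit_us not available (schedutil may not be the active governor)")
    else:
        print(f"rate_limit_us = {value}")
```

    Changing the value means writing a number to the same file as root; the sketch deliberately only reads it.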
