
Linux 5.17 Released With AMD P-State Driver, Plenty Of New Hardware Support


  • #31
    Originally posted by farnz

    Looking at my wall consumption, and using a software decoder, at 3.2 GHz (the clock speed of my system as set by the performance governor), I consume 18.5W. At 3.1 GHz, this drops to 18.4W. At 3.0 GHz, it goes back up to 18.5W. As I drop the frequency further, the wall consumption goes up to a peak of 20W at 1.8 GHz before it starts dropping frames.
    Guessing from the wall power and max frequency, that's some kind of laptop or mini-PC, but you have direct control of the fan voltage? I'm curious what kind of hardware you have there. I hacked up a benchmark for this (takes about an hour to run), and on my machine (i5-4670K with a slight overclock/undervolt; factory max speed 3.8 GHz), power efficiency is flat-ish only up to 3 GHz or so. But somehow schedutil still beats performance even with the frequency set to below the point where it goes pear-shaped, possibly because schedutil is allowed to go all the way down to 800 MHz:

    Sorry about how janky it is. I did run on a quiet system, but... If I find the time, I'll fix it to average the best 2 out of 5 or something.
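The benchmark itself is not reproduced in this post. As a rough sketch of one way to get a per-run power figure on Linux, assuming the CPU exposes a RAPL package domain at `/sys/class/powercap/intel-rapl:0` (the path, the helper names, and the choice to handle only a single counter wrap are all assumptions for illustration, and reading `energy_uj` may require root):

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain; it differs between machines.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name):
    """Read one integer counter file from the RAPL domain (microjoules)."""
    return int((RAPL_DOMAIN / name).read_text().strip())


def average_watts(start_uj, end_uj, elapsed_s, max_range_uj):
    """Average power between two energy readings.

    The energy counter wraps around at roughly max_range_uj, so a negative
    difference is treated as exactly one wrap.
    """
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1e6 / elapsed_s


def measure(workload):
    """Run workload() and return the average package power in watts."""
    max_range = read_counter("max_energy_range_uj")
    e0, t0 = read_counter("energy_uj"), time.monotonic()
    workload()
    e1, t1 = read_counter("energy_uj"), time.monotonic()
    return average_watts(e0, e1, t1 - t0, max_range)


if __name__ == "__main__":
    watts = measure(lambda: time.sleep(5))
    print(f"{watts:.1f} W average while sleeping for 5 s")
```

Repeating `measure()` once per governor and frequency cap, and keeping the best few runs, would give the kind of averaging mentioned above.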

    The breakpoint looks to be around 3.4 GHz, which is the same frequency where my CPU's VID/Frequency map gets way steeper, and also the start of the "turbo" range:


    turbostat shows the balance here - at 3.0 GHz my core power consumption is lowest, but the package power consumption goes up because turbostat sees RAMWatt consuming more. At 3.1 GHz, I hit the balance point where while the cores consume a bit more, RAMWatt is lower, and PkgWatt goes down.

    Reducing frequency further continues the balancing - CorWatt goes down, but RAMWatt goes up by more than CorWatt falls by, causing PkgWatt to increase.
    RAMWatt is not reported by my CPU. My wall power meter doesn't have any kind of API and only updates once per second, so there's no hope of synchronizing measurement with the video. But eyeballing, it looks like 84 W at 2 GHz, and 93 W at 4 GHz, both with the performance governor.

    If I limit fan voltage to 7V (== less cooling), then the balance point falls to about 2.8 GHz once the chip has reached its new stable temperature, where I have a minimum wall power consumption. Both CorWatt and RAMWatt are higher at 3.1 GHz than at 2.8 GHz with the reduced cooling. And it's still using less power racing to idle at 2.8 GHz than it does at 1.8 GHz where I can't drop lower without dropping frames.

    To make 1.8 GHz do better than race-to-idle, I have to limit my fans to 3.3V. At this point, the reduced cooling is such that the balance point is 1.8 GHz. I can still run at 3.2 GHz, though, it's just that my chip gets a lot hotter than it does when my fans are allowed their full 12V.
    As for temperature effects, with 100% stress -c 4 load on all cores at 4.0 GHz, I get 81 W PkgWatt, 69 W CorWatt at 92°C (minimum fan speed), and 68 W PkgWatt, 57 W CorWatt at 63°C (maximum fan speed). Video playback is not a thermally significant load for my CPU cooler, and in that test, package temperatures ranged from 49-55°C, at minimum fan speed.
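To spell out the arithmetic in those stress-test figures, here is a small calculation using only the numbers quoted above. The variable names are made up for illustration, and two data points say nothing about how linear the relationship is.

```python
# Readings from the stress test: 4.0 GHz, all four cores loaded.
hot = {"pkg_w": 81, "core_w": 69, "temp_c": 92}   # minimum fan speed
cool = {"pkg_w": 68, "core_w": 57, "temp_c": 63}  # maximum fan speed

pkg_delta_w = hot["pkg_w"] - cool["pkg_w"]      # extra package power when hot
core_delta_w = hot["core_w"] - cool["core_w"]   # extra core power when hot
temp_delta_c = hot["temp_c"] - cool["temp_c"]   # temperature difference

# Whatever the package draws outside the cores (PkgWatt minus CorWatt).
uncore_hot_w = hot["pkg_w"] - hot["core_w"]
uncore_cool_w = cool["pkg_w"] - cool["core_w"]

print(f"package: +{pkg_delta_w} W over {temp_delta_c} °C "
      f"({pkg_delta_w / temp_delta_c:.2f} W/°C, "
      f"{100 * pkg_delta_w / cool['pkg_w']:.0f}% more)")
print(f"cores:   +{core_delta_w} W; non-core part: "
      f"{uncore_cool_w} W -> {uncore_hot_w} W")
```

Nearly all of the extra 13 W lands in the cores; the non-core remainder moves by only 1 W.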

    With how low your computer's wall power is, it's possible that the fan motors themselves are significant.

    Just going by physics, my guess is that temperature just makes the efficiency fall-off at high frequency even more nonlinear, because of the vicious cycle of higher temperature -> higher leakage power -> higher temperature, and also needing more voltage for the same frequency at higher temperature, if your CPU's DVFS is fancy enough to do that. I think mine is too old.
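The feedback loop described above can be sketched as a toy fixed-point model. Everything here is an assumption for illustration: the exponential leakage form, the constants, and the single thermal resistance standing in for "fan speed" are not measurements of any real CPU.

```python
import math


def steady_state(p_dynamic_w, r_thermal_c_per_w, t_ambient_c=25.0,
                 p_leak_ref_w=5.0, t_ref_c=50.0, k_per_c=0.02):
    """Iterate temperature -> leakage -> temperature until it settles.

    Returns (temperature in °C, total power in W). All parameters are
    invented; a higher r_thermal_c_per_w models worse cooling.
    """
    temp = t_ambient_c
    for _ in range(1000):
        # Leakage grows exponentially with temperature (assumed form).
        p_leak = p_leak_ref_w * math.exp(k_per_c * (temp - t_ref_c))
        # Temperature rises with total dissipated power.
        new_temp = t_ambient_c + r_thermal_c_per_w * (p_dynamic_w + p_leak)
        if new_temp > 150.0:
            raise RuntimeError("no stable temperature in this toy model")
        if abs(new_temp - temp) < 1e-6:
            return new_temp, p_dynamic_w + p_leak
        temp = new_temp
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    for label, r_th in (("good cooling", 0.5), ("poor cooling", 0.9)):
        temp_c, power_w = steady_state(50.0, r_th)
        print(f"{label}: {temp_c:.1f} °C, {power_w:.1f} W")
```

With the same dynamic power, the poorly cooled case settles at both a higher temperature and a higher total power, which is the direction of the effect seen in the stress test.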

    So, based on doing the test you recommend, and extending to cover different cooling points, I find that race-to-idle is optimal assuming I choose the frequency that is power-optimal for the current thermals. With good cooling, performance is only slightly off optimal; as cooling gets worse, the power-optimal point for my CPU falls, until the power-optimal point is lower than the frequency point one above the minimum I can set.

    And this is where the P-state driver gets potentially interesting - the chip can know its own thermal situation and current power-optimal point, and thus race-to-idle whenever the current power-optimal operating point is higher than the minimum speed to reach the goal.
    I think our main point of disagreement is this: I don't think the highest frequency the chip can run at is usually close to the power-optimal frequency. Rather, I think it's well above the optimum.
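Both positions can be put into one toy model of a periodic workload such as video decode: run at frequency f until the frame's work is done, then idle for the rest of the period. The voltage curve, the power constants, and the workload size below are all invented for illustration; only the idea of a V/f curve that steepens at 3.4 GHz is taken from the post.

```python
def voltage(f_ghz, knee_ghz=3.4):
    """Assumed V/f curve: gentle slope, then much steeper above the knee."""
    return 0.8 + 0.05 * f_ghz + 0.3 * max(0.0, f_ghz - knee_ghz)


def energy_per_period(f_ghz, work_gcycles=1.5, period_s=1.0,
                      p_static_w=8.0, p_idle_w=2.0, c=4.0):
    """Joules per period, or None if the work misses the deadline."""
    busy_s = work_gcycles / f_ghz
    if busy_s > period_s:
        return None  # would drop frames
    p_active_w = p_static_w + c * voltage(f_ghz) ** 2 * f_ghz
    return p_active_w * busy_s + p_idle_w * (period_s - busy_s)


def best_frequency(candidates_ghz):
    """Candidate frequency with the lowest energy that meets the deadline."""
    feasible = [f for f in candidates_ghz if energy_per_period(f) is not None]
    return min(feasible, key=energy_per_period)


if __name__ == "__main__":
    steps = [0.8, 1.8, 2.4, 3.0, 3.4, 3.8, 4.2]
    for f in steps:
        e = energy_per_period(f)
        print(f"{f:.1f} GHz:", "misses deadline" if e is None else f"{e:.2f} J")
    print("lowest energy at", best_frequency(steps), "GHz")
```

With these made-up constants, racing to idle at a mid-range frequency beats running at the slowest feasible speed, yet the top frequency is the most expensive of all. Where the optimum actually sits on real hardware depends on the measured V/f curve, idle power, and thermals.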