Linux 6.3 To Allow Some AMD GPU Power Savings Benefits Even Without S0ix BIOS Support


  • Linux 6.3 To Allow Some AMD GPU Power Savings Benefits Even Without S0ix BIOS Support

    Phoronix: Linux 6.3 To Allow Some AMD GPU Power Savings Benefits Even Without S0ix BIOS Support

    AMD sent in another batch of AMDGPU features and fixes to DRM-Next this week ahead of the Linux 6.3 merge window. Being late in the cycle, the material is mostly fixes -- including some Radeon RX 7000 series "RDNA3" (GFX11) fixes -- but there is also a new feature: AMD Radeon GPU power savings with S0ix even when system BIOS support is lacking...
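
    For those wanting to check whether a system is actually set up for s2idle (the suspend path that S0ix rides on), the kernel exposes the active suspend mode in sysfs; a quick check, where the bracketed entry is the mode in use:

    Code:
    # the bracketed entry is the active suspend mode; S0ix requires s2idle
    cat /sys/power/mem_sleep
    # typical output: [s2idle] deep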


  • #2
    The numbers over suspend showed:
    * No patch: 9.2W
    * Skip amdgpu suspend entirely: 10.5W
    * Run amdgpu s0ix routine: 7.7W
    While I think it's cool that they're chasing down those last few watts, those of us (both on Linux and elsewhere) with multi-monitor setups w/ the 7900XTX are idling at ~80W just on the desktop.

    [Attached screenshot: 2023-02-05-104207_grim.png]
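
    For anyone wanting to check their own card's draw, amdgpu exposes a power sensor through hwmon; a minimal check, assuming card0 and hwmon0 (the hwmon index varies per system):

    Code:
    # average GPU power draw, reported in microwatts
    cat /sys/class/drm/card0/device/hwmon/hwmon0/power1_average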

    While the card's been fine aside from that on Linux, it's still a nightmare to use in VR on Windows, to the point that I would still call it completely unusable 2+ months into the release.

    [Graph of frame times on Windows and Linux, showing 300%+ frame time spikes on Windows in VR]

    You cannot control your fans on Linux on the new cards either, as the SCPM feature (signed firmware for the power-managing SMU controller) causes it to disable the sysfs functionality that we used to use to control the fans! In theory, one should be able to implement the OverDrive interface and gain access to the card-controlled fan curve settings that other platforms use (I'm working on this now), but doing so is slow going without first-party documentation.
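
    For context, this is the legacy hwmon interface that SCPM kills on the new cards; a sketch of how fan control worked on earlier generations, assuming card0 and hwmon0 (the index varies), run as root:

    Code:
    # 1 = manual fan control, 2 = automatic (firmware-managed)
    echo "1" > /sys/class/drm/card0/device/hwmon/hwmon0/pwm1_enable
    # set the fan duty cycle, range 0-255
    echo "128" > /sys/class/drm/card0/device/hwmon/hwmon0/pwm1
    # read back the measured fan speed in RPM
    cat /sys/class/drm/card0/device/hwmon/hwmon0/fan1_input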

    Overall, I'm quite sad to see a mandatory-on integrated PSP being used to load power management firmware of all things (trust me AMD, with it idling at 80W just displaying desktops at normal resolutions/refresh rates, nobody is out to steal your super secret power stuff). I'll be a loyal customer so long as they're supporting the open desktop, but jeez Louise, the problems are plentiful this time around (I've been intentionally grabbing hardware early to have fun solving these kinds of things). Hopefully some people with access to first-party info for the cards can get to the most glaring bits soon, because it's definitely soured my ability to talk them up to friends.
    Last edited by mcoffin; 05 February 2023, 01:50 PM.



    • #3
      Originally posted by mcoffin View Post

      While I think it's cool that they're chasing down those last few watts, those of us (both on Linux and elsewhere) with multi-monitor setups w/ the 7900XTX are idling at ~80W just on the desktop.
      Nvidia has the same problem; you don't even need multiple monitors, just one with a refresh rate above 85Hz, and power consumption sits around 65W while doing absolutely nothing.



      • #4
        And Intel Arc is basically the same.
        I mean, what the heck are these card engineers thinking?
        At these ~40-80W levels (depending on model etc.), the video card's idle power alone is a good chunk of the power draw of the WHOLE rest of a modern desktop PC when it's "idle-ish": sitting at the boot menu / BIOS screen, sitting at the desktop doing basically nothing (e.g. a Linux CLI or GUI), acting as a mostly idle low-end "server", etc.

        And to add insult to injury: if the fans aren't OFF when the system is the next best thing to idle at the desktop, or whenever the GPU has darn near nothing to do, that's consuming the precious lifetime of a mechanical moving part that is (1) critical for the card to function in its non-idle use cases, and (2) almost guaranteed to be expensive, absurdly difficult, or practically impossible to repair / replace. So useless spinning under ~no load is likely the reason the card as a whole will "fail" within a few short years.

        If they're going to have built-in firmware / controllers on these GPUs, then they should keep the fans OFF by default (even with no OS / driver action) whenever the card is well under its maximum operating temperature and no OS / driver setting specifically requests otherwise. They should also, by default, power manage the card into a basically-standby < 5W mode when literally idle, and otherwise hold the minimum useful power level needed to display a desktop / console CLI on the actually utilized port(s), until a driver / OS requests some higher performance mode.

        I'm sick of noisy, power-wasting GPUs that basically DIE not because they're used hard for, say, mining, but simply because of "broken by default" power management settings that totally screw Linux or driverless / uninitialized idle use cases, burning lots of power and fan life for ZERO reason.

        One such idle use case could be a dedicated GPU used for pass-through that just sits idle, driverless, on a host until someone actually starts a VM that needs it to do something. Since there isn't even working SR-IOV, and no default firmware power management in place to ameliorate that ridiculous architectural / product-support failing, one gets not one but TWO GPUs sucking power and spinning fans uselessly at a Linux desktop, just so one can occasionally run a half-decent graphics / render app or a Windows VM or something. UGH.



        • #5
          Originally posted by mcoffin View Post

          While I think it's cool that they're chasing down those last few watts, those of us (both on Linux and elsewhere) with multi-monitor setups w/ the 7900XTX are idling at ~80W just on the desktop.
          That's pretty crazy. What's the story with that? Is it expected behavior or a bug? And if expected, why does it need so much power when idle?



          • #6
            Originally posted by mcoffin View Post

            While I think it's cool that they're chasing down those last few watts, those of us (both on Linux and elsewhere) with multi-monitor setups w/ the 7900XTX are idling at ~80W just on the desktop.
            Also, is there a bug to track high power usage?

            UPDATE: I think I found it: https://gitlab.freedesktop.org/drm/amd/-/issues/2315



            • #7
              Originally posted by mcoffin View Post
              Overall, I'm quite sad to see a mandatory-on integrated PSP being used to load power management firmware of all things
              PSP has been loading firmware for all engines on the GPU since it was introduced, including power management firmware. SCPM is not designed to hide anything (you can still see the pptables); it is designed to provide an extra level of protection against users accidentally doing stupid things with the data in the pptables and damaging their GPUs.
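
              And indeed, the pptables are still readable from userspace even with SCPM; a quick way to look at them, assuming amdgpu's pp_table sysfs node on card0:

              Code:
              # hex-dump the first 256 bytes of the active powerplay table
              xxd -l 256 /sys/class/drm/card0/device/pp_table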



              • #8
                Originally posted by agd5f View Post
                SCPM is not designed to hide anything (you can still see the pptables); it is designed to provide an extra level of protection against users accidentally doing stupid things with the data in the pptables and damaging their GPUs.
                First off, I'd like to thank you for what you do for Linux and the open desktop; I have nothing but respect and admiration for you!

                My issues with the PSP/SCPM in general are more of the overly-idealistic user-freedom type, and while I'm personally disappointed to see it becoming the norm, if I'm being totally honest I don't think it's that big of an issue: it mostly allows for easier market segmentation by the manufacturer, and there's a very slim chance I'd ever reach a level of tweaking that needs to go beyond what is possible right now.

                My issues right now stem from the simple fact that the APIs necessary to perform what I would consider standard power-user operations remain unimplemented on cards that run in this new state. Notably, this comes down to two APIs:

                1. OverDrive - This doesn't even seem to function properly on Windows, as settings often aren't "taking", or at the very least aren't performing the change they used to on previous generations. Normally this would be fine, but on both Windows and Linux I've used OverDrive to set minimum clock speeds to higher-than-normal values while running VR applications to reduce frame time variance, which is the only thing I've found that makes VR applications playable (see the sketch after this list).
                2. Fan control - While I'd argue that the removal of the sysfs-like fan control API does little to protect users from themselves, I do understand the desire (especially with the high-power-draw-at-idle bug(s)). BUT, since OverDrive remains unimplemented, that leaves people without *any* level of control over the fans. I'm not a fan of removing the userspace fan control API, but combined with the new API going unimplemented, it's quite unfortunate. (And I'm working on implementing this! I've just hit a few places where I seemingly don't have the requisite information, so hopefully I'll be able to continue on that soon.)
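
                For reference on point 1, this is roughly what the minimum-clock trick looks like through amdgpu's OverDrive sysfs interface on generations where it is implemented (e.g. Navi10); a sketch assuming card0, root, and the OverDrive bit enabled in amdgpu.ppfeaturemask:

                Code:
                # show the current OverDrive ranges and settings
                cat /sys/class/drm/card0/device/pp_od_clk_voltage
                # raise the minimum core clock to 1500MHz ("s 0" = sclk lower bound)
                echo "s 0 1500" > /sys/class/drm/card0/device/pp_od_clk_voltage
                # commit the changes to the SMU
                echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage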

                At the end of the day, my positions on overly-idealistic user freedom are quite loosely held, and I'm willing to admit I'll be happy with a device that just works for what I want/need it to do. I'm working on an OverDrive implementation for the 7900XTX, but I'm running into knowledge-gap problems and unexpected behavior, so the journey is going a little more roughly than when I did the same for Navi10. When complete, though, these patches will hopefully ease the overhead of adding OverDrive support to new generations of cards and reduce the amount of ASIC-specific boilerplate needed (hopefully making it more likely for this kind of support to exist at release in the future).

                So, to anyone reading: I'm not criticizing the engineers working on the AMD drivers here (in fact, given its cross-platform ever-presence, the power issue might not even be software-related). Given the rollout of this feature, though, we have work to do to support things that SHOULD still be supported in some capacity, so that users who need them for given workloads can achieve what's necessary. I'm trying to take on some of that work, and hopefully, with a little information I seem to be missing right now, we'll be able to at least resolve the fan control issues and regain control over the settings that have made VR applications playable in the past.



                • #9
                  Originally posted by mcoffin View Post
                  1. OverDrive - This doesn't even seem to function properly on Windows, as settings often aren't "taking", or at the very least aren't performing the change they used to on previous generations. Normally this would be fine, but on both Windows and Linux I've used OverDrive to set minimum clock speeds to higher-than-normal values while running VR applications to reduce frame time variance, which is the only thing I've found that makes VR applications playable.
                  are these less demanding VR titles? are they Unreal Engine? i've seen some low-load games start stuttering, especially on the main menu (regular non-VR games, 60 vsync, RX 580)

                  on windows i don't bother messing with overdrive/radeon settings, which seem to get ignored; the trick is that any OpenCL load spikes the clock speeds to one of the higher power states, thus ClockBlocker was born http://www.comroestudios.com/ClockBl...Doc/index.html (still http unfortunately)

                  from idle 300MHz, to a low-load game bouncing below 1000, to ClockBlocker's minimum of 1300 with 1411 max if the game demands it, i'm curious if newer gens still react this way

                  i haven't tried games on linux, but i did do the opposite and forced the lowest power state according to https://wiki.archlinux.org/title/AMDGPU#Manual_(default)

                  Code:
                  # switch DPM control to manual so a forced state sticks (run as root)
                  echo "manual" >| /sys/class/drm/card0/device/power_dpm_force_performance_level
                  # restrict the core clock to the lowest DPM state (state 0)
                  echo "0" >| /sys/class/drm/card0/device/pp_dpm_sclk
                  Code:
                  # watch the active sclk state (marked with *), refreshing every 0.5s
                  watch -n 0.5 cat /sys/class/drm/card0/device/pp_dpm_sclk
                  again, curious if new gens are the same; the clocks use different voltages as a side effect, which is why this was perfect for keeping my linux browsing in check (idling on a beatport page caused max clocks, with the fan going on-off all the time; annoying, since windows doesn't do that)

                  reminds me of the recent workaround where a compositor (mutter?) increased its demand to make the (intel?) gpu clock higher so that rendering could complete within vsync, or whatever that ridiculous attempt was

                  i think we shouldn't make gpu governors like cpu governors; we could have a user-facing option to prioritize a target framerate, or at the very least let the user decide if they want to prolong high power states

                  i'm imagining a basic algorithm something like: if fps < target, increase the power state regardless of load; if load has reduced, stay in the high power state for another N seconds; jump to the max state first when load begins, rather than stepping up states across frames based on per-state load (not sure if this last one is relevant; i don't think anything is currently this simple other than cpu-conservative)
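
                  purely to make that first rule concrete, here's a rough userspace sketch reusing the pp_dpm_sclk knob from above; get_fps is a hypothetical stand-in for wherever an integer frame rate would come from (a compositor, MangoHud, etc.), not a real command:

                  Code:
                  #!/bin/sh
                  # hypothetical sketch: if fps < target, step up the DPM state regardless of load
                  # assumes manual DPM mode was already set as in the earlier snippet (run as root)
                  TARGET_FPS=90
                  SCLK=/sys/class/drm/card0/device/pp_dpm_sclk
                  nstates=$(grep -c '^' "$SCLK")   # number of available sclk states
                  while true; do
                      fps=$(get_fps)               # hypothetical fps source, not a real command
                      state=$(awk '/\*/ {gsub(":","",$1); print $1}' "$SCLK")   # active state
                      if [ "$fps" -lt "$TARGET_FPS" ] && [ "$state" -lt $((nstates - 1)) ]; then
                          echo $((state + 1)) > "$SCLK"   # bump one state; fps overrides load
                      fi
                      sleep 0.5
                  done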

