The Linux 4.18 Power Regression Affecting Some AMD Graphics Cards Should Be Reverted

  • The Linux 4.18 Power Regression Affecting Some AMD Graphics Cards Should Be Reverted

    Phoronix: The Linux 4.18 Power Regression Affecting Some AMD Graphics Cards Should Be Reverted

    Making the rounds last week was a nasty power regression hitting Linux 4.18 stable that, as we ended up bisecting, was caused by a change to the AMDGPU kernel driver and affected select Radeon graphics cards. It looks like the goal this week is to get that patch reverted from Linux 4.18...

    http://www.phoronix.com/scan.php?pag...-Revert-Likely

  • #2
    Looking forward to it. My MSI RX 560 on F28 showed 6 watts idle on kernel 4.18.9, and 20 watts idle after 'upgrading' to 4.18.11.

    • #3
      Here too: my Sapphire Nitro+ RX 480 is getting hot and eating power for no reason:
      amdgpu-pci-0100
      Adapter: PCI adapter
      vddgfx: +1.14 V
      fan1: 685 RPM
      temp1: +50.0°C (crit = +94.0°C, hyst = -273.1°C)
      power1: 37.03 W (cap = 150.00 W)

      It was well below 10W and 36C before.

      • #4
        My original comment in the bug report was in regard to the patch itself, which was part of 4.19. On first reviewing the bug I couldn't remember the exact release the patch went into, and it wasn't clear to me at first whether the bug report was a regression or a side effect of the patch. For example, there was a bug in Vega that prevented the GPU from going to the lowest mclk level in some cases. We fixed that and the idle mclk was lower, but as a result some games with relatively light GPU workloads regressed slightly in performance in certain cases, since the mclk was starting from a lower level at idle. Upon further investigation, it became clear that this was an actual regression and not just a side effect of fixing another bug.

        The patch was not intended for 4.18 and I did not flag it for stable. Sasha's automatic patch-selection tool picked it up for 4.18, but it shouldn't have been applied. I've asked that it be reverted from 4.18.
        Last edited by agd5f; 10-08-2018, 03:17 PM. Reason: typo
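
agd5f's explanation hinges on whether the memory clock (mclk) drops to its lowest DPM level at idle. Below is a minimal shell sketch for checking that from sysfs. It assumes pp_dpm_mclk uses the same "N: <freq>Mhz" format, with a trailing "*" on the active level, as the pp_dpm_sclk listing posted further down the thread; mclk_at_lowest is a hypothetical helper name, not an existing tool. Some setups (for example certain multi-monitor configurations) can legitimately keep mclk above level 0, so a "no" is not proof of a bug.

```shell
#!/bin/sh
# Hypothetical helper, not part of any AMD tool: succeeds (exit status 0)
# when the active memory-clock DPM level in a pp_dpm_mclk-style listing
# is level 0, the lowest one. The driver marks the active level with "*".
mclk_at_lowest() {
    # $1: path to the listing, e.g. /sys/class/drm/card0/device/pp_dpm_mclk
    awk '
        $NF == "*" { found = 1; active = $1 }   # e.g. active = "0:"
        END { exit !(found && active == "0:") }
    ' "$1"
}
```

Usage on an idle desktop: `mclk_at_lowest /sys/class/drm/card0/device/pp_dpm_mclk && echo "mclk at lowest level"`. On multi-GPU systems the card may be a different cardN.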

        • #5
          Originally posted by Grinness
          Here too: my Sapphire Nitro+ RX 480 is getting hot and eating power for no reason:
          amdgpu-pci-0100
          Adapter: PCI adapter
          vddgfx: +1.14 V
          fan1: 685 RPM
          temp1: +50.0°C (crit = +94.0°C, hyst = -273.1°C)
          power1: 37.03 W (cap = 150.00 W)

          It was well below 10W and 36C before.

          One of my RX 580 cards draws 32W in low-power mode; in compute mode, doing nothing, it consumes around 50 watts.
          It's good to know that gfx8 cards can idle below 10 watts; that would be acceptable.

          An Nvidia 1060 6GB idles at around 12-13 watts; in headless mode after 'nvidia-smi -pm 1' it idles at ~6 watts.

          • #6
            Originally posted by tuxd3v

            One of my RX 580 cards draws 32W in low-power mode; in compute mode, doing nothing, it consumes around 50 watts.
            It's good to know that gfx8 cards can idle below 10 watts; that would be acceptable.

            An Nvidia 1060 6GB idles at around 12-13 watts; in headless mode after 'nvidia-smi -pm 1' it idles at ~6 watts.

            Downgrading to kernel 4.18.9:
            Code:
            uname -a
            Linux moby 4.18.9-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 19 21:19:17 UTC 2018 x86_64 GNU/Linux
            The power draw is reasonable:
            Code:
            watch -n 3 sensors
            
            amdgpu-pci-0100
            Adapter: PCI adapter
            vddgfx: +0.75 V
            fan1: 687 RPM
            temp1: +35.0°C (crit = +94.0°C, hyst = -273.1°C)
            power1: 7.16 W (cap = 150.00 W)
            Note that power1 and vddgfx fluctuate a lot depending on a number of factors (mouse input, other open programs, even ones only waiting for input, e.g. gvim).
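
For a lighter-weight view of the power1 reading than the full `sensors` output, the value can be read straight from sysfs. This is a minimal sketch that assumes the card's amdgpu hwmon directory exposes power1_average in microwatts (the file lm-sensors displays as "power1"); gpu_power_watts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the GPU's average power draw in watts.
# Assumes the amdgpu hwmon directory exposes power1_average in
# microwatts, which is what lm-sensors displays as "power1".
gpu_power_watts() {
    # $1: hwmon directory, e.g. /sys/class/drm/card0/device/hwmon/hwmon0
    #     (the hwmonN index varies from system to system)
    awk '{ printf "%.2f\n", $1 / 1000000 }' "$1/power1_average"
}
```

Polling it in a loop, e.g. `while sleep 3; do gpu_power_watts /sys/class/drm/card0/device/hwmon/hwmon0; done` (adjusting the hwmonN index), approximates the `watch -n 3 sensors` readout above while showing only the power figure.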

            @AMD people: I noticed a large discrepancy in fan behaviour between Linux and Windows (10, with the latest AMD Adrenalin drivers): under Windows the fans stop when the card is below 50C (as reported by Global WattMan). In Linux the fans never stop, even when the temperature is low (35C; my CPU is water-cooled, so the only source of heat within the case is the GPU).

            Any insight?
            Thanks

            • #7
              Originally posted by Grinness

              Any insight?
              Thanks

              You can try changing the fan speed at low temperatures;
              check the AMDGPU documentation:
              https://dri.freedesktop.org/docs/drm...and-monitoring

              But I don't know whether there are separate profiles for the fan.
              I know there are 6 power profiles for various tasks (compute, VR, 3D desktop, low-power mode, ...), but I don't know if there is one for the fan.
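
The AMDGPU hwmon documentation linked above describes pwm1_enable (0: no fan speed control, 1: manual control through pwm1, 2: automatic), pwm1 (a 0-255 duty cycle) and fan1_input (fan speed in RPM). Below is a minimal, read-only shell sketch that reports those values. It assumes the card's hwmon directory exposes all three files; fan_status is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report the fan control mode, PWM duty cycle and RPM
# from an amdgpu hwmon directory (pwm1_enable, pwm1 and fan1_input).
fan_status() {
    # $1: hwmon directory, e.g. /sys/class/drm/card0/device/hwmon/hwmon0
    mode=$(cat "$1/pwm1_enable")
    case $mode in
        0) mode_name="none" ;;        # no fan speed control
        1) mode_name="manual" ;;      # set through the pwm1 file
        2) mode_name="automatic" ;;   # left to the driver
        *) mode_name="unknown($mode)" ;;
    esac
    pwm=$(cat "$1/pwm1")          # duty cycle, 0-255
    rpm=$(cat "$1/fan1_input")    # measured fan speed in RPM
    pct=$((pwm * 100 / 255))      # integer percentage of full duty cycle
    echo "mode=$mode_name pwm=$pwm ($pct%) rpm=$rpm"
}
```

Reading these files changes nothing. Switching to manual control would mean writing 1 to pwm1_enable and then a 0-255 value to pwm1 (as root), while writing 2 hands control back to the driver.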

              I also own 2 Sapphire Nitro+ RX 580s (maybe I will resell them, since they need PCIe atomic operations for OpenCL), and these boards are dual-BIOS; it could be that I am on the most aggressive one, as I never changed the switch they have.

              And I can't find any information on which side is the most performance-aggressive one.

              • #8
                Originally posted by agd5f

                The patch was not intended for 4.18 and I did not flag it for stable. Sasha's patch auto select tool picked it up for 4.18, but it shouldn't have been applied. I've asked that it be reverted from 4.18.

                Ha, ha, Sasha Levin <…@microsoft.com>

                Since the gremlins joined the mafia to establish harassment-free development, anyone whose cards got burned by this should complain to Microsoft directly.

                If Michael wants more clicks, he should change the title to: "Microsoft Breaks AMD Cards on Linux in Harassment-Free Way Now, Linux Community Starts Filing Lawsuits Against Companies Directly"
                Last edited by dungeon; 10-08-2018, 07:27 PM.

                • #9
                  Originally posted by tuxd3v

                  You can try changing the fan speed at low temperatures;
                  check the AMDGPU documentation:
                  https://dri.freedesktop.org/docs/drm...and-monitoring

                  But I don't know whether there are separate profiles for the fan.
                  I know there are 6 power profiles for various tasks (compute, VR, 3D desktop, low-power mode, ...), but I don't know if there is one for the fan.

                  I also own 2 Sapphire Nitro+ RX 580s (maybe I will resell them, since they need PCIe atomic operations for OpenCL), and these boards are dual-BIOS; it could be that I am on the most aggressive one, as I never changed the switch they have.

                  And I can't find any information on which side is the most performance-aggressive one.

                  TBH I would prefer not to mess with the fan speed, but rather understand whether there is a specific reason why Linux and Windows have different profiles; in both OSes I use the default settings, so the only difference is the driver.

                  The Nitro+ should come with the switch already at the 'aggressive' settings.
                  You can check the available frequencies, in my case:
                  Code:
                  cd /sys/class/drm/card0/device
                  cat pp_dpm_sclk
                  0: 300Mhz
                  1: 608Mhz *
                  2: 930Mhz
                  3: 1097Mhz
                  4: 1165Mhz
                  5: 1211Mhz
                  6: 1256Mhz
                  7: 1342Mhz
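
Since the DPM levels are listed in ascending order with the active one marked "*", a listing like this can be summarized mechanically. A minimal shell sketch, assuming that ordering holds; sclk_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize a pp_dpm_sclk listing as the active level
# (marked "*") plus the lowest and highest available shader clocks.
# Assumes levels are listed in ascending order, as in the output above.
sclk_summary() {
    # $1: path to the listing, e.g. /sys/class/drm/card0/device/pp_dpm_sclk
    awk '
        NR == 1 { lowest = $2 }             # level 0 is listed first
        { highest = $2 }                    # the last line read is the top level
        $NF == "*" { active = $1 " " $2 }   # e.g. "1: 608Mhz"
        END { printf "active %s (range %s-%s)\n", active, lowest, highest }
    ' "$1"
}
```

For the listing above this prints `active 1: 608Mhz (range 300Mhz-1342Mhz)`. Running it again after flipping the BIOS switch would show whether the top sclk level differs between the two BIOS positions.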

                  • #10
                    Originally posted by Grinness
                    (...)
                    The Nitro+ should come with the switch already at the 'aggressive' settings.
                    You can check the available frequencies, in my case:
                    Code:
                    cd /sys/class/drm/card0/device
                    cat pp_dpm_sclk
                    0: 300Mhz
                    1: 608Mhz *
                    2: 930Mhz
                    3: 1097Mhz
                    4: 1165Mhz
                    5: 1211Mhz
                    6: 1256Mhz
                    7: 1342Mhz

                    That could explain a lot.
                    Thanks for that valuable info.

                    I am on an ARM64 machine now, but
                    I will check the power draw with the switch toggled to the other side.