Some AMD GPUs Affected By A Nasty Power Regression That Snuck Into Linux 4.18 Stable


  • #61
    Originally posted by ptyerman View Post
    Currently I'm in Windows and GPU-Z is reporting a stable 28.4W at idle. This is a Sapphire RX580 8GB model card.
    Interesting.
    I own an RX580 8GB, but it is still in the box.
    My Sapphire NITRO+ 4GB draws around 32 W in Power Save mode.
    Maybe it's related to PCIe versions, or atomics, and of course kernel versions?


    • #62
      Originally posted by nuetzel View Post

      OK, good, but I do not see xx W in your tool shot ('only' 300/300 MHz) and...

      the regression still stays in 'stable'.
      With kernel 4.18.10 my MSI RX570 was always at 1281 MHz / 1750 MHz, even at idle. Now with 4.19 RC6 it is back to the normal idle speeds. I noticed it when I saw higher idle temperatures on my card. I have the Oibaf Mesa 18.3 PPA, but I don't know why the Radeon Profile tool is showing Mesa 18.1.0.


      • #63
        Could be a lot of factors at play. I'm still running Windows at the moment and just loaded GPU-Z again: the GPU-only power draw still reads 28.4W and the VDDCI power draw still reads 16W, both while idle. This is on Windows 10 x64 LTSB; I will compare it with Linux readings later.

        Edit:
        Forgot to mention, this mainboard has PCI-E v2, I don't have a newer board to test with.
        Last edited by ptyerman; 10-05-2018, 09:38 AM. Reason: Forgot to add info


        • #64
          Originally posted by Strunkenbold View Post

          I think their QA is automated tests. If there is no test for idle power consumption, it doesn't get caught.
          Further, I think there is no one at AMD doing extensive testing apart from these automated tests. They are just too busy.
          Given that AMD has to implement many features in the Linux kernel to reach the same feature level as the Windows drivers, I guess minimal testing, and the many regressions that result, are to be expected.
          I really hope AMD does a little more for driver quality. There are a plethora of crash reports on the bug tracker, but usually no reaction. Maybe they should think about contracting a consulting company to address those bugs one by one.
          Automated testing is fine, way better than manual, but they should have many more tests to catch things like this.
          I imagine that detecting on-screen breakage is hard, but numeric comparisons such as this shouldn't be too bad, if they have the appropriate measuring hardware of course.

          As for bugs, I feel you. One of mine once took a year and a half to be fixed, while it was completely locking up the computer :/ The worst part is that one dev actually asked me to test some stuff but never replied when I posted the results. I suppose they really lack manpower, so of course it's not our beloved devs' fault.


          • #65
            Originally posted by dwagner View Post

            Did you not mention or not enter the "echo manual >power_dpm_force_performance_level"? Without it, your writes to pp_dpm_* are ignored.
            Of course I did and verified, but (another bug)...

            Thanks!

            Now, holidays!
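
For readers following the exchange above, here is a minimal sketch of the sequence dwagner describes: set the performance level to manual first, then write to a pp_dpm_* file. The function name `set_sclk_levels` and the example path are assumptions for illustration; the card index differs between systems and the writes need root.

```shell
# Sketch only: restrict an amdgpu card to the given sclk DPM level(s).
# $1 is the device directory, assumed to look like /sys/class/drm/card0/device
# (the card index is an assumption and varies per system).
# $2 is a space-separated list of level indices, e.g. "0" or "0 1".
set_sclk_levels() {
    dev_dir=$1
    levels=$2
    # Writes to pp_dpm_* are ignored unless the level is "manual".
    echo manual > "$dev_dir/power_dpm_force_performance_level" || return 1
    echo "$levels" > "$dev_dir/pp_dpm_sclk" || return 1
    # Read both files back so the result can be checked by eye.
    cat "$dev_dir/power_dpm_force_performance_level" "$dev_dir/pp_dpm_sclk"
}
```

As root this would be called as `set_sclk_levels /sys/class/drm/card0/device 0`. On a real card the read-back of pp_dpm_sclk lists the available levels rather than echoing the written value, so the point of the last line is only to confirm that the performance level really reads "manual".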


            • #66
              Originally posted by torsionbar28 View Post

              The -273 does look weird, but I don't think it's related. Here's my RX 560 on 4.18.9, idling at a cool 6 W. I have not tried 4.18.10 yet...

              Code:
              [[email protected] ~]$ uname -a; sensors | grep -A 5 amdgpu
              Linux nostromo.localdomain 4.18.9-200.fc28.x86_64 #1 SMP Thu Sep 20 02:43:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
              amdgpu-pci-0200
              Adapter: PCI adapter
              vddgfx:       +0.72 V  
              fan1:        1244 RPM
              temp1:        +31.0°C  (crit = +94.0°C, hyst = -273.1°C)
              power1:        6.12 W  (cap =  70.00 W)
              So I've just updated from 4.18.9 to 4.18.11 and it appears I now have the power regression. This is after idling for quite a while. MSI brand RX 560.

              Code:
              [[email protected] ~]$ uname -a; sensors | grep -A 5 amdgpu
              Linux nostromo.localdomain 4.18.11-200.fc28.x86_64 #1 SMP Sun Sep 30 15:31:40 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
              amdgpu-pci-0200
              Adapter: PCI adapter
              vddgfx:       +1.12 V  
              fan1:        1255 RPM
              temp1:        +42.0°C  (crit = +94.0°C, hyst = -273.1°C)
              power1:       18.25 W  (cap =  70.00 W)
              Last edited by torsionbar28; 10-06-2018, 07:03 PM.
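
To compare idle readings like the two above across kernel versions, the power1 value can be pulled out of the `sensors` output with a small filter. This is a rough sketch, not a tool anyone in the thread used; the function name `amdgpu_power_watts` is made up for illustration, and it assumes the chip block starts with a line beginning `amdgpu-` and ends at a blank line, as in the output shown.

```shell
# Sketch only: print the power1 reading (in watts) of every amdgpu chip
# found in `sensors` output supplied on stdin.
amdgpu_power_watts() {
    awk '
        /^[ \t]*amdgpu-/ { in_gpu = 1; next }   # start of an amdgpu chip block
        /^[ \t]*$/       { in_gpu = 0 }         # a blank line ends the block
        in_gpu && /^[ \t]*power1:/ { print $2 } # second field is the wattage
    '
}
```

Usage would be `sensors | amdgpu_power_watts`, run once on each kernel after the desktop has been idle for a while.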


              • #67
                So this is why my RX 480 is stuck on 1342/2000 with full voltage all the time. Even on idle, it never reverts to 300/300 like it used to. Power usage is at 35W just sitting on the desktop. Come back Linus!
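
A quick way to check whether a card is stuck at its top clocks, as described above, is to look at which DPM level amdgpu marks as active. The sketch below assumes the usual pp_dpm_sclk / pp_dpm_mclk layout, where each line is `index: clock` and the active level carries a trailing `*`; the function name `active_dpm_clock` and the card0 path are illustrative assumptions.

```shell
# Sketch only: print the clock of the currently active DPM level.
# $1 is a pp_dpm_* file, assumed to be something like
# /sys/class/drm/card0/device/pp_dpm_sclk (card index varies per system).
active_dpm_clock() {
    # The active level is the line containing '*'; field 2 is the clock.
    awk 'index($0, "*") { print $2 }' "$1"
}
```

Running it against both pp_dpm_sclk and pp_dpm_mclk while the desktop is idle shows whether the card has dropped back to its lowest level or is still sitting at the highest one.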


                • #68

                  This patch shouldn't have been applied to 4.18. It looks like it was autoselected for 4.18: https://lkml.org/lkml/2018/9/15/172 It should be reverted. I'm working on getting it reverted.


                  • #69
                    I'm running Ubuntu 18.10 on the development branch, and today after receiving an update for Linux kernel 4.18.0.9.10 (which applies patches from the mainline 4.18.12 kernel, according to the changelog on Launchpad), my idle temperature (from 30C to 49C) and power consumption (from 6 W to 18 W) increased drastically with my RX460...


                    • #70
                      I'm using Arch Linux with all the latest updates.

                      Is this why within the last week or so with my AMD RX 580 I've lost the 144Hz option for my monitor with 120Hz being the max option available on my DisplayPort connection? Doesn't matter if I use GNOME or KDE Plasma, the 144Hz option disappeared. And it did work a couple weeks ago.

                      Now, if I boot my USB drive with a live distro image from a month ago (Manjaro and Antergos being used for testing purposes for the live environment), the 144Hz option is available. Then if I boot my USB drive with a live distro image released within a week (again, Manjaro 18.0 RC1 and Antergos 18.10 used for testing purposes for the live environment), the 144Hz option is gone with only 120Hz being the max.

                      Is it a known bug or is it part of this bug?
                      Last edited by Awesome Donkey; 10-10-2018, 01:44 PM.
