RX 560 crash under (light) load

  • RX 560 crash under (light) load

    Hi,

    I've been struggling with this for days now, and it's so frustrating that nothing helps to get it stable. I bought an ASRock Gaming Phantom RX 560 4G card last Thursday and was really excited to see the performance of amdgpu under Linux. But it crashed almost immediately when starting a video or anything else that stresses the GPU a little bit.

    Well, faulty hardware was my first thought (why must something always go wrong!?). But then I booted into Windows 10, it automatically installed an old driver from 2017, and all seemed fine. I installed Xonotic, which I normally only play under Linux, and it was perfectly stable! After that I installed the latest Radeon drivers and things stayed stable; I've already played for hours without any issue (well, it has crashed once or twice when loading a new map for some reason, but nothing noteworthy, I think).

    This means that there must be something wrong with the Linux drivers. I've booted different distros (Ubuntu, Fedora and Manjaro live USBs), which all have the same issue: they crash as soon as I open something like a YouTube video (Fedora didn't even display the login manager). So it's not something specific to my current Manjaro installation.

    The only way I can get it to run stable, but really slowly, is by setting /sys/class/drm/card0/device/power_dpm_force_performance_level to low, or to manual with "0 1" written to 'pp_dpm_sclk' and 0 to 'pp_dpm_mclk'. Anything higher will eventually crash it (but it takes considerably longer than when set to 'auto' or 'high'). I've tested many different kernels, thinking the problem could have been introduced recently in 5.x, but 4.19, 5.0, 5.1 and 5.2 don't make any difference. I even tried writing different voltages to 'pp_od_clk_voltage', lower and higher than default.
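
For anyone who wants to reproduce the low-clock workaround described above, here is a minimal Python sketch. It assumes the card is exposed as card0 and that the script runs as root; the function name `pin_low_clocks` and its defaults are made up for illustration and simply mirror the values mentioned in the post.

```python
from pathlib import Path


def pin_low_clocks(device_dir, sclk_levels="0 1", mclk_levels="0"):
    """Force manual DPM and restrict the allowed clock levels.

    device_dir is the sysfs device directory of the GPU, assumed here
    to be /sys/class/drm/card0/device. Writing there needs root.
    """
    device = Path(device_dir)
    # Switch to manual first; the level masks are written afterwards.
    writes = [
        ("power_dpm_force_performance_level", "manual"),
        ("pp_dpm_sclk", sclk_levels),  # allowed core clock levels
        ("pp_dpm_mclk", mclk_levels),  # allowed memory clock levels
    ]
    for name, value in writes:
        (device / name).write_text(value + "\n")
    return writes
```

Calling `pin_low_clocks("/sys/class/drm/card0/device")` as root should apply the same settings as the manual writes; writing "auto" back to power_dpm_force_performance_level undoes it.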

    My system is an ASUS Prime B350A mainboard with a Ryzen 1600X and 2x 8 GB Crucial 2666 MHz memory, all at factory defaults in the BIOS.

    Is there anything I can do? I'd accept that the card is simply faulty, but why is it rock stable on Windows in that case? It's running at the highest clocks for hours without problems, so it can't be a power / card issue I would say?

    Thanks for any suggestions!

    - Joost

  • #2
    Try playing around with PCIe settings, e.g. disabling ASPM or, if your BIOS allows it, setting PCIe Gen 2. Try a different PCIe slot or re-seat the CPU. Or try updating the motherboard BIOS.

    I have a very similar setup with an Athlon 200GE, Asus Prime B350M-E, and an HP OEM RX 460 (with RX 560 vBIOS installed for all 16cu unlocked.) It works flawlessly and can even overclock the 460 to 1400 core, 2200 mem using the powerplay files in /sys

    edit: What kind of crash are you experiencing? full PC reset? black screen? kernel panic? a dmesg snippet may be helpful here
    Last edited by esmth; 07-08-2019, 01:21 PM.


    • #3
      Originally posted by esmth
      Try playing around with PCIe settings, eg disabling aspm or if your bios allows it, set PCIe gen 2. Try a different PCIe slot or re-seat the CPU. Or try updating the motherboard BIOS.

      I have a very similar setup with an Athlon 200GE, Asus Prime B350M-E, and an HP OEM RX 460 (with RX 560 vBIOS installed for all 16cu unlocked.) It works flawlessly and can even overclock the 460 to 1400 core, 2200 mem using the powerplay files in /sys

      edit: What kind of crash are you experiencing? full PC reset? black screen? kernel panic? a dmesg snippet may be helpful here
      Thanks for your response. Unfortunately this board only has 1 PCI-e 16x slot. Latest BIOS is already installed. I tried amdgpu.aspm=0 as per your suggestion, but unfortunately the same happens.

      Interestingly, when my computer has been off for a few hours and I run glmark2 directly after booting up, it will keep going for about 2 minutes before the displays turn off. All later tests fail within seconds of restarting the system and running glmark2. You would normally assume a temperature-related issue, but the GPU doesn't even reach 40 degrees Celsius when it shuts off.

      The system sometimes remains responsive, in that I can ssh into it and read out dmesg. But many times all communication is lost and even the power to my USB keyboard and mouse shuts off. Anyway, here is the dmesg captured when I was able to stay logged in over ssh: https://pastebin.com/8YnJiart
      You'll see I've added amd_iommu=fullflush iommu=pt as boot parameters, as suggested elsewhere, but this has no influence; the result is the same without them.

      I would be happy to just run it with default clocks or even a bit lower, but I do think something else is the cause (since, as mentioned in OP, on Windows all is fine with default clock speeds and under load for hours).


      • #4
        From the last few lines in your dmesg, amdgpu seems to be complaining about power gating. Maybe you could try to disable that with amdgpu.pg_mask=0, or the similar clock-gating flag with amdgpu.cg_mask=0. (I pulled these from https://www.kernel.org/doc/html/latest/gpu/amdgpu.html)

        Another suggestion would be to try disabling the IOMMU in the BIOS. In my case I have to keep it disabled as it screws with TRIM to my NVMe drive (which is a PCIe device)

        Super odd issue though, I hope you get it figured out. Have you tried an older Linux live medium, like Ubuntu 17.04/17.10?


        • #5
          I've tested it with amdgpu.pg_mark=0 and/or amdgpu.cg_mark=0 (four different combinations), but unfortunately it still fails the same way. I disabled the IOMMU, no difference. Interesting suggestion about an older release: as I'm writing this response, I'm on an Ubuntu 17.10 live USB. At first I thought it did indeed work better, with no crashes on YouTube or other simple things, so that seemed hopeful! However, as soon as I managed to install glmark2 and started it -> immediate crash (as in, displays off and no response whatsoever from the system anymore). Boo, I'm really getting desperate.

          It seems so strange that Windows doesn't have any issues at all; that's what makes it so frustrating. If it would just crash under Windows as well, at least I'd know it was some hardware compatibility issue or just a defective card. Anyway, I remain open to any other suggestions. Thanks for your time.


          • #6
            I can't seem to edit my posts (?), so here's a new one. For the sake of completeness, here is a dmesg from this Ubuntu 17.10 session: https://pastebin.com/PbFsJ05T
            What I do find interesting are the powerplay warning messages saying VDDCI is larger than max. These are probably fixed in newer releases, since I didn't get them there, but since powerplay is also the only error I'm able to pull from dmesg on a crash, I was wondering whether the problem has something to do with that specific functionality.
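
To make it easier to pick the powerplay messages out of a long dmesg dump, something like the following Python sketch could help. It is only an illustration: the helper names and the keyword list ("amdgpu", "powerplay") are assumptions on my part, not anything the driver defines, and reading dmesg may need root on some systems.

```python
import subprocess

# Assumed keywords; adjust to whatever shows up in your own log.
KEYWORDS = ("amdgpu", "powerplay")


def filter_gpu_lines(dmesg_text, keywords=KEYWORDS):
    """Return the lines that mention any keyword, case-insensitively."""
    lowered = tuple(k.lower() for k in keywords)
    return [
        line
        for line in dmesg_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


def read_dmesg():
    """Capture the kernel log by running the dmesg command."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Over ssh, `print("\n".join(filter_gpu_lines(read_dmesg())))` would print only the GPU-related lines, which is handier to paste than the full log.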


            • #7
              Little update. Unfortunately no solution found, but I tried the card in an old PC I had lying around and guess what... No, it doesn't work there either ;-) The same thing happens: an almost instant crash on starting glmark2, and when booting with amdgpu.dpm=0 (which makes the card utterly slow) all is fine. So I guess I'm going to return this card, since I primarily use Linux. This card replaces a GT 710, which is a really, really slow, stupid card, but currently on par with the RX 560 set to its lowest clock speeds (dpm=0). And I really want an AMD card for its open source drivers, so fingers crossed the next one will work fine! I'll report back here either way.


              • #8
                Originally posted by jvandenbroek
                I've tested it with amdgpu.pg_mark=0 and/or amdgpu.cg_mark=0
                Just checking for a possible typo - pg/cg_mark vs pg/cg_mask.
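
A check like this can also be scripted. The Python sketch below is a rough illustration with made-up helper names: it pulls every amdgpu.<name>=<value> token from a kernel command line and reports names that are not in a set of known parameters, with the closest known name as a hint. It assumes the known names can be listed from /sys/module/amdgpu/parameters on a machine with the module loaded.

```python
import difflib
from pathlib import Path


def amdgpu_options(cmdline):
    """Return {name: value} for each amdgpu.<name>=<value> token."""
    options = {}
    for token in cmdline.split():
        if token.startswith("amdgpu."):
            rest = token[len("amdgpu."):]
            name, _, value = rest.partition("=")
            options[name] = value
    return options


def unknown_options(cmdline, known_names):
    """Map each unrecognised option name to its closest known names."""
    known = sorted(known_names)
    report = {}
    for name in amdgpu_options(cmdline):
        if name not in known:
            report[name] = difflib.get_close_matches(name, known, n=1)
    return report


def loaded_parameter_names(module="amdgpu"):
    """List parameter names the loaded module exposes in sysfs."""
    params = Path("/sys/module") / module / "parameters"
    return {p.name for p in params.iterdir()}
```

For example, `unknown_options(Path("/proc/cmdline").read_text(), loaded_parameter_names())` should flag a pg_mark entry and point to pg_mask as the likely intended name.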


                • #9
                  If you do end up choosing a new 560, try to get one with the AMD reference PCB. I have never had any driver problems with their reference cards.

                  (I went from a FirePro W7000, to an R9 290X, and now to an RX 460.) (OEM RX 460s are going for $60 shipped on eBay!)


                  • #10
                    Originally posted by bridgman

                    Just checking for a possible typo - pg/cg_mark vs pg/cg_mask.
                    Thanks for mentioning it; this was indeed a typo in this post. I entered it correctly during testing. But just to be sure I retested it: no difference.
