Is RX 5500 XT supposed to work on linux-5.4.7+?

  • bitnick
    Phoronix Member
    • Jul 2007
    • 97

    Is RX 5500 XT supposed to work on linux-5.4.7+?

    So I tried to use my RX 5500 XT for the first time today. Kernel updated to 5.4.8, fresh navi14 firmware files fetched from the linux-firmware repo, and of course power cable connected to the card.

    First I just got beep codes from my BIOS, but it turned out that it had "helpfully" decided to switch the primary graphics setting to the iGPU automatically (even though it had not been booted without a discrete GPU in place).

    Anyway, after switching BIOS setting to use PCIE device as primary graphics card again, I can now get into the BIOS. But it flickers! On and off, a bit at random, like a broken fluorescent tube. My RX 560 that used the exact same cable minutes earlier doesn't have that problem. What's up with that?

    Booting the kernel hangs when the graphics are initialized (on two different OSes: Ubuntu 18.04 and Gentoo, both with updated kernels and firmware). There's nothing on the screen except sometimes a frozen cursor - I don't get any error messages.

    So I tried to boot with the iGPU as primary with the RX 5500 XT still in the box. This works, the card is detected by the kernel, firmware loads etc, and it shows up in xrandr. But when enabling the output (something like 'xrandr --output DisplayPort-1-2 --preferred --same-as HDMI-1') the computer "half-freezes", as in, I get a little bit of reaction every 10 seconds or so - caps lock change, a kernel log line, etc. Mouse cursor moves but system does not react to clicks.

    Kernel logs at this time look like this (copied by hand from a photo, so they may contain errors):

    kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
    kernel: Asynchronous wait on fence drm_sched:gfx_0.0.0:11 timed out (hint:submit_notify+0x0/0x80 [i915])
    kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
    kernel: Asynchronous wait on fence drm_sched:gfx_0.0.0:12 timed out (hint:submit_notify+0x0/0x80 [i915])

    etc... (a new pair of lines appear about every 11 seconds).
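A minimal sketch for pulling the fence details out of log lines like the ones quoted above. The pattern is derived only from the hand-copied lines in this post, so it is an assumption about the format rather than a complete parser. Note that the module named in the hint is i915, which suggests it is the Intel driver waiting on a fence from the amdgpu scheduler.

```python
import re

# Pattern for the "Asynchronous wait on fence" lines quoted in the post.
# Assumption: the layout is driver:timeline:seqno, as in the hand-copied log.
FENCE_RE = re.compile(
    r"Asynchronous wait on fence "
    r"(?P<driver>[\w.]+):(?P<timeline>[\w.]+):(?P<seqno>[0-9a-fA-F]+) "
    r"timed out \(hint:(?P<hint>\S+) \[(?P<module>\w+)\]\)"
)


def parse_fence_timeouts(lines):
    """Return one dict per fence-timeout line; all other lines are skipped."""
    events = []
    for line in lines:
        match = FENCE_RE.search(line)
        if match:
            # seqno is kept as a string because the kernel may print it in hex.
            events.append(match.groupdict())
    return events


sample = [
    "kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!",
    "kernel: Asynchronous wait on fence drm_sched:gfx_0.0.0:11 timed out (hint:submit_notify+0x0/0x80 [i915])",
]

for event in parse_fence_timeouts(sample):
    print(event["timeline"], event["seqno"], "waited on by", event["module"])
```

Feeding it the output of `dmesg` or `journalctl -k` line by line should work the same way, as long as the message format matches.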

    This is just insane - how can they release drivers that hang the computer like this?! Okay if there's no support for a new card, but acting up like this? And how come it isn't fixed ASAP with maintenance releases of the kernel?? Shame on you AMD (the company, not the developers)!

    Where do I report this bug? - Edit: reported to [email protected]. I hope this is the right place.

    Anything I can do to get this working?
    Last edited by bitnick; 07 January 2020, 12:28 AM.
  • bitnick
    Phoronix Member
    • Jul 2007
    • 97

    #2
    I very quickly got a reply from Alex Deucher (within an hour!) asking me to try the 5.5 kernel. I installed linux-5.5-rc5 and it seems to have fixed the problem. I don't like running unstable kernels, but it seems I don't have a choice.

    One great improvement is that fan control finally seems to be working properly (at least on my RX 5500 XT) with linux-5.5! That is, the fans actually stop by default when the card is idle. Very nice!

    Another very cool feature of the FLOSS drivers (that I was pretty much forced to try out) is PRIME - using multiple graphics cards (hybrid graphics), and making it possible to specify which card to use for 3D acceleration even if it's a different device that has the monitor connected! Impressive!
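A minimal sketch of the render-offload side of PRIME, assuming Mesa's `DRI_PRIME` environment variable is what selects the non-default GPU on this setup. The helper names `prime_env` and `run_on_secondary_gpu` are made up for illustration, and the GPU index that maps to the RX 5500 XT depends on the system.

```python
import os
import subprocess


def prime_env(base_env=None, gpu_index=1):
    """Return a copy of the environment with DRI_PRIME set.

    Assumption: index 1 is the discrete card; check on your own system.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DRI_PRIME"] = str(gpu_index)
    return env


def run_on_secondary_gpu(argv):
    """Run a program with rendering offloaded to the secondary GPU."""
    return subprocess.run(argv, env=prime_env(), check=False)


# Example (not executed here; requires glxinfo to be installed):
# run_on_secondary_gpu(["glxinfo", "-B"])
```

Comparing the renderer string reported with and without the variable set is a quick way to confirm which card is doing the 3D work.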

    ---

    It's sad that AMD does not make sure their hardware works at least on the latest maintenance release of the latest stable kernel before launch. That's the least one could expect, right? But this seems to be a problem in general with AMD's drivers - I seem to recall reading about problems with newer CPUs/APUs as well?

    Intel can do it, so why not AMD? But at least they're way ahead of nVIDIA...

    • theboomboomcars
      Junior Member
      • Sep 2007
      • 10

      #3
      Which card did you get? The lowest my fans go on my 5500 XT is 1200 RPM, not terribly loud but it'd be nice if they'd shut off. What distro are you using? I am on Manjaro and I have to boot with iommu=soft on the kernel line, otherwise I get the timeout errors and some iommu errors that don't exist with my RX460.

      • bitnick
        Phoronix Member
        • Jul 2007
        • 97

        #4
        It's a Sapphire Pulse RX 5500 XT 4 GB, SKU# 11295-03-20G. Distro is Xubuntu-18.04. I'm using ukuu (Ubuntu Kernel Update Utility?) to install the 5.5-rc5 kernel. And I'm using Oibaf PPA for mesa-20.0-git. I installed the firmware files mentioned in the first post manually, using wget, and put them under /lib/firmware/amdgpu.
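A minimal sketch for checking that the manually downloaded firmware actually landed where the kernel looks for it. The `navi14_` prefix and `.bin` suffix are assumptions based on the naming used in the linux-firmware repo, and `list_firmware` is a hypothetical helper.

```python
from pathlib import Path


def list_firmware(firmware_dir="/lib/firmware/amdgpu", prefix="navi14_"):
    """Return the sorted names of firmware blobs matching the prefix.

    Assumption: blobs are uncompressed files named <prefix>*.bin.
    Returns an empty list if the directory does not exist.
    """
    directory = Path(firmware_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(prefix + "*.bin"))


if __name__ == "__main__":
    names = list_firmware()
    print(f"{len(names)} navi14 firmware files found")
    for name in names:
        print(" ", name)
```

An empty result would point at a missing or misplaced download rather than at a driver bug.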

        The card has a dual BIOS switch to select between factory and overclocked settings. I read somewhere that overclocking tends to trigger timeout errors, so I have it set to factory clocks (and also I don't care for the 10 % extra heat that one has to pay for the ~1 % performance gained by overclocking).

        The only thing I've used the card for so far is running the Deus Ex Phoronix benchmark, but so far so good, no hangs or timeouts.

        • agd5f
          AMD Graphics Driver Developer
          • Dec 2007
          • 3939

          #5
          Originally posted by bitnick
          It's sad that AMD does not make sure their hardware works at least on the latest maintenance release of the latest stable kernel before launch.
          We specifically tested navi14 on 5.4 before I removed the experimental flag and it was working well (otherwise I wouldn't have removed the experimental flag). It only came up later that some AIB boards don't work properly with that code. We are looking into what is missing to fix 5.4. We generally only test reference boards internally (since we don't usually have access to AIB boards prior to launch). AIBs usually only test Windows.

          As to getting the code upstream in time for the latest stable kernel release, that really depends on schedules. Generally the kernel schedules do not align with product cycles at all and because the kernel doesn't allow new features in old kernels, you end up landing where you land. As to the potential for adjusting schedules to align with Linux, there's not much room for that. Schedules are generally driven by hw and fabs. It's a little easier for APUs because the cycles are longer compared to dGPUs and a fair number of OEMs actually do Linux preloads.

          • bitnick
            Phoronix Member
            • Jul 2007
            • 97

            #6
            Thank you for your reply agd5f, and for the explanation of what happened.

            It sounds like AMD needs to bring its AIB partners into the loop then! Those are the cards that the end user will run the drivers on after all. (This is what the "Shame on you, AMD!" comment above is about - AMD should have resources and protocols in place already to make sure such tests are carried out!)

            Maybe as a first step ask the partners to run through a set from the Phoronix Test Suite on each card? Any failed tests (or boot failures) should at least give you a heads-up that something's wrong. And getting tests run on lots of different hardware, by different people in different environments before release should do wonders for getting bugs ironed out?


            I'm glad you're looking into getting the fixes into 5.4. I hope that works out. Is there a bug report somewhere that I can follow?


            As an aside, from a geeky perspective it would be interesting to know what kind of change they made that triggered this... probably something esoteric that caused some subtle timing change that triggers some previously undetected locking error or such? At least it's very reliably repeatable with the Sapphire card!

            • bitnick
              Phoronix Member
              • Jul 2007
              • 97

              #7
              I've played a couple of hours of The Long Dark now and everything seems fine so far. Rock-steady 60 frames/s at max quality settings and 1920x1200 pixels. And almost best of all: it's quiet! My Sapphire RX 560 card sounded a bit like a hair dryer after a while. So I'm very happy with the performance.

              Now I only need to get my Gentoo OS online too, with the new kernel... (I'm dual-booting).

              • agd5f
                AMD Graphics Driver Developer
                • Dec 2007
                • 3939

                #8
                Well, unfortunately, Linux is only like 1% (you can debate the actual number, but it's low) of the market for consumer dGPUs. It's hard to get traction to add additional overhead unfortunately.

                • theboomboomcars
                  Junior Member
                  • Sep 2007
                  • 10

                  #9
                  I have the MSI Mech 8 OC, so I wonder if the overclock on it is what is causing my issues...

                  If I boot without iommu=soft on the kernel line I get these iommu errors (which are not present with iommu=soft, or with my RX460):

                  Code:
                  Jan 07 13:56:15 samticus-compuman kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d1849a0]
                  Jan 07 13:56:15 samticus-compuman kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d1849c0]
                  Jan 07 13:56:15 samticus-compuman kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
                  Jan 07 13:56:15 samticus-compuman kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d1849f0]
                  Jan 07 13:56:15 samticus-compuman kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d184a30]
                  Jan 07 13:56:15 samticus-compuman kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d184a50]
                  and these amdgpu errors:

                  Code:
                  Jan 07 13:56:15 samticus-compuman kernel: [drm] amdgpu kernel modesetting enabled.
                  Jan 07 13:56:15 samticus-compuman kernel: fb0: switching to amdgpudrmfb from EFI VGA
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu 0000:0b:00.0: vgaarb: deactivate vga console
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu 0000:0b:00.0: No more image in the PCI ROM
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu 0000:0b:00.0: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu 0000:0b:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
                  Jan 07 13:56:15 samticus-compuman kernel: [drm] amdgpu: 8176M of VRAM memory ready
                  Jan 07 13:56:15 samticus-compuman kernel: [drm] amdgpu: 8176M of GTT memory ready.
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu: [powerplay] use vbios provided pptable
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu: [powerplay] failed send message:     RunBtc (58)         param: 0x00000000 response 0xffffffc2
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu: [powerplay] RunBtc failed!
                  Jan 07 13:56:15 samticus-compuman kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <smu> failed -62
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu 0000:0b:00.0: amdgpu_device_ip_init failed
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu 0000:0b:00.0: Fatal error during GPU init
                  Jan 07 13:56:15 samticus-compuman kernel: [drm] amdgpu: finishing device.
                  Jan 07 13:56:15 samticus-compuman kernel: Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_vram_mgr_fini+0x2d/0xb0 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_ttm_fini+0x89/0xe0 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_bo_fini+0xe/0x30 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  gmc_v10_0_sw_fini+0x2e/0x40 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_device_fini+0x25b/0x475 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_driver_unload_kms+0x4a/0x90 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_driver_load_kms.cold+0x39/0x5b [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_pci_probe+0xec/0x150 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel: [drm] amdgpu: ttm finalized
                  Jan 07 13:56:15 samticus-compuman kernel: Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_device_fini+0x441/0x475 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_driver_unload_kms+0x4a/0x90 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_driver_load_kms.cold+0x39/0x5b [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel:  amdgpu_pci_probe+0xec/0x150 [amdgpu]
                  Jan 07 13:56:15 samticus-compuman kernel: amdgpu: probe of 0000:0b:00.0 failed with error -62
                  I can access the system via ssh, but it won't switch to a different virtual terminal nor can I boot into CLI mode. But with iommu=soft it boots and runs well. I can run Shadow of Mordor at 4K medium settings @ 60 fps.
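A minimal sketch for summarizing logs like the two excerpts above: it counts IOTLB invalidation timeouts per PCI device and collects amdgpu probe failures. The patterns are based only on the lines pasted in this post, so other kernels may word these messages differently. For reference, -62 is -ETIME ("Timer expired") in the Linux errno numbering.

```python
import re
from collections import Counter

# Patterns derived from the journal lines pasted in this post (assumption:
# the message wording is stable enough for a quick triage script).
IOTLB_RE = re.compile(
    r"AMD-Vi: Event logged \[IOTLB_INV_TIMEOUT "
    r"device=(?P<device>[0-9a-f:.]+) address=(?P<address>0x[0-9a-f]+)\]"
)
PROBE_RE = re.compile(
    r"amdgpu: probe of (?P<device>\S+) failed with error (?P<errno>-?\d+)"
)


def summarize(lines):
    """Return (IOTLB timeout count per device, list of probe failures)."""
    timeouts = Counter()
    probe_failures = []
    for line in lines:
        match = IOTLB_RE.search(line)
        if match:
            timeouts[match.group("device")] += 1
            continue
        match = PROBE_RE.search(line)
        if match:
            probe_failures.append((match.group("device"), int(match.group("errno"))))
    return timeouts, probe_failures


sample = [
    "kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d1849a0]",
    "kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x40d1849c0]",
    "kernel: amdgpu: probe of 0000:0b:00.0 failed with error -62",
]
timeouts, failures = summarize(sample)
print(dict(timeouts), failures)
```

If the device with the timeouts is the same one whose probe fails, as it is here (0b:00.0), that supports treating the two sets of errors as one problem.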

                  • agd5f
                    AMD Graphics Driver Developer
                    • Dec 2007
                    • 3939

                    #10
                    Does pci=noats fix the issue? If so you are seeing an issue with some early vbioses that did not set up ATS. It was supposedly fixed before vbios went to partners. The problem stems from the fact that on Linux the IOMMU drivers enable ATS before the device drivers have loaded, so there is no way for drivers to set it up properly before ATS is enabled. If you were to load the GPU driver before the IOMMU driver you wouldn't see this because the driver would have set up ATS properly. Windows doesn't use ATS so it's not an issue there. Please file a ticket on gitlab (https://gitlab.freedesktop.org/drm/amd/issues) and attach your dmesg output and I can generate a fix for you to test.
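A minimal sketch for confirming which of the workarounds discussed in this thread (iommu=soft, pci=noats) the running kernel was actually booted with. It assumes the command line is available in `/proc/cmdline` and does simple whitespace splitting, so quoted values containing spaces are not handled. `parse_cmdline` and `has_param` are hypothetical helper names.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_param(cmdline, name, value):
    """True if `name` is set and `value` is one of its comma-separated options."""
    current = parse_cmdline(cmdline).get(name)
    return current is not None and value in current.split(",")


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


# Example with a made-up command line:
example = "root=/dev/sda1 quiet iommu=soft"
print("iommu=soft:", has_param(example, "iommu", "soft"))
print("pci=noats:", has_param(example, "pci", "noats"))
```

On a live system, `has_param(read_cmdline(), "pci", "noats")` shows whether the option took effect after editing the bootloader configuration.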
