AMD Stages Latest Radeon/AMDGPU Changes For Linux 4.21 Kernel


  • #21
    You could also try setting custom pstates via OC, increasing voltage a bit or setting maximum clocks as minimum for testing purposes. I also still wouldn't trust amdgpu.dc.



    • #22
      Originally posted by muncrief View Post

      Ha! I'd love to but the Manjaro developers don't care, or at least have never acknowledged any of the bug reports I took the time to make, and the upstream developers on Bugzilla just get mad and yell at you if you file a bug about Manjaro.

      But if someone knows where I can file a bug report that someone will actually consider, I'll take the time to file one again. The error is pretty obvious, and right at the beginning of boot so there's not even a journalctl log, just a brief Xorg log. It begins with "/dev/dri/card0: failed to set DRM interface version 1.4: Permission denied"

      And by the way I'm used to fixing things myself and took all the PCI cards out of my system, tried different slots, and different GPUs, etc. Everything works fine except my R9 390 with amdgpu enabled. I of course also looked for similar errors, and they were sometimes caused by libdrm so I tried compiling it from git but everything failed exactly the same way. So it looks like something in the kernel is hosing things up.

      If I have time later I'll try amdgpu-pro, but I've never gotten it to work on Manjaro, so I don't have a lot of hope that it will help.
      Well, if you are getting a permission denied error, then it's likely a filesystem problem. Something like an executable bit or a read bit or a write bit or something. Probably not a driver issue.
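To test the filesystem-permission theory for the "/dev/dri/card0 ... Permission denied" message, one option is to look at the owner, group, and mode of the device node and check whether the current user can open it for reading and writing. The following is a minimal sketch; the helper name `describe_node` is made up for illustration, and a clean result here would only rule out plain file permissions, not a driver or DRM-level cause.

```python
import grp
import os
import pwd
import stat


def describe_node(path):
    """Return owner, group, mode string, and read/write access for a path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid without a passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid without a group entry
    return {
        "owner": owner,
        "group": group,
        "mode": stat.filemode(st.st_mode),
        # True if the *current* user may open the node read/write.
        "can_read_write": os.access(path, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    node = "/dev/dri/card0"  # path taken from the Xorg error message
    if os.path.exists(node):
        print(node, describe_node(node))
    else:
        print(node, "does not exist on this machine")
```

If `can_read_write` is false for the user running Xorg, comparing the reported group against the output of `id` is a reasonable next step.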



      • #23
        Originally posted by debianxfce View Post
        What you want is a stable gaming rig. I think you whiners are dual-booting beginners; you are not interested in making your Linux system stable. Whining here doesn't help anything. Inspect your Linux system and make good bug reports.
        oh my fucking god, every AMD thread there is this same idiot calling everyone a beginner.


        Originally posted by ryad View Post
        Don't pay him no mind. Despite his obvious technical competence, he unfortunately often fulfills the cliché of the unfriendly Linux geek who prefers to publish destructive insinuations rather than constructively use his knowledge for the community. Unfortunately there are too many of them.
        His only competence is spamming the forums, read a couple of his posts - he's just made of trollness and PPAs.

        Now on-topic, tried the last 4.21-wip and still have the same issues:

        1. Massive flickering @75hz unless memory clock is locked. If amdgpu.dc=1 only triggers after sleep/wake, if amdgpu.dc=0 triggers everywhere.
        2. With amdgpu.dc=1 (default for my rx580) Vsync with page-flips while moving cursor is still slow and stuttery. Works fine with amdgpu.dc=0.
        3. TearFree has similar issue as above, lags hard only when moving cursor, but this one happens with both dc=0 and dc=1.
        4. Switching fan control from auto->manual->auto starts a constant high pitched noise until reboot.

        All of those already reported by me and others for months or years.
        debianxfce, don't tell people to report amdgpu issues on the kernel Bugzilla. Those should be reported on the freedesktop Bugzilla.

        [12:45] <diwr> What's the right place for amdgpu reports, freedesktop or kernel bugzilla?
        [12:50] <MrCooper> diwr: freedesktop
        Last edited by clapbr; 15 November 2018, 11:22 AM.



        • #24
          Originally posted by clapbr View Post
          Now on-topic, tried the last 4.21-wip and still have the same issues:

          1. Massive flickering @75hz unless memory clock is locked. If amdgpu.dc=1 only triggers after sleep/wake, if amdgpu.dc=0 triggers everywhere.
          2. With amdgpu.dc=1 (default for my rx580) Vsync with page-flips while moving cursor is still slow and stuttery. Works fine with amdgpu.dc=0.
          3. TearFree has similar issue as above, lags hard only when moving cursor, but this one happens with both dc=0 and dc=1.
          4. Switching fan control from auto->manual->auto starts a constant high pitched noise until reboot.

          All of those already reported by me and others for months or years. I kinda regret giving up Nvidia's better performance in the hope of a better driver.
          Very interesting post, thank you! I have absolutely no issues at all with my 580 with default settings and multi-monitor setup (144hz & 75hz) on the open-source stack and kernel 4.19.



          • #25
            Originally posted by ryad View Post
            Very interesting post, thank you! I have absolutely no issues at all with my 580 with default settings and multi-monitor setup (144hz & 75hz) on the open-source stack and kernel 4.19.
            Can you try plugging in only the 75hz monitor, then sleep and wake? Possibly related issues are:




            • #26
              Sorry to keep asking, but I went digging in the logs after another hang happened today on BioShock Infinite (nothing was out of the ordinary in GALLIUM_HUD).

              Can anyone help me decipher what they think could be the issue here?

              Code:
              Nov 15 09:20:19 ubuntu kernel: [33018.629269] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.629271] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.629272] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
              Nov 15 09:20:19 ubuntu kernel: [33018.629274] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.637221] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.637224] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.637225] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
              Nov 15 09:20:19 ubuntu kernel: [33018.637227] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.645123] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.645126] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.645127] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
              Nov 15 09:20:19 ubuntu kernel: [33018.645128] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.652990] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.652993] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.652994] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
              Nov 15 09:20:19 ubuntu kernel: [33018.652996] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.661350] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.661353] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.661354] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
              Nov 15 09:20:19 ubuntu kernel: [33018.661355] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.670031] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.670034] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.670035] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
              Nov 15 09:20:19 ubuntu kernel: [33018.670037] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.678005] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.678008] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.678009] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
              Nov 15 09:20:19 ubuntu kernel: [33018.678011] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.686475] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.686477] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.686478] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
              Nov 15 09:20:19 ubuntu kernel: [33018.686480] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.696079] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.696082] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.696083] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
              Nov 15 09:20:19 ubuntu kernel: [33018.696084] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:20:19 ubuntu kernel: [33018.704061] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
              Nov 15 09:20:19 ubuntu kernel: [33018.704064] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
              Nov 15 09:20:19 ubuntu kernel: [33018.704065] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
              Nov 15 09:20:19 ubuntu kernel: [33018.704066] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
              Nov 15 09:22:51 ubuntu kernel: [33170.778194] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=912497, last emitted seq=912499
              Nov 15 09:22:51 ubuntu kernel: [33170.778198] amdgpu 0000:01:00.0: GPU reset begin!
              Nov 15 09:23:01 ubuntu kernel: [33181.018407] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:45:crtc-1] hw_done or flip_done timed out
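One way to make a log like this easier to read is to pull the fields out of each "VM fault" line and count the distinct combinations. The sketch below is written against the exact line format shown in this post (other kernel versions may word it differently), and the function names are made up for illustration. It also decodes the hex client id: 0x54433400 is simply the ASCII bytes 'T', 'C', '4' plus a NUL pad, so it carries the same information as the quoted 'TC4'.

```python
import re
from collections import Counter

# Matches lines such as:
#   VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
FAULT_RE = re.compile(
    r"VM fault \((?P<code>0x[0-9a-fA-F]+), vmid (?P<vmid>\d+), pasid (?P<pasid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \((?P<client_raw>0x[0-9a-fA-F]+)\)"
)


def client_tag(raw):
    """Decode the 32-bit client id into ASCII, dropping NUL padding."""
    return raw.to_bytes(4, "big").rstrip(b"\x00").decode("ascii", errors="replace")


def parse_fault(line):
    """Return the fault fields as a dict, or None if the line is not a VM fault."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "vmid": int(m["vmid"]),
        "pasid": int(m["pasid"]),
        "page": int(m["page"]),
        "access": m["access"],
        "client": m["client"],
        "client_decoded": client_tag(int(m["client_raw"], 16)),
    }


def summarize(lines):
    """Count faults per (vmid, page, access, client) combination."""
    counts = Counter()
    for line in lines:
        fault = parse_fault(line)
        if fault is not None:
            counts[(fault["vmid"], fault["page"], fault["access"], fault["client"])] += 1
    return counts
```

Feeding it the lines above (for example from `journalctl -k`) gives five faults on vmid 4 and five on vmid 5, all reads at page 0 from the same client and all within the same second, roughly two and a half minutes before the ring gfx timeout.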
              I'm currently running 4.18.19, which came out the other day (11/13), and I noticed it had an interesting commit in the changelog around amdgpu and faults (ctrl+f amdgpu to see it in full): https://cdn.kernel.org/pub/linux/ker...ngeLog-4.18.19

              Was hoping that was the one, but unfortunately still getting the hangs with the very latest 4.18.

              Also shmerl, I tried the Magic SysRq and the only one that seemed to work was Alt+SysRq+b, which is the command "Will immediately reboot the system without syncing or unmounting your disks" so it seemed very similar to just hitting the power button. Please let me know if you're using another command.
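A possible reason only some Magic SysRq combinations appear to work is the `kernel.sysrq` setting, which the kernel documentation describes as 0 (disabled), 1 (everything enabled), or a bitmask of allowed function groups. The sketch below decodes that value; the function names are made up for illustration, and 176 is used only as an example value (a distribution default of 176 is an assumption, so check the real value on the machine).

```python
# Bit meanings as listed in the kernel's sysrq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the list of SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []  # SysRq disabled entirely
    if value == 1:
        return list(SYSRQ_BITS.values())  # all functions enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current kernel.sysrq value from procfs."""
    with open(path) as fh:
        return int(fh.read().strip())


if __name__ == "__main__":
    try:
        current = read_sysrq()
    except FileNotFoundError:
        print("no /proc/sys/kernel/sysrq on this system")
    else:
        print(current, decode_sysrq(current))
```

With a value such as 176, sync, remount read-only, and reboot are allowed but process signalling is not, so Alt+SysRq+s and Alt+SysRq+u would run without any visible effect while Alt+SysRq+b is the only one that obviously does something.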



              • #27
                Originally posted by perpetually high View Post
                I have the MSI Gaming X RX 480 8GB that has default clocks of 1303/2000 so I leave it at that for gaming with fans on auto. The temperature never gets too high, though.
                Is that an MSI factory overclock?
                On my Sapphire I got lower clocks by flicking the BIOS switch on the card.



                • #28
                  Originally posted by Etherman View Post

                  Is that an MSI factory overclock?
                  On my Sapphire I got lower clocks by flicking the BIOS switch on the card.
                  Yeah it's this card right here: https://www.msi.com/Graphics-card/Ra.../Specification. I don't use the OC mode though, which is apparently 1316/2025.

                  Another thing I noticed was the voltage did seem a little low when the game was playing:

                  Code:
                  amdgpu-pci-0100
                  Adapter: PCI adapter
                  vddgfx:       +1.09 V
                  fan1:        1595 RPM
                  temp1:        +65.0°C  (crit = +94.0°C, hyst = -273.1°C)
                  power1:      147.03 W  (cap = 180.00 W)
                  I thought voltage would be more around 1.175 V, but I'm just speculating here.



                  • #29
                    Originally posted by perpetually high View Post
                    Sorry to keep asking, but I went digging in the logs after another hang happened today on BioShock Infinite (nothing was out of the ordinary in GALLIUM_HUD).

                    Can anyone help me decipher what they think could be the issue here?
                    A shot in the dark: did you test any non-Steam games?



                    • #30
                      Ahh, it was the clocks!

                      Etherman and aufkrawall, you guys called it. Thank you for the suggestion to lower the clocks.

                      I ended up using rocm-smi to manually set sclk to level 5: $ rocm-smi --setsclk 5

                      For reference on my card:
                      Code:
                      GPU[0]         : Supported GPU clock frequencies on GPU0
                      GPU[0]         : 0: 300Mhz
                      GPU[0]         : 1: 608Mhz
                      GPU[0]         : 2: 910Mhz
                      GPU[0]         : 3: 1077Mhz
                      GPU[0]         : 4: 1145Mhz
                      GPU[0]         : 5: 1191Mhz *
                      GPU[0]         : 6: 1236Mhz
                      GPU[0]         : 7: 1303Mhz
                      I've yet to try level 6 (as I wanted to give it a decent downclock to test the theory out) or level 7 with higher voltage than it was being given.
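For anyone wanting to script this, the clock table can be parsed to find the active level and how far below the top level it sits. This is a rough sketch that assumes the exact text layout pasted above (other rocm-smi versions may print it differently); the function names are made up for illustration.

```python
import re

# Matches rows such as "GPU[0]         : 5: 1191Mhz *"
ROW_RE = re.compile(r"\s*GPU\[(\d+)\]\s*:\s*(\d+):\s*(\d+)Mhz\s*(\*)?")


def parse_levels(text):
    """Return a list of (level, mhz, is_active) tuples from the clock table."""
    levels = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # the "Supported GPU clock frequencies" header does not match
            levels.append((int(m.group(2)), int(m.group(3)), m.group(4) is not None))
    return levels


def downclock_percent(levels):
    """Percentage by which the active level sits below the highest level."""
    active = next(mhz for _, mhz, is_active in levels if is_active)
    top = max(mhz for _, mhz, _ in levels)
    return (top - active) / top * 100


TABLE = """\
GPU[0]         : Supported GPU clock frequencies on GPU0
GPU[0]         : 0: 300Mhz
GPU[0]         : 1: 608Mhz
GPU[0]         : 2: 910Mhz
GPU[0]         : 3: 1077Mhz
GPU[0]         : 4: 1145Mhz
GPU[0]         : 5: 1191Mhz *
GPU[0]         : 6: 1236Mhz
GPU[0]         : 7: 1303Mhz
"""

if __name__ == "__main__":
    levels = parse_levels(TABLE)
    print(levels)
    print(f"{downclock_percent(levels):.1f}% below the top level")
```

For the table above, level 5 (1191 MHz) works out to about 8.6% below the 1303 MHz top level.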

                      Annoyed I didn't think to try this sooner. Thank you guys again. This would explain why it affects certain cards and not others. For the record, I have very good system cooling and air flow, and a 700W PSU. I know my system can handle the RX 480 at full load.

                      raonlinux, would be great if you could test out this theory, too. Let me know if you need any help with rocm-smi or getting the card to downclock.
