AMD Stages Latest Radeon/AMDGPU Changes For Linux 4.21 Kernel


  • perpetually high
    replied
    Sorry to keep asking, but I went digging in the logs after another hang happened today on BioShock Infinite (nothing was out of the ordinary in GALLIUM_HUD).

    Can anyone help me decipher what they think could be the issue here?

    Code:
    Nov 15 09:20:19 ubuntu kernel: [33018.629269] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.629271] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.629272] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
    Nov 15 09:20:19 ubuntu kernel: [33018.629274] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.637221] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.637224] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.637225] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
    Nov 15 09:20:19 ubuntu kernel: [33018.637227] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.645123] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.645126] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.645127] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
    Nov 15 09:20:19 ubuntu kernel: [33018.645128] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.652990] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.652993] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.652994] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
    Nov 15 09:20:19 ubuntu kernel: [33018.652996] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.661350] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.661353] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.661354] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
    Nov 15 09:20:19 ubuntu kernel: [33018.661355] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.670031] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.670034] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.670035] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
    Nov 15 09:20:19 ubuntu kernel: [33018.670037] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.678005] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.678008] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.678009] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
    Nov 15 09:20:19 ubuntu kernel: [33018.678011] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.686475] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.686477] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.686478] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
    Nov 15 09:20:19 ubuntu kernel: [33018.686480] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.696079] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.696082] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.696083] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
    Nov 15 09:20:19 ubuntu kernel: [33018.696084] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:20:19 ubuntu kernel: [33018.704061] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
    Nov 15 09:20:19 ubuntu kernel: [33018.704064] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    Nov 15 09:20:19 ubuntu kernel: [33018.704065] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
    Nov 15 09:20:19 ubuntu kernel: [33018.704066] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32781) at page 0, read from 'TC4' (0x54433400) (72)
    Nov 15 09:22:51 ubuntu kernel: [33170.778194] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=912497, last emitted seq=912499
    Nov 15 09:22:51 ubuntu kernel: [33170.778198] amdgpu 0000:01:00.0: GPU reset begin!
    Nov 15 09:23:01 ubuntu kernel: [33181.018407] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:45:crtc-1] hw_done or flip_done timed out
    I'm currently running 4.18.19, which came out the other day (11/13), and I noticed it had an interesting commit in the changelog around amdgpu and faults (Ctrl+F amdgpu to see it in full): https://cdn.kernel.org/pub/linux/ker...ngeLog-4.18.19

    Was hoping that was the one, but unfortunately still getting the hangs with the very latest 4.18.
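For anyone trying to read the fault lines in the log above, here is a minimal Python sketch that unpacks the two VM_CONTEXT1_PROTECTION_FAULT_STATUS values and the client tag. The bit positions are an assumption inferred from the values the driver already prints in the same log (0x0c, vmid 4/5, client id 72, read, 'TC4'); they are not taken from register documentation, so treat them as a reading aid only.

```python
# Sketch only: the field layout is ASSUMED from the values printed in the
# log itself (protections 0x0c, client id 72, read access, vmid 4 or 5).

def decode_fault_status(status):
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields."""
    return {
        "protections": status & 0xFF,          # assumed bits 0-7
        "client_id": (status >> 12) & 0xFF,    # assumed bits 12-19
        "is_write": bool((status >> 24) & 1),  # assumed bit 24 (0 = read)
        "vmid": (status >> 25) & 0xF,          # assumed bits 25-28
    }


def decode_client_tag(tag):
    """Turn the packed 32-bit client tag into its ASCII name."""
    raw = tag.to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    for status in (0x0804800C, 0x0A04800C):
        print(hex(status), decode_fault_status(status))
    print(decode_client_tag(0x54433400))
```

Both status values decode to the same protections, client id and read access; only the vmid differs, which matches the switch from vmid 4 to vmid 5 partway through the log. The fault address is 0 in every entry.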

    Also shmerl, I tried the Magic SysRq and the only one that seemed to work was Alt+SysRq+b, which is the command "Will immediately reboot the system without syncing or unmounting your disks" so it seemed very similar to just hitting the power button. Please let me know if you're using another command.
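One possible reason only some Magic SysRq keys respond is that the kernel.sysrq setting is a bitmask that enables groups of functions. The sketch below decodes such a value in Python. The bit meanings follow the kernel's SysRq documentation; the value 176 is just an example input, so read your own from /proc/sys/kernel/sysrq before drawing conclusions.

```python
# Sketch: decode the kernel.sysrq bitmask into the function groups it allows.
# 0 disables SysRq entirely and 1 enables every function.

SYSRQ_BITS = {
    2: "console log level",
    4: "keyboard control (SAK, unraw)",
    8: "process debugging dumps",
    16: "sync",
    32: "remount read-only",
    64: "process signalling (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of real-time tasks",
}


def allowed_sysrq(value):
    """Return the names of the function groups enabled by `value`."""
    if value == 0:
        return []
    if value == 1:
        return list(SYSRQ_BITS.values())
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from procfs (Linux only)."""
    with open(path) as fh:
        return int(fh.read().strip())


if __name__ == "__main__":
    print(allowed_sysrq(176))  # example value, not necessarily yours
    try:
        print(allowed_sysrq(read_sysrq()))
    except (OSError, ValueError):
        print("could not read /proc/sys/kernel/sysrq")
```

If sync and remount read-only are enabled on your system, pressing Alt+SysRq+s, then u, then b is gentler on the disks than b alone.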



  • clapbr
    replied
    Originally posted by ryad View Post
    Very interesting post, thank you! I have absolutely no issues at all with my 580 with default settings and multi-monitor setup (144hz & 75hz) on the open-source stack and kernel 4.19.
    Can you try plugging in only the 75 Hz monitor, then sleep and wake? Possibly related issues are:




  • ryad
    replied
    Originally posted by clapbr View Post
    Now on-topic, tried the last 4.21-wip and still have the same issues:

    1. Massive flickering @ 75 Hz unless the memory clock is locked. With amdgpu.dc=1 it only triggers after sleep/wake; with amdgpu.dc=0 it triggers everywhere.
    2. With amdgpu.dc=1 (default for my RX 580), vsync with page flips is still slow and stuttery while moving the cursor. Works fine with amdgpu.dc=0.
    3. TearFree has a similar issue: it lags hard only when moving the cursor, but this one happens with both dc=0 and dc=1.
    4. Switching fan control from auto->manual->auto starts a constant high-pitched noise until reboot.

    All of those have been reported by me and others for months or years. I kinda regret giving up NVIDIA's better performance expecting a better driver.
    Very interesting post, thank you! I have absolutely no issues at all with my 580 with default settings and multi-monitor setup (144hz & 75hz) on the open-source stack and kernel 4.19.



  • clapbr
    replied
    Originally posted by debianxfce View Post
    What you want is a stable gaming rig. I think you whiners are dual-booting beginners; you are not interested in making your Linux system stable. Whining here doesn't help anything. Inspect your Linux system and make good bug reports.
    oh my fucking god, every AMD thread there is this same idiot calling everyone a beginner.


    Originally posted by ryad View Post
    Don't pay him no mind. Despite his obvious technical competence, he unfortunately often fulfills the cliché of the unfriendly Linux geek who prefers to publish destructive insinuations rather than constructively use his knowledge for the community. Unfortunately there are too many of them.
    His only competence is spamming the forums, read a couple of his posts - he's just made of trollness and PPAs.

    Now on-topic, tried the last 4.21-wip and still have the same issues:

    1. Massive flickering @ 75 Hz unless the memory clock is locked. With amdgpu.dc=1 it only triggers after sleep/wake; with amdgpu.dc=0 it triggers everywhere.
    2. With amdgpu.dc=1 (default for my RX 580), vsync with page flips is still slow and stuttery while moving the cursor. Works fine with amdgpu.dc=0.
    3. TearFree has a similar issue: it lags hard only when moving the cursor, but this one happens with both dc=0 and dc=1.
    4. Switching fan control from auto->manual->auto starts a constant high-pitched noise until reboot.

    All of those already reported by me and others for months or years.
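For readers wondering what locking the memory clock looks like in practice, here is a rough Python sketch built on the amdgpu sysfs files pp_dpm_mclk and power_dpm_force_performance_level. The card0 path and the sample text are assumptions for illustration; the level list on a real card will differ, and the write step needs root.

```python
import re

# Assumed device directory; the card index may differ on your machine.
DEVICE_DIR = "/sys/class/drm/card0/device"

# Matches lines such as "1: 1000Mhz" or "2: 2000Mhz *" (star = active level).
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return a list of (index, mhz, is_active) tuples from pp_dpm_mclk text."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return levels


def highest_level(levels):
    """Index of the level with the highest clock."""
    return max(levels, key=lambda level: level[1])[0]


def lock_mclk(level, device_dir=DEVICE_DIR):
    """Force manual DPM control and pin the memory clock (root required)."""
    with open(f"{device_dir}/power_dpm_force_performance_level", "w") as fh:
        fh.write("manual")
    with open(f"{device_dir}/pp_dpm_mclk", "w") as fh:
        fh.write(str(level))


# Hypothetical sample, not captured from a real card.
SAMPLE = "0: 300Mhz *\n1: 1000Mhz\n2: 2000Mhz\n"

if __name__ == "__main__":
    levels = parse_dpm_levels(SAMPLE)
    print(levels, "-> would lock level", highest_level(levels))
```

Writing auto back to power_dpm_force_performance_level returns control to the driver.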
    debianxfce, don't tell people to report amdgpu issues on the kernel Bugzilla. Those should be reported on the freedesktop Bugzilla.

    [12:45] <diwr> What's the right place for amdgpu reports, freedesktop or kernel bugzilla?
    [12:50] <MrCooper> diwr: freedesktop
    Last edited by clapbr; 15 November 2018, 11:22 AM.



  • duby229
    replied
    Originally posted by muncrief View Post

    Ha! I'd love to but the Manjaro developers don't care, or at least have never acknowledged any of the bug reports I took the time to make, and the upstream developers on Bugzilla just get mad and yell at you if you file a bug about Manjaro.

    But if someone knows where I can file a bug report that someone will actually consider, I'll take the time to file one again. The error is pretty obvious, and right at the beginning of boot so there's not even a journalctl log, just a brief Xorg log. It begins with "/dev/dri/card0: failed to set DRM interface version 1.4: Permission denied"

    And by the way I'm used to fixing things myself and took all the PCI cards out of my system, tried different slots, and different GPUs, etc. Everything works fine except my R9 390 with amdgpu enabled. I of course also looked for similar errors, and they were sometimes caused by libdrm so I tried compiling it from git but everything failed exactly the same way. So it looks like something in the kernel is hosing things up.

    If I have time later I'll try amdgpu-pro, but I've never gotten it to work on Manjaro, so I don't have a lot of hope that will help.
    Well, if you are getting a permission denied error, then it's likely a filesystem problem, something like a missing execute, read, or write bit. Probably not a driver issue.
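A quick way to test the file-permission theory is to look at the mode bits and ownership of the device node. The Python sketch below does that with the standard os and stat modules; /dev/dri/card0 is taken from the quoted error message, and the check covers only this one hypothesis, not other possible causes of the error.

```python
import os
import stat


def mode_string(mode):
    """Render a raw st_mode value the way `ls -l` shows it."""
    return stat.filemode(mode)


def describe_node(path):
    """Collect the mode, owner and effective access for a device node."""
    st = os.stat(path)
    return {
        "mode": mode_string(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Character device with rw for owner and group only.
    print(mode_string(stat.S_IFCHR | 0o660))
    try:
        print(describe_node("/dev/dri/card0"))
    except OSError as exc:
        print("cannot stat /dev/dri/card0:", exc)
```

If the node is readable and writable for the user running Xorg, the mode bits are probably not the cause and it is worth looking elsewhere.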



  • aufkrawall
    replied
    You could also try setting custom pstates via OC, increasing voltage a bit or setting maximum clocks as minimum for testing purposes. I also still wouldn't trust amdgpu.dc.



  • ryad
    replied
    Originally posted by perpetually high View Post
    Quit the bullshit, debianxfce. I already mentioned I had everything on stock clocks to isolate the problem. Not to mention I've tested my system with prime95 and memtest86 to double check. Just because it's not happening to you doesn't mean it's not possible for it to happen to others.
    Don't pay him no mind. Despite his obvious technical competence, he unfortunately often fulfills the cliché of the unfriendly Linux geek who prefers to publish destructive insinuations rather than constructively use his knowledge for the community. Unfortunately there are too many of them.

    In regard to your problems: Have you tried a DisplayPort cable? Have you activated memory XMP (DOCP on Asus boards)? I had similar issues on my machine with the XMP profile deactivated. BR



  • perpetually high
    replied
    Originally posted by debianxfce View Post
    What you want is a stable gaming rig. I think you whiners are dual-booting beginners; you are not interested in making your Linux system stable. Whining here doesn't help anything. Inspect your Linux system and make good bug reports.
    Quit the bullshit, debianxfce. I already mentioned I had everything on stock clocks to isolate the problem. Not to mention I've tested my system with prime95 and memtest86 to double check. Just because it's not happening to you doesn't mean it's not possible for it to happen to others.



  • TemplarGR
    replied
    Please AMD, just add proper fan RPM controls in the next batch... The pwm sysfs interface is broken for my Tonga and my fan is either at 150ish RPM or 3500, nothing in between. So even at idle (it seems to use too much power at idle) I periodically hear a loud fan for a few seconds (I have set it to increase the fan speed above 70 °C; the auto setting let the GPU reach 100 °C before increasing it).
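For context, a manual fan curve on top of the hwmon pwm interface usually amounts to mapping temperature to a PWM value between 0 and 255. The Python sketch below shows one way to do that. The curve points and the hwmon directory argument are made up for illustration, and it does not work around the all-or-nothing behaviour described above; it only shows what the interface is meant to allow.

```python
# Hypothetical curve: (temperature in °C, pwm value 0-255). Tune to taste.
CURVE = ((40, 60), (60, 110), (70, 160), (85, 255))


def fan_pwm_for_temp(temp_c, points=CURVE):
    """Linearly interpolate a pwm value for `temp_c`, clamped to the curve ends."""
    if temp_c <= points[0][0]:
        return points[0][1]
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if temp_c <= t1:
            frac = (temp_c - t0) / (t1 - t0)
            return round(p0 + frac * (p1 - p0))
    return points[-1][1]


def apply_once(hwmon_dir):
    """Read the temperature and set the fan once (root required).

    `hwmon_dir` is the card's hwmon directory, for example something under
    /sys/class/drm/card0/device/hwmon/ (the exact name varies per system).
    """
    with open(f"{hwmon_dir}/temp1_input") as fh:
        temp_c = int(fh.read().strip()) / 1000  # file holds millidegrees C
    with open(f"{hwmon_dir}/pwm1_enable", "w") as fh:
        fh.write("1")  # 1 = manual control, 2 = automatic
    with open(f"{hwmon_dir}/pwm1", "w") as fh:
        fh.write(str(fan_pwm_for_temp(temp_c)))


if __name__ == "__main__":
    for temp in (30, 65, 70, 100):
        print(temp, "->", fan_pwm_for_temp(temp))
```

Writing 2 to pwm1_enable hands control back to the automatic setting.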



  • TheVE
    replied
    I also have an MSI Gaming X RX480 and get random GPU hangs. Sometimes it will be a week, sometimes only an hour. But they always come. dmesg always shows it's a GPU recovery problem. I'm really hoping it gets fixed soon.

