Fedora 32 and AMDGPU related freezing/crashing.

  • Fedora 32 and AMDGPU related freezing/crashing.

    Hi folks, I'm a long-time reader and first-time poster. I've been running Linux on and off for a while because of the constant churn in support for the 5700 XT. I've finally settled on Fedora 32 (beta), as it ships the latest Mesa and kernel builds "from the factory". I've had lots of ups and downs (mostly downs) using various COPRs.

    Over the last week, I've been experiencing random hangs and black screens that appear to be GPU related. They usually occur while playing a game (World of Warships, via Steam Play using ACO). Most recently, I was able to grab the dmesg output and found this:
    Code:
    [20239.249287] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
    [20244.369125] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1438474, emitted seq=1438476
    [20244.369202] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process World pid 16584 thread World:cs0 pid 16624
    [20244.369208] amdgpu 0000:08:00.0: GPU reset begin!
    [20244.794428] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
    [20245.041624] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
    [20245.288805] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
    [20248.337050] amdgpu 0000:08:00.0: GPU reset succeeded, trying to resume
    [20248.337200] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
    [20248.341224] [drm] PSP is resuming...
    [20248.510944] [drm] reserve 0xa00000 from 0x81fe400000 for PSP TMR
    [20248.579935] amdgpu 0000:08:00.0: RAS: ras ta ucode is not available
    [20248.585938] amdgpu: [powerplay] SMU is resuming...
    [20248.587871] amdgpu: [powerplay] SMU is resumed successfully!
    [20248.817872] [drm] kiq ring mec 2 pipe 1 q 0
    [20248.821657] [drm] VCN decode and encode initialized successfully(under DPG Mode).
    [20248.821743] [drm] JPEG decode initialized successfully.
    [20248.821747] amdgpu 0000:08:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
    [20248.821748] amdgpu 0000:08:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
    [20248.821749] amdgpu 0000:08:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
    [20248.821750] amdgpu 0000:08:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
    [20248.821751] amdgpu 0000:08:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
    [20248.821752] amdgpu 0000:08:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
    [20248.821753] amdgpu 0000:08:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
    [20248.821754] amdgpu 0000:08:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
    [20248.821755] amdgpu 0000:08:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
    [20248.821756] amdgpu 0000:08:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
    [20248.821757] amdgpu 0000:08:00.0: ring sdma0 uses VM inv eng 12 on hub 0
    [20248.821757] amdgpu 0000:08:00.0: ring sdma1 uses VM inv eng 13 on hub 0
    [20248.821758] amdgpu 0000:08:00.0: ring vcn_dec uses VM inv eng 0 on hub 1
    [20248.821759] amdgpu 0000:08:00.0: ring vcn_enc0 uses VM inv eng 1 on hub 1
    [20248.821760] amdgpu 0000:08:00.0: ring vcn_enc1 uses VM inv eng 4 on hub 1
    [20248.821761] amdgpu 0000:08:00.0: ring jpeg_dec uses VM inv eng 5 on hub 1
    [20248.824906] [drm] recover vram bo from shadow start
    [20248.831830] [drm] recover vram bo from shadow done
    [20248.831832] [drm] Skip scheduling IBs!
    [20248.831833] [drm] Skip scheduling IBs!
    [20248.831855] amdgpu 0000:08:00.0: GPU reset(1) succeeded!
    [20248.831866] [drm] Skip scheduling IBs!
    [20248.831876] [drm] Skip scheduling IBs!
    [20248.831878] [drm] Skip scheduling IBs!
    [20248.831884] [drm] Skip scheduling IBs!
    [20248.831886] [drm] Skip scheduling IBs!
    [20248.831888] [drm] Skip scheduling IBs!
    [20248.831890] [drm] Skip scheduling IBs!
    [20248.831891] [drm] Skip scheduling IBs!
    [20248.831893] [drm] Skip scheduling IBs!
    [20248.831894] [drm] Skip scheduling IBs!
    [20248.831896] [drm] Skip scheduling IBs!
    [20248.831902] [drm] Skip scheduling IBs!
    [20248.831904] [drm] Skip scheduling IBs!
    [20248.831906] [drm] Skip scheduling IBs!
    [20248.831908] [drm] Skip scheduling IBs!
    [20248.831910] [drm] Skip scheduling IBs!
    [20248.831911] [drm] Skip scheduling IBs!
    [20248.831912] [drm] Skip scheduling IBs!
    [20248.831912] [drm] Skip scheduling IBs!
    [20248.831913] [drm] Skip scheduling IBs!
    [20248.831914] [drm] Skip scheduling IBs!
    [20248.831915] [drm] Skip scheduling IBs!
    [20248.831920] [drm] Skip scheduling IBs!
    [20248.831921] [drm] Skip scheduling IBs!
    [20248.831923] [drm] Skip scheduling IBs!
    [20248.831926] [drm] Skip scheduling IBs!
    [20248.831928] [drm] Skip scheduling IBs!
    [20248.831930] [drm] Skip scheduling IBs!
    [20248.832415] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
    [20269.941296] show_signal_msg: 4 callbacks suppressed
    [20269.941298] GpuWatchdog[8909]: segfault at 0 ip 00007fb5eafb327d sp 00007fb5d3617550 error 6 in libcef.so[7fb5e722d000+69a4000]
    [20269.941308] Code: 00 79 09 48 8b 7d a0 e8 01 80 c1 02 41 8b 85 00 01 00 00 85 c0 0f 84 ab 00 00 00 49 8b 45 00 4c 89 ef be 01 00 00 00 ff 50 58 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 c1 a5 37 03 01 80 bd 7f ff
    [20273.098921] gnome-shell[3734]: segfault at 7f2578294000 ip 00007f25a4ed85a0 sp 00007ffcb4438b58 error 6 in libc-2.31.so[7f25a4d98000+150000]
    [20273.098929] Code: 80 fa 10 73 17 80 fa 08 73 27 80 fa 04 73 33 80 fa 01 77 3b 72 05 0f b6 0e 88 0f c3 c5 fa 6f 06 c5 fa 6f 4c 16 f0 c5 fa 7f 07 <c5> fa 7f 4c 17 f0 c3 48 8b 4c 16 f8 48 8b 36 48 89 4c 17 f8 48 89
    It seems that the amdgpu driver is crashing. The 5.5 kernels were working well, and I think 5.6 started out ok too.

    Can someone help point me in the right direction and maybe help isolate the cause so that a proper bug report can be submitted? Maybe @ag5df would be able to assist?

    Some system details are:

    Kernel: 5.6.5-300.fc32.x86_64 x86_64 bits: 64 Desktop: Gnome 3.36.1 Distro: Fedora release 32 (Thirty Two)
    Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX B450-I GAMING v: Rev 1.xx UEFI: American Megatrends v: 3004 date: 12/16/2019
    CPU: 6-Core: AMD Ryzen 5 3600 type: MT MCP speed: 1759 MHz min/max: 2200/3600 MHz
    Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] driver: amdgpu v: kernel
    Display: wayland server: Fedora Project X.org 1.20.8 driver: amdgpu resolution: 1920x1080~75Hz, 1920x1080~60Hz
    OpenGL: renderer: AMD NAVI10 (DRM 3.36.0 5.6.5-300.fc32.x86_64 LLVM 10.0.0) v: 4.6 Mesa 20.0.4
    Network: Device-1: Intel I211 Gigabit Network driver: igb
    Drives: Local Storage: total: 953.87 GiB used: 416.62 GiB (43.7%)

    Thanks for the help!

  • #2
    The GPU has reset due to a hang. You'll need to restart your desktop environment because at the moment, no desktop managers properly handle the loss of a GPU context. Someone would need to wire that up in each desktop environment.
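
    To make the hang-then-reset sequence easier to spot in a long dmesg capture, here is a minimal Python sketch that pulls out the ring timeout, the completed reset, and the number of "Skip scheduling IBs!" lines. The function name `summarize_gpu_hang` and the `unsignaled_fences` field are made up for illustration, the patterns are only matched against the message formats shown in the log above, and reading the gap between emitted and signaled sequence numbers as outstanding fences is an interpretation, not something the driver prints.

```python
import re

# Matches the ring-timeout line exactly as it appears in the posted log.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the final "GPU reset(N) succeeded!" line, not the earlier
# "GPU reset succeeded, trying to resume" progress message.
RESET_DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")
SKIPPED_IB_MARKER = "Skip scheduling IBs!"


def summarize_gpu_hang(log_text):
    """Summarize amdgpu hang/reset messages found in dmesg-style text."""
    summary = {"timeouts": [], "resets_completed": 0, "skipped_ibs": 0}
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            signaled = int(match.group("signaled"))
            emitted = int(match.group("emitted"))
            summary["timeouts"].append({
                "time": float(match.group("ts")),
                "ring": match.group("ring"),
                # Assumption: emitted - signaled = fences still outstanding.
                "unsignaled_fences": emitted - signaled,
            })
        elif RESET_DONE_RE.search(line):
            summary["resets_completed"] += 1
        elif SKIPPED_IB_MARKER in line:
            summary["skipped_ibs"] += 1
    return summary


# A few lines copied from the log in the first post.
SAMPLE = """\
[20244.369125] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1438474, emitted seq=1438476
[20248.337050] amdgpu 0000:08:00.0: GPU reset succeeded, trying to resume
[20248.831832] [drm] Skip scheduling IBs!
[20248.831833] [drm] Skip scheduling IBs!
[20248.831855] amdgpu 0000:08:00.0: GPU reset(1) succeeded!
"""

if __name__ == "__main__":
    print(summarize_gpu_hang(SAMPLE))
```

    Running it over a full saved dmesg file (read the file and pass its text to the function) gives a compact summary that could be pasted into a bug report alongside the raw log.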


    • #3
      I've seen this on various kernel versions from 5.2 to 5.6: the same flood of "Skip scheduling IBs!" messages, then a kernel crash.
