File bugs here: https://bugs.freedesktop.org
AMD devs: *ERROR* ring gfx timeout
-
Originally posted by agd5f
File bugs here: https://bugs.freedesktop.org
As far as this issue goes, I haven't had much time to mess with it this weekend. I added amdgpu.gpu_recovery=1 to my kernel parameters and the desktop has not locked up on me since. I'm still getting amdgpu errors in the kernel that are bad enough to set off abrt traps, but the system keeps going. MCE is also flagging some hardware issues, but from what I understood mcelog doesn't work on Ryzen? Could be wrong there. Journalctl just shows it complaining about not knowing about this CPU, although on a couple of boots it set an abrt trap for a hardware issue:
Code:
mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module
mcelog[1149]: CPU is unsupported
Right now this is the stuff setting off abrt and filling up my logs. I'm a bit reluctant to file a bug report until I've ruled out hardware problems, though. Even with this, the machine has been stable since adjusting the RAM frequency to what it should have been and adding the GPU recovery option to grub.
Code:
[85353.172060] WARNING: CPU: 5 PID: 1330 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe7/0x160 [amdgpu]
[85353.172062] Modules linked in: ipheth md4 sha512_ssse3 sha512_generic nls_utf8 cifs ccm dns_resolver fscache rfcomm cmac bnep sunrpc vfat fat raid1 arc4 edac_mce_amd kvm_amd kvm iwlmvm mxm_wmi wmi_bmof irqbypass mac80211 crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_intel snd_hda_codec iwlwifi snd_hda_core btusb btrtl snd_hwdep btbcm btintel snd_seq snd_seq_device bluetooth cfg80211 snd_pcm snd_timer snd ecdh_generic sp5100_tco rfkill ccp soundcore k10temp i2c_piix4 wmi gpio_amdpt gpio_generic pcc_cpufreq acpi_cpufreq amdkfd amd_iommu_v2 amdgpu hid_logitech_hidpp chash gpu_sched drm_kms_helper igb ttm nvme dca drm crc32c_intel serio_raw nvme_core atlantic i2c_algo_bit uas usb_storage hid_logitech_dj pinctrl_amd
[85353.172105] CPU: 5 PID: 1330 Comm: Xorg Tainted: G W 4.19.15-300.fc29.x86_64 #1
[85353.172107] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Professional Gaming, BIOS P3.30 08/14/2018
[85353.172169] RIP: 0010:generic_reg_wait+0xe7/0x160 [amdgpu]
[85353.172171] Code: 44 24 58 8b 54 24 48 89 de 44 89 4c 24 08 48 8b 4c 24 50 48 c7 c7 40 af 89 c0 e8 44 c4 cd ff 83 7d 18 01 44 8b 4c 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 0f
[85353.172172] RSP: 0018:ffffa70690bc3878 EFLAGS: 00010297
[85353.172174] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000000
[85353.172175] RDX: 0000000000000000 RSI: ffff9acc7cf56868 RDI: ffff9acc7cf56868
[85353.172176] RBP: ffff9acc55b2b080 R08: 0000000000000084 R09: 0000000000010200
[85353.172176] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000bb9
[85353.172177] R13: 0000000000004fa4 R14: 0000000000010000 R15: 0000000000000000
[85353.172179] FS: 00007f2490fa6ac0(0000) GS:ffff9acc7cf40000(0000) knlGS:0000000000000000
[85353.172180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85353.172181] CR2: 00007f259c002158 CR3: 0000000ff1328000 CR4: 00000000003406e0
[85353.172182] Call Trace:
[85353.172257] dce110_stream_encoder_dp_blank+0x12c/0x1a0 [amdgpu]
[85353.172322] core_link_disable_stream+0x54/0x220 [amdgpu]
[85353.172386] dce110_reset_hw_ctx_wrap+0xc1/0x1e0 [amdgpu]
[85353.172451] dce110_apply_ctx_to_hw+0x45/0x650 [amdgpu]
[85353.172520] ? dm_pp_apply_display_requirements+0x191/0x1a0 [amdgpu]
[85353.172583] ? dce110_set_bandwidth+0x20b/0x230 [amdgpu]
[85353.172646] dc_commit_state+0x2dc/0x550 [amdgpu]
[85353.172716] amdgpu_dm_atomic_commit_tail+0x388/0xdb0 [amdgpu]
[85353.172721] ? __wake_up_common_lock+0x89/0xc0
[85353.172725] ? _cond_resched+0x15/0x30
[85353.172727] ? wait_for_completion_timeout+0x3a/0x190
[85353.172729] ? wait_for_completion_interruptible+0x35/0x1d0
[85353.172738] commit_tail+0x3d/0x70 [drm_kms_helper]
[85353.172747] drm_atomic_helper_commit+0x103/0x110 [drm_kms_helper]
[85353.172764] drm_mode_atomic_ioctl+0x81b/0x940 [drm]
[85353.172768] ? unix_stream_sendmsg+0x37f/0x3b0
[85353.172785] ? drm_atomic_set_property+0x690/0x690 [drm]
[85353.172798] drm_ioctl_kernel+0xa1/0xf0 [drm]
[85353.172813] drm_ioctl+0x206/0x3a0 [drm]
[85353.172829] ? drm_atomic_set_property+0x690/0x690 [drm]
[85353.172831] ? _cond_resched+0x15/0x30
[85353.172881] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[85353.172886] do_vfs_ioctl+0xa4/0x630
[85353.172889] ksys_ioctl+0x60/0x90
[85353.172891] ? ksys_read+0x9c/0xb0
[85353.172893] __x64_sys_ioctl+0x16/0x20
[85353.172896] do_syscall_64+0x5b/0x160
[85353.172899] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[85353.172901] RIP: 0033:0x7f24914ce09b
[85353.172904] Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
[85353.172905] RSP: 002b:00007ffc2d88c558 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[85353.172906] RAX: ffffffffffffffda RBX: 000055c443bcb0d0 RCX: 00007f24914ce09b
[85353.172907] RDX: 00007ffc2d88c5a0 RSI: 00000000c03864bc RDI: 0000000000000016
[85353.172908] RBP: 00007ffc2d88c5a0 R08: 000055c4439b2d90 R09: 000000000000000d
[85353.172909] R10: 000000000000000d R11: 0000000000003246 R12: 00000000c03864bc
[85353.172910] R13: 0000000000000016 R14: 0000000000000000 R15: 000055c4434712b0
[85353.172912] ---[ end trace af3cf32b9038afa4 ]---
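For anyone wanting to confirm that the amdgpu.gpu_recovery=1 option mentioned above actually made it onto the running kernel's command line, here is a rough Python sketch. The helper names (parse_cmdline, gpu_recovery_setting) are made up for illustration, and the parsing is deliberately naive: it splits on whitespace and does not handle quoted values that contain spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Naive on purpose: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def gpu_recovery_setting(cmdline: str):
    """Return the amdgpu.gpu_recovery value as a string, or None if it is absent."""
    return parse_cmdline(cmdline).get("amdgpu.gpu_recovery")


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    current = Path("/proc/cmdline").read_text()
    print("amdgpu.gpu_recovery =", gpu_recovery_setting(current))
```

If this prints None, the option most likely never reached the boot entry (for example, the grub config was edited but not regenerated).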
-
Originally posted by debianxfce
You are using buggy Fedora with old drivers. The mainline kernel is at version 5.0-rc3. Use Debian distributions with the Xfce desktop and Oibaf PPA Mesa.
If you want a stable system those are both big "do nots."
-
Originally posted by debianxfce
The AMD staging kernel is totally different from mainline kernels because it receives some of the latest AMD patches. It uses the 4.20-rc3 kernel now, so it misses a lot of other new kernel features. The AMD wip kernel receives a lot more AMD patches and it uses the buggy 5.0-rc1 kernel now.
I've used both mainline and staging with no issues, so it's a moot point anyway.
I was pointing out the fact that I don't get those issues, bar the one mentioned.
-
I still have the same issue on Vega 56. The problem is most likely related to mclk level 0 switching on Linux, as the GPU is perfectly stable on Windows.
Vega is definitely affected by the issue, but it looks like at least some Navi and Polaris cards are affected by it too. I really hope someone can do something about it, as random crashes are annoying af.
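To watch which memory clock level the card is sitting in when a hang happens, something like the Python sketch below may help. It assumes the amdgpu sysfs file pp_dpm_mclk, which lists one level per line in the form "index: NNNMhz" with a trailing * on the active level. The card0 path is an assumption (the index can differ per system), and the function names are made up for illustration.

```python
import re
from pathlib import Path

# Matches lines such as "0: 167Mhz *" or "3: 945Mhz" (unit case varies).
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str) -> list:
    """Return a list of (level_index, mhz, is_active) tuples from pp_dpm_mclk text."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return levels


def active_level(text: str):
    """Return (level_index, mhz) for the level marked with '*', or None if none is marked."""
    for index, mhz, is_active in parse_dpm_levels(text):
        if is_active:
            return index, mhz
    return None


if __name__ == "__main__":
    # Assumed path; adjust the card index to match your system.
    path = Path("/sys/class/drm/card0/device/pp_dpm_mclk")
    print("active mclk level:", active_level(path.read_text()))
```

Logging this periodically alongside dmesg would at least show whether the crashes line up with the card dropping to level 0, which is only a hypothesis at this point.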