AMDGPU Reset Recovery To Be Flipped On By Default For Newer Radeon GPUs
-
Originally posted by Med_
It would be worth a try. How do you alter that with AMDGPU?
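Code:
#!/bin/bash
# hwmon directory of the card at PCI address 0000:03:00.0 (adjust for your GPU)
cardpath=$(echo /sys/bus/pci/devices/0000:03:00.0/hwmon/hwmon*)
echo "$cardpath"
# amdgpu rejects pwm1 writes unless manual fan control (1) is selected
echo 1 > "$cardpath/pwm1_enable"
while true
do
  temp=$(cat "$cardpath/temp1_input")  # millidegrees Celsius
  temp=$((60+((temp/1000)-30)*3))      # 60 at 30°C, +3 per degree
  ((temp > 255)) && temp=255           # keep within the 0-255 pwm1 range
  echo "$temp" > "$cardpath/pwm1"
  sleep 0.5
done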
(it won't work well if your card goes under 10°C but this is unlikely)
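The curve in the script maps temperature linearly to a PWM duty value, and the arithmetic can be checked on its own before anything is written to sysfs. A minimal sketch: the `fan_pwm` function below is hypothetical, and its clamping to 0-255 on both ends is an addition for illustration, not something the poster wrote.

```shell
#!/bin/bash
# Hypothetical helper: map a hwmon temperature in millidegrees Celsius
# (the unit of temp1_input) to a pwm1 duty value.
# Same linear curve as the script: 60 at 30°C, +3 per degree.
fan_pwm() {
  local millideg=$1
  local pwm=$((60 + ((millideg / 1000) - 30) * 3))
  # Clamp to the 0-255 range pwm1 expects (illustrative addition)
  if ((pwm > 255)); then pwm=255; fi
  if ((pwm < 0)); then pwm=0; fi
  echo "$pwm"
}
```

For example, `fan_pwm 45000` prints 105. It also shows why the 10°C caveat exists: without the lower clamp, any reading below 10°C makes the formula go negative.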
-
Originally posted by debianxfce
I did laugh when I saw GPU reset patches for the Intel GPUs. The AMD drm-next-4.21-wip kernel started to hang the system when waking up from monitor blanking and from sleep after 30.9.2018 (4.19-rc5 -> rc6). The GPU reset patch has no effect on my RX 560, and the system must be rebooted with the power button. My distribution has the latest WIP kernel available in Synaptic.
(this does make it easier to bisect and file a bug report, though)
-
[140122.327535] WARNING: CPU: 7 PID: 31433 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:845 dcn10_verify_allow_pstate_change_high+0x25/0x240 [amdgpu]
[.....]
[140122.328573] RIP: 0010:dcn10_verify_allow_pstate_change_high+0x25/0x240 [amdgpu]
[140122.328615] Code: 00 00 00 00 00 0f 1f 44 00 00 55 53 48 8b 87 38 01 00 00 48 89 fb 48 8b b8 e0 01 00 00 e8 73 0f 01 00 84 c0 0f 85 16 02 00 00 <0f> 0b 80 bb b9 00 00 00 00 0f 84 07 02 00 00 48 8b 83 38 01 00 00
[140122.328714] RSP: 0018:ffffa0208c76bb78 EFLAGS: 00010246
[140122.328745] RAX: 0000000000000000 RBX: ffff91abf9ae7000 RCX: 0000000000000000
[140122.328785] RDX: 0000000000000000 RSI: ffff91ac0ebd5548 RDI: ffff91ac0ebd5548
[140122.328825] RBP: ffff91a904480000 R08: 0000000000000000 R09: 0000000000aaaaaa
[140122.328865] R10: 0000000000000000 R11: ffffa020a1248220 R12: 0000000000000001
[140122.328904] R13: ffffa0208c76bbc0 R14: ffff91a900094000 R15: 0000000000000000
[140122.328945] FS: 0000000000000000(0000) GS:ffff91ac0ebc0000(0000) knlGS:0000000000000000
[140122.328989] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[140122.329022] CR2: 00007fc3fd172594 CR3: 00000003ebb16000 CR4: 00000000003406e0
[140122.329061] Call Trace:
[140122.329165] dcn10_set_bandwidth+0xad/0xc0 [amdgpu]
[140122.329271] dc_commit_state+0x420/0x550 [amdgpu]
[140122.329384] amdgpu_dm_atomic_commit_tail+0x388/0xdb0 [amdgpu]
[140122.329422] ? __wake_up_common_lock+0x89/0xc0
[140122.329451] ? _cond_resched+0x15/0x30
[140122.329475] ? wait_for_completion_timeout+0x3a/0x180
[140122.329505] ? wait_for_completion_interruptible+0x35/0x1b0
[140122.329619] ? amdgpu_dm_atomic_commit_tail+0xdb0/0xdb0 [amdgpu]
[140122.329664] commit_tail+0x3d/0x70 [drm_kms_helper]
[140122.329702] drm_atomic_helper_commit+0x103/0x110 [drm_kms_helper]
[140122.329745] restore_fbdev_mode_atomic+0x1c4/0x1e0 [drm_kms_helper]
[140122.329790] drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
[140122.329839] drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
[140122.329880] drm_fb_helper_hotplug_event.part.35+0x90/0xb0 [drm_kms_helper]
[140122.329927] drm_kms_helper_hotplug_event+0x26/0x30 [drm_kms_helper]
[140122.330046] handle_hpd_irq+0xd9/0x100 [amdgpu]
[140122.330156] dm_irq_work_func+0x4e/0x60 [amdgpu]
[140122.330186] process_one_work+0x19b/0x390
[140122.330212] worker_thread+0x30/0x370
[140122.330236] ? rescuer_thread+0x320/0x320
[140122.330261] kthread+0x112/0x130
[140122.330282] ? kthread_create_worker_on_cpu+0x70/0x70
[140122.330313] ret_from_fork+0x22/0x40
[140122.330337] ---[ end trace 44fa92c7d8e7ba0d ]---
What did AMD mean by this?
-
Interesting, as my Ryzen Mobile does hang randomly, though infrequently. I'm not even sure it is a GPU hang, though it certainly feels like one.
For those well versed in Linux, what is the best way to turn this on with a new distro like Fedora 29?
The frustrating thing about the hangs is that they happen when I'm doing something that is hard to describe as complicated or GPU-demanding; they seem totally random. As others have pointed out, the latest kernel and Mesa do perform much better.
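On kernels of this era, GPU reset recovery is controlled by the `amdgpu.gpu_recovery` module parameter (1 = enable, 0 = disable, -1 = auto, the default); the headline change concerns what the default does on newer GPUs. A minimal sketch, assuming a Fedora install that manages kernel arguments with `grubby`: the `describe_gpu_recovery` helper is hypothetical and only reads the current value, while the actual enabling step is left as a commented root command.

```shell
#!/bin/bash
# Hypothetical helper: report the current amdgpu.gpu_recovery setting.
# Module parameters are exposed under /sys/module/<module>/parameters/;
# the file only exists while the amdgpu module is loaded.
describe_gpu_recovery() {
  local param=${1:-/sys/module/amdgpu/parameters/gpu_recovery}
  if [[ ! -r $param ]]; then
    echo "unavailable"
    return 1
  fi
  case "$(cat "$param")" in
    1) echo "enabled" ;;
    0) echo "disabled" ;;
    -1) echo "auto" ;;
    *) echo "unknown" ;;
  esac
}

# To enable it on every installed kernel (Fedora uses grubby), run as root:
#   grubby --update-kernel=ALL --args="amdgpu.gpu_recovery=1"
# then reboot and check again with: describe_gpu_recovery
```

Adding the kernel argument rather than a modprobe option means the setting applies even when amdgpu is loaded from the initramfs.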
-
Originally posted by tildearrow
That's what happens when you use a WIP (work in progress) kernel, an unstable distro and Ubuntu PPAs on Debian.
(this does make it easier to bisect and file a bug report, though)
-
I often experience crashes with the game EVERSPACE.
They happen at random times, usually within 1h30 of starting. I thought it was related to overclocking, but it seems to be the only game that crashes, and it takes the whole system down with it; a reset is required. Sometimes using the reset button still leaves anomalies, such as things that no longer work properly in the OS. So when that game used to crash my system, I used the power supply switch to "reset" instead.
I no longer play that game because of these crashes, which is unfortunate, as it was one of my favorites with a Linux version on Steam.
GPU: RX 480 @ ~1500 MHz, 1.35 V (it's not the overclocking; it also happens at stock clocks/voltage)
CPU: Ryzen 7 2700X @ 4.3 GHz (it also happened with my R7 1700, overclocked or not, but less often than with the 2700X)
Kernel: usually the latest, release or git; it varies
Mesa: latest git, updated regularly
The GPU is watercooled with a custom loop; it does not usually reach 65°C.
No other games take my system down, at least for now.
-
Originally posted by Med_
This is good news. I consistently get hangs with games, typically once every few hours. I do not bother reporting them, as I cannot reproduce them on demand and the bug tracker is full of reports with similar logs ([drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=145101, last emitted seq=145103). I have activated the option; we will see whether it at least spares me the power-button treatment.
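The ring timeout message quoted there carries two fence sequence numbers. Reading their difference as the number of submissions the ring had emitted but not yet completed is an interpretation, not something the log itself states. The hypothetical `pending_jobs` helper below extracts both numbers with `sed`, which makes logs from several hangs easier to compare.

```shell
#!/bin/bash
# Hypothetical helper: given an amdgpu "ring ... timeout" log line, print
# (last emitted seq) - (last signaled seq), i.e. fences not yet signaled.
# Returns 1 and prints nothing if the line lacks either number.
pending_jobs() {
  local line=$1 signaled emitted
  signaled=$(sed -n 's/.*last signaled seq=\([0-9]*\).*/\1/p' <<<"$line")
  emitted=$(sed -n 's/.*last emitted seq=\([0-9]*\).*/\1/p' <<<"$line")
  [[ -n $signaled && -n $emitted ]] || return 1
  echo $((emitted - signaled))
}

# Example: scan the previous boot's kernel log (systemd-journald systems):
#   journalctl -k -b -1 | grep 'ring gfx timeout' |
#     while read -r l; do pending_jobs "$l"; done
```

For the line Med_ posted, this prints 2 (145103 - 145101).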