AMDGPU Reset Recovery To Be Flipped On By Default For Newer Radeon GPUs


  • Space Heater
    replied
    Originally posted by xiando
    What did AMD mean by this?
    If you haven't already, submit a bug report and keep it updated.


  • TemplarGR
    replied
    Originally posted by tildearrow

    That's what happens when you use a WIP (work in progress) kernel, an unstable distro and Ubuntu PPAs on Debian.

    (this gives you the advantage to bisect with further ease and do a bug report though)
    LOL. You are responding to a troll who tells casual users to run WIP kernels/Mesa because, supposedly, stable software is incomplete.


  • wizard69
    replied
    Interesting, as my Ryzen Mobile does hang randomly but infrequently. I’m not even sure it is a GPU hang, though it certainly feels like it.

    For those well versed in Linux, what is the best way to turn this on with a new distro like Fedora 29?

    The frustrating thing with the hangs is that they happen while doing something that is hard to describe as complicated or GPU-demanding. Basically, they are totally random. As others have pointed out, the latest kernel and Mesa do perform much better.
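
One possible way to check the current state, sketched under assumptions: the amdgpu driver has a `gpu_recovery` module parameter (1 = enable, 0 = disable, -1 = auto), and the sketch assumes it can be read from `/sys/module/amdgpu/parameters/gpu_recovery`. The helper name `recovery_state` is made up for illustration.

```shell
# recovery_state PARAM_FILE: describe the value of amdgpu's gpu_recovery parameter.
# PARAM_FILE is assumed to be /sys/module/amdgpu/parameters/gpu_recovery.
recovery_state() {
    case "$(cat "$1")" in
        1)  echo "enabled" ;;
        0)  echo "disabled" ;;
        -1) echo "auto" ;;        # the driver decides per GPU
        *)  echo "unknown" ;;
    esac
}
```

Calling `recovery_state /sys/module/amdgpu/parameters/gpu_recovery` prints the state. To force recovery on, the usual route is to add `amdgpu.gpu_recovery=1` to the kernel command line; on Fedora that can probably be done with `grubby --update-kernel=ALL --args="amdgpu.gpu_recovery=1"` followed by a reboot.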


  • xiando
    replied
    [140122.327535] WARNING: CPU: 7 PID: 31433 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:845 dcn10_verify_allow_pstate_change_high+0x25/0x240 [amdgpu]
    [.....]
    [140122.328573] RIP: 0010:dcn10_verify_allow_pstate_change_high+0x25/0x240 [amdgpu]
    [140122.328615] Code: 00 00 00 00 00 0f 1f 44 00 00 55 53 48 8b 87 38 01 00 00 48 89 fb 48 8b b8 e0 01 00 00 e8 73 0f 01 00 84 c0 0f 85 16 02 00 00 <0f> 0b 80 bb b9 00 00 00 00 0f 84 07 02 00 00 48 8b 83 38 01 00 00
    [140122.328714] RSP: 0018:ffffa0208c76bb78 EFLAGS: 00010246
    [140122.328745] RAX: 0000000000000000 RBX: ffff91abf9ae7000 RCX: 0000000000000000
    [140122.328785] RDX: 0000000000000000 RSI: ffff91ac0ebd5548 RDI: ffff91ac0ebd5548
    [140122.328825] RBP: ffff91a904480000 R08: 0000000000000000 R09: 0000000000aaaaaa
    [140122.328865] R10: 0000000000000000 R11: ffffa020a1248220 R12: 0000000000000001
    [140122.328904] R13: ffffa0208c76bbc0 R14: ffff91a900094000 R15: 0000000000000000
    [140122.328945] FS: 0000000000000000(0000) GS:ffff91ac0ebc0000(0000) knlGS:0000000000000000
    [140122.328989] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [140122.329022] CR2: 00007fc3fd172594 CR3: 00000003ebb16000 CR4: 00000000003406e0
    [140122.329061] Call Trace:
    [140122.329165] dcn10_set_bandwidth+0xad/0xc0 [amdgpu]
    [140122.329271] dc_commit_state+0x420/0x550 [amdgpu]
    [140122.329384] amdgpu_dm_atomic_commit_tail+0x388/0xdb0 [amdgpu]
    [140122.329422] ? __wake_up_common_lock+0x89/0xc0
    [140122.329451] ? _cond_resched+0x15/0x30
    [140122.329475] ? wait_for_completion_timeout+0x3a/0x180
    [140122.329505] ? wait_for_completion_interruptible+0x35/0x1b0
    [140122.329619] ? amdgpu_dm_atomic_commit_tail+0xdb0/0xdb0 [amdgpu]
    [140122.329664] commit_tail+0x3d/0x70 [drm_kms_helper]
    [140122.329702] drm_atomic_helper_commit+0x103/0x110 [drm_kms_helper]
    [140122.329745] restore_fbdev_mode_atomic+0x1c4/0x1e0 [drm_kms_helper]
    [140122.329790] drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
    [140122.329839] drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
    [140122.329880] drm_fb_helper_hotplug_event.part.35+0x90/0xb0 [drm_kms_helper]
    [140122.329927] drm_kms_helper_hotplug_event+0x26/0x30 [drm_kms_helper]
    [140122.330046] handle_hpd_irq+0xd9/0x100 [amdgpu]
    [140122.330156] dm_irq_work_func+0x4e/0x60 [amdgpu]
    [140122.330186] process_one_work+0x19b/0x390
    [140122.330212] worker_thread+0x30/0x370
    [140122.330236] ? rescuer_thread+0x320/0x320
    [140122.330261] kthread+0x112/0x130
    [140122.330282] ? kthread_create_worker_on_cpu+0x70/0x70
    [140122.330313] ret_from_fork+0x22/0x40
    [140122.330337] ---[ end trace 44fa92c7d8e7ba0d ]---


    What did AMD mean by this?


  • tildearrow
    replied
    Originally posted by debianxfce
    I did laugh when I saw GPU reset patches for the Intel GPUs. The AMD drm-next-4.21-wip kernel started to hang the system when waking up from monitor blanking and sleeping after 30.9.2018 (4.19.rc5->rc6). The GPU reset patch has no effect with the RX560, and the system must be rebooted with the power button. My distribution has the latest WIP kernel available with Synaptic.
    That's what happens when you use a WIP (work in progress) kernel, an unstable distro and Ubuntu PPAs on Debian.

    (this gives you the advantage to bisect with further ease and do a bug report though)


  • tildearrow
    replied
    Originally posted by Med_

    It would be worth a try. How do you alter that with AMDGPU?
    Here is my little shell script that does a simple fan curve:

    Code:
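    #!/bin/bash
    # Expand the glob to the card's hwmon directory (hwmon0, hwmon1, ...).
    cardpath=$(echo /sys/bus/pci/devices/0000:03:00.0/hwmon/hwmon*)
    echo "$cardpath"
    while true
      do temp=$(cat "$cardpath/temp1_input")   # reading in millidegrees Celsius
         # Linear curve: PWM 60 at 30°C, plus 3 per additional degree.
         pwm=$((60+((temp/1000)-30)*3))
         echo "$pwm" > "$cardpath/pwm1"
         sleep 0.5
      done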
    (edit the PCI device location if necessary)

    (it won't work well if your card goes under 10°C, but this is unlikely)

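The under-10°C caveat comes from the curve going negative below that point, and the same formula exceeds 255 above 95°C. A minimal sketch of the same curve with clamping, assuming the usual hwmon PWM range of 0 to 255 (the function name `fan_pwm` is made up for illustration):

```shell
# fan_pwm MILLIDEGREES: map a temp1_input reading to a PWM value.
fan_pwm() {
    # Same linear curve as the script above: 60 at 30°C, plus 3 per degree.
    pwm=$(( 60 + ($1 / 1000 - 30) * 3 ))
    # Clamp to the assumed valid range 0..255.
    if [ "$pwm" -lt 0 ]; then pwm=0; fi
    if [ "$pwm" -gt 255 ]; then pwm=255; fi
    echo "$pwm"
}
```

In the loop, the write would then become `fan_pwm "$temp" > "$cardpath/pwm1"`.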

  • faph
    replied
    Originally posted by Med_
    It would be worth a try. How do you alter that with AMDGPU?
    There is:
    https://github.com/grmat/amdgpu-fancontrol and
    marazmista/radeon-profile, an application to read the current clocks of ATi Radeon cards (xf86-video-ati, xf86-video-amdgpu)


  • tildearrow
    replied
    Originally posted by phoronix
    Have you had any Radeon GPU hangs under Linux recently?
    Not anymore. With one exception, but this problem also happens on Windows: sometimes, whether due to AC voltage variance, cosmic fluctuations or lack of ground, my card randomly resets itself (it goes to a "0MHz" power clock, turns the screen off and resets the fan speed). I don't know how to plug it back in afterwards, which is most likely impossible.

    Originally posted by phoronix
    Or any particular Radeon Linux driver bugs still biting you?
    I can't encode in H.264... (the encoding slice is now present but is useless)
    Last edited by tildearrow; 28 October 2018, 11:22 AM.


  • shmerl
    replied
    Good. Now DXVK hangs, if any, won't be so severe as to require a reboot.


  • Med_
    replied
    Originally posted by Med_

    I will try to log temperature to see whether that happens on possible temperature spikes.
    What maximum temperature should I expect to avoid any related problem? I just had a game of XCOM 2. Temperature hovered around 70°C the whole time with a short spike at 80°C.
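
Logging the temperature could look roughly like this. It is a sketch that assumes the card exposes the same `temp1_input` hwmon file (in millidegrees Celsius) used by the fan script earlier in the thread; the function name `log_temp` and the log file name are made up for illustration.

```shell
# log_temp SENSOR_FILE LOG_FILE: append one timestamped reading in whole °C.
log_temp() {
    # Stop with an error if the sensor file cannot be read.
    millideg=$(cat "$1") || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(( millideg / 1000 ))" >> "$2"
}
```

Run it in a loop, for example `while log_temp /sys/bus/pci/devices/0000:03:00.0/hwmon/hwmon*/temp1_input gpu-temp.log; do sleep 1; done`, then compare the timestamps in the log against the time of a hang.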
