AMDGPU Reset Recovery To Be Flipped On By Default For Newer Radeon GPUs
With the amdgpu-drm-next code for what will eventually be either Linux 4.21 or Linux 5.1, GPU reset recovery is being switched on by default for GFX8/GFX9 GPUs. The GPU reset recovery code in the mainline Linux driver has been found to be "working for the most part" on Topaz, Tonga, Fiji, Polaris, and Vega graphics processors, according to this commit added on Friday to the AMDGPU DRM driver's testing branch. Previously it was on by default only for GPU SR-IOV setups. The recovery is triggered after a ten-second job timeout.
Until the GPU reset recovery code is improved for the older AMD GCN GPUs, if that ever happens, there is still the gpu_recovery module parameter, whereby amdgpu.gpu_recovery=1 enables the GPU recovery path unconditionally. But hopefully your Radeon GPU support is stable enough these days on the open-source driver that you don't have to worry about such recovery in the first place... It has been a few kernel releases since I last had a Radeon GPU hang under Linux, regardless of the GPU model (sans Raven), so for the most part this change hopefully won't be noticeable to end-users.
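For those curious which setting their running kernel is actually using, here is a minimal Python sketch that reads the parameter back from sysfs. It assumes the loaded amdgpu module exposes the value at /sys/module/amdgpu/parameters/gpu_recovery, and that -1 means "auto" and 0 means "disabled", which is my understanding of the kernel's module parameter documentation rather than anything stated in the commit. The function name and the wording of the descriptions are made up for illustration.

```python
from pathlib import Path

# Assumption: sysfs location where the kernel exposes the loaded amdgpu
# module's gpu_recovery parameter.
PARAM_PATH = Path("/sys/module/amdgpu/parameters/gpu_recovery")

# Assumption: -1 and 0 follow the kernel's module parameter documentation;
# only the value 1 is described in the article itself.
MEANINGS = {
    -1: "auto (driver picks the per-GPU default)",
    0: "disabled",
    1: "enabled unconditionally",
}


def describe_gpu_recovery(path=PARAM_PATH):
    """Return a human-readable description of the gpu_recovery setting."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        # Missing file usually means amdgpu is not loaded on this system.
        return "unknown (amdgpu not loaded or parameter not readable)"
    try:
        value = int(raw)
    except ValueError:
        return f"unrecognized value {raw!r}"
    return MEANINGS.get(value, f"unrecognized value {value}")


if __name__ == "__main__":
    print("amdgpu.gpu_recovery:", describe_gpu_recovery())
```

If the result is "auto", the per-GPU default described above applies; booting with amdgpu.gpu_recovery=1 on the kernel command line should make it report "enabled unconditionally".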
Have you had any Radeon GPU hangs under Linux recently? Or any particular Radeon Linux driver bugs still biting you? Let us know in the forums.