2021 Could Be The Year That AMD Radeon Graphics Can Hot Unplug Gracefully On Linux
AMDGPU patches to better handle GPU hot unplugging on Linux have been around for nearly a year. The use cases are removal via sysfs, such as before assigning the GPU to a VM, and external GPUs such as those connected via Thunderbolt. The patches are still baking, but AMD has now published the latest iteration of the work.
Currently, hot removal of an AMD Radeon GPU under Linux can result in a kernel oops, a system hang, or application hangs, among related headaches. Reportedly, Windows doesn't handle the GPU hot-unplug situation much better.
But with GPU hot unplugging becoming more common for cases like external Thunderbolt-connected GPUs, AMD engineers have been working to make their Linux kernel driver behave better in such scenarios.
Andrey Grodzovsky of AMD this week published the fourth iteration of the kernel patches. The "v4" patches add further protections to improve the code ahead of its eventual mainlining and re-base the work against the latest drm-misc-next state.
Grodzovsky noted with the patches, "With these patches I am able to gracefully remove the secondary card using sysfs remove hook while glxgears is running off of secondary card (DRI_PRIME=1) without kernel oopses or hangs and keep working with the primary card or soft reset the device without hangs or oopses."
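For readers unfamiliar with the "sysfs remove hook" mentioned in that quote, here is a minimal Python sketch of what triggering it looks like. It relies on the standard Linux PCI sysfs interface, where writing "1" to a device's `remove` file asks the kernel to detach that device. The helper names, the `sysfs_root` parameter, and the PCI address in the usage note are illustrative assumptions, not part of the AMD patches, and this is not necessarily how Grodzovsky ran his test.

```python
from pathlib import Path


def pci_remove_path(bdf: str, sysfs_root: str = "/sys") -> Path:
    """Build the path of the sysfs 'remove' file for a PCI device.

    bdf is the PCI address in domain:bus:device.function form.
    sysfs_root is a hypothetical parameter added so the helper can be
    pointed at a fake directory tree when experimenting.
    """
    return Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "remove"


def remove_pci_device(bdf: str, sysfs_root: str = "/sys") -> Path:
    """Ask the kernel to hot-remove a PCI device by writing '1' to its remove file.

    On a real system this requires root and will detach the device.
    """
    path = pci_remove_path(bdf, sysfs_root)
    if not path.exists():
        raise FileNotFoundError(f"no such PCI device in sysfs: {bdf}")
    path.write_text("1")
    return path
```

On a real machine this would be called as root with the secondary GPU's address, for example `remove_pci_device("0000:03:00.0")`, where that address is only a placeholder; the actual one can be found with `lspci -D`.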
Some known issues remain: for example, re-attaching the GPU after the disconnect event leads to hardware errors, and other items are still pending.
Those with external Radeon GPUs or similar use cases who are interested in trying out the latest patches can find them via amd-gfx.