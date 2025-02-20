The upcoming Linux 6.15 kernel is set to finally introduce a standardized way of informing user-space when a GPU has hung or otherwise become unresponsive. It is initially wired up for the AMD and Intel graphics drivers on Linux, so that users can be properly notified of problems and user-space software can take steps to address the hung or unresponsive graphics processor.

The work was started by Intel graphics driver engineers for their Xe and i915 Direct Rendering Manager drivers. It adds a new "device wedged" event to Linux 6.15 that reports unresponsive hardware to user-space via a uevent. The AMDGPU driver has also been adapted to make use of this device wedged event, and over time other Linux GPU drivers beyond Intel and AMD will likely adopt the event interface too.

The event notifies user-space of a hung or unusable hardware state, which is useful when the driver has already attempted a GPU reset on its own and failed to correct that state. The hope is that this becomes a generic way of recovering from hung GPUs with user-space intervention. Besides alerting user-space to the problem itself, the event lets udev rules or other custom recovery scripts take action when a GPU is reported as hung or unresponsive.
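As a rough illustration of what such a user-space handler might look like, here is a minimal Python sketch. It rests on assumptions that are not stated in this article: that the uevent carries a `WEDGED` property holding a comma-separated list of recovery methods (with names such as `rebind` and `bus-reset`), and that a driver can be unbound and rebound by writing the device name to the `unbind` and `bind` files of its sysfs driver directory. The function names are hypothetical, and this is not the script from the kernel documentation.

```python
import os


def parse_wedged(value):
    """Split a WEDGED value such as "rebind,bus-reset" into a list of methods.

    The WEDGED property name and its comma-separated format are assumptions.
    """
    return [m.strip() for m in value.split(",") if m.strip()]


def pick_method(methods, supported=("rebind",)):
    """Return the first advertised method this handler supports, else None."""
    for method in methods:
        if method in supported:
            return method
    return None


def rebind(device_dir):
    """Unbind, then rebind, the driver of the device at a sysfs device directory.

    device_dir is assumed to contain a "driver" symlink to the driver directory.
    """
    device_dir = os.path.realpath(device_dir)
    name = os.path.basename(device_dir)
    # Resolve the driver directory first: the symlink goes away after unbind.
    driver_dir = os.path.realpath(os.path.join(device_dir, "driver"))
    for action in ("unbind", "bind"):
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(name)
    return name
```

A handler launched from a udev rule could, under the same assumptions, read `os.environ.get("WEDGED", "")`, pass it through `parse_wedged` and `pick_method`, and call `rebind` only when `rebind` is among the advertised methods. Writing to the real sysfs files requires root privileges.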

Recovery methods could include unbinding and rebinding the kernel driver; unbinding the driver, resetting the bus device, and then rebinding the driver; or taking other steps or no action at all. This is useful in situations where the kernel driver can't correct the problematic GPU hardware state on its own, either because it is unable to unload and reload itself or because other steps are needed to restore the hardware. An example GPU recovery script is being added to the new documentation: