Linux 6.11 Bringing "Hardware Replay" Feature For Intel Graphics Debugging
The main set of drm-intel-gt-next patches aimed at the Linux 6.11 kernel was submitted this week to DRM-Next. Most notable in this feature update for the next kernel version is a new hardware replay feature for better reproducing GPU hangs.
For Linux 6.11, the Intel kernel graphics driver is set to add a new user-space API for uploading custom context state so that GPU hang error state captures can be replayed. The Intel Mesa driver is to use this new API to submit the captured state for replay after the fact on Intel graphics hardware. There is a Mesa merge request with the Mesa-side support for the hardware hang replay feature.
This should make it easier for Intel driver developers to reproduce GPU hangs on real hardware, whereas up to this point they could only do so using simulations. The prior simulated approach had limits in reproducing hardware hangs, and this new solution should work out better. However, as this feature is only intended for Intel/Mesa developers, the new user-space API is hidden behind both a Kconfig option and a run-time enablement switch. The Kconfig option is DRM_I915_REPLAY_GPU_HANGS_API, and the "i915.enable_debug_only_api" module option must also be enabled at run-time. This will hopefully lead to a more bug-free experience for Intel Linux customers, with developers better able to replay and reproduce hardware hangs moving forward.
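For developers who want to confirm that both gates are open on a test machine, the check can be sketched roughly as below. This is only an illustration: the helper names are hypothetical, and the paths are assumptions (that the distribution ships its kernel config at /boot/config-<release>, and that the module option is exposed under /sys/module/i915/parameters/, which may be readable by root only). Only the Kconfig symbol and the option name come from the pull request.

```python
import platform
from pathlib import Path

# Names taken from the article; the CONFIG_ prefix is the usual .config spelling.
KCONFIG_SYMBOL = "CONFIG_DRM_I915_REPLAY_GPU_HANGS_API"
# Assumed sysfs location for the i915 module option (not confirmed by the article).
PARAM_PATH = Path("/sys/module/i915/parameters/enable_debug_only_api")


def kconfig_enabled(config_text, symbol=KCONFIG_SYMBOL):
    """Return True if the kernel config text contains `symbol=y`."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and malformed lines.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == symbol:
            return value == "y"
    return False


def param_enabled(raw):
    """Boolean module parameters read back from sysfs as 'Y' or 'N'."""
    return raw.strip() in ("Y", "1")


def replay_api_available(config_text, param_raw):
    """Both the build-time and the run-time switch have to be on."""
    return kconfig_enabled(config_text) and param_enabled(param_raw)


def read_or_empty(path):
    """Read a file, treating a missing or unreadable file as empty."""
    try:
        return path.read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    # Assumed config location; some distributions use /proc/config.gz instead.
    config_path = Path("/boot") / f"config-{platform.release()}"
    available = replay_api_available(
        read_or_empty(config_path), read_or_empty(PARAM_PATH)
    )
    print("hang replay API gates open:", available)
```

Treating an unreadable file as "not enabled" keeps the sketch conservative: it reports the API as available only when both switches can be positively confirmed.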
This week's drm-intel-gt-next pull request also has Meteor Lake hang fixes, other DG2 and Meteor Lake / Arrow Lake fixes, and other code updates. Look for all of this new code to premiere in Linux 6.11, with its merge window opening in mid-July.