Intel Making It Easier To Reproduce Linux GPU Hangs On Real Hardware
Intel engineers working on the open-source Mesa OpenGL/Vulkan drivers currently replay captured error states from GPU hangs using a simulator, but a newly proposed patch allows replaying those hangs on actual hardware. This should in turn help Intel driver developers better address some real-world issues.
A patch was sent out this week to allow replaying GPU hangs using captured context images on actual Intel GPU hardware rather than just the simulator. The i915 kernel driver patch adds a new "DRM_I915_REPLAY_GPU_HANGS_API" Kconfig option along with a new I915_CONTEXT_PARAM_CONTEXT_IMAGE interface. That interface allows uploading the captured context image into the driver state prior to executing the hanging batch buffers.
This proposal treats the user-space API as a debug-only interface, so it is hidden behind this kernel build option and additionally requires setting the "i915.enable_debug_only_api" module parameter.
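For developers wanting to confirm that both gates are in place on a test machine, the check could look roughly like the sketch below. It is only an illustration under stated assumptions: the option and parameter names are taken from the patch as described above and may change during review, the helper functions are hypothetical, and it assumes the module parameter is readable through sysfs, which the patch may or may not allow.

```python
from pathlib import Path
from typing import Optional

# Names as described in the proposed patch; it is still under review,
# so these are assumptions that may change before anything is merged.
KCONFIG_OPTION = "DRM_I915_REPLAY_GPU_HANGS_API"
MODULE_PARAM = "enable_debug_only_api"


def kconfig_enabled(config_text: str, option: str) -> bool:
    """Return True if CONFIG_<option> is set to y or m in kernel config text."""
    wanted = f"CONFIG_{option}="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(wanted):
            return line[len(wanted):] in ("y", "m")
    # Disabled options appear as "# CONFIG_... is not set" or not at all.
    return False


def read_module_param(
    module: str, param: str, sysfs_root: str = "/sys/module"
) -> Optional[str]:
    """Return a module parameter's current value, or None if it is not exposed.

    Assumes the parameter is visible under <sysfs_root>/<module>/parameters/,
    which is only the case when the driver registers it as readable.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

On many distributions the kernel configuration text can be read from /boot/config-<kernel release>, though that location is not guaranteed. Passing that text to kconfig_enabled() with KCONFIG_OPTION, and calling read_module_param("i915", MODULE_PARAM), would then indicate whether the debug-only interface is likely available.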
The kernel patch allowing for replaying GPU hangs on actual Intel graphics hardware is currently under review on dri-devel. There is also a Mesa merge request to allow using the proposed user-space API for hardware replay.
This feature is just for Intel graphics driver developers, but hopefully it will help in reproducing and addressing issues that only turn up on actual Intel iGPU/dGPU hardware and that cannot be reproduced, or not as easily, within a simulator environment.