Intel Working To Improve The Reset Experience During GPU Hangs
Aiming to improve the Chrome OS user experience, Intel's open-source developers have been working on better GPU reset behavior for when problems are encountered under 3D/multimedia workloads.
Carlos Santa of Intel is presenting their latest work on a low-latency, engine-based GPU reset mechanism. The current behavior is that the UI freezes, followed by a black screen and a system reboot, which can occur when the GPU misbehaves after hours of usage.
Under the current design, a full GPU reset happens, whereas the approach being pursued is to reset just the particular engine that's hung. That full GPU reset generally interrupts the user experience and is what can result in the black screen and/or system reboot.
This per-engine resetting relies upon timeout detection and recovery (TDR) to reset engines independently: the user-mode (UMD) media driver arms a watchdog timer when sending batch buffers, and the GPU driver in turn resets only the affected engines/blocks after the timeout occurs.
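To make the idea concrete, here is a minimal toy simulation in Python of a per-engine watchdog. It is only a sketch under stated assumptions and not Intel's driver code: the `Engine` class, the `run` helper, the engine names, and the timeout values are all hypothetical and chosen purely for illustration. Each engine tracks how long its current batch has been running, and only the engine whose batch exceeds its watchdog threshold is reset while the other engine carries on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Engine:
    """Toy model of one GPU engine with its own watchdog (hypothetical)."""
    name: str
    watchdog_ms: int                    # assumed per-engine timeout threshold
    remaining_ms: Optional[int] = None  # runtime of current batch; None = never finishes
    elapsed_ms: int = 0
    busy: bool = False
    resets: int = 0
    completed: int = 0

    def submit(self, runtime_ms: Optional[int]) -> None:
        """Start a batch; runtime_ms=None models a batch that hangs forever."""
        self.busy = True
        self.elapsed_ms = 0
        self.remaining_ms = runtime_ms

    def tick(self, dt_ms: int) -> bool:
        """Advance time by dt_ms; return True if the watchdog has expired."""
        if not self.busy:
            return False
        self.elapsed_ms += dt_ms
        # A batch that finishes in time retires normally.
        if self.remaining_ms is not None and self.elapsed_ms >= self.remaining_ms:
            self.busy = False
            self.completed += 1
            return False
        return self.elapsed_ms >= self.watchdog_ms

    def reset(self) -> None:
        """Reset only this engine; other engines are left untouched."""
        self.busy = False
        self.elapsed_ms = 0
        self.remaining_ms = None
        self.resets += 1


def run(engines: Dict[str, Engine], total_ms: int, dt_ms: int = 10) -> List[str]:
    """Step all engines forward and reset any engine whose watchdog fires."""
    reset_log: List[str] = []
    for _ in range(0, total_ms, dt_ms):
        for engine in engines.values():
            if engine.tick(dt_ms):
                engine.reset()
                reset_log.append(engine.name)
    return reset_log


engines = {"render": Engine("render", 100), "video": Engine("video", 60)}
engines["render"].submit(runtime_ms=50)   # well-behaved batch
engines["video"].submit(runtime_ms=None)  # batch that hangs
reset_log = run(engines, total_ms=200)
print(reset_log)
```

In this toy run only the hung "video" engine is reset, while the "render" batch completes without interruption, which is the behavior the per-engine approach is after. A full GPU reset would correspond to calling `reset()` on every engine regardless of which one hung.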
This "TDR" approach is what we wrote about months ago but is still working its way to the mainline kernel. The watchdog components, GuC integration, and other bits are still pending, but hopefully we'll see it settled in the months ahead.
More details in this slide deck (PDF).