AMD Looking To Improve The GPU Reset Experience Under Linux
Over the past two weeks there has been much discussion among upstream Linux graphics driver developers (not just AMD but Intel and other developers as well) over a patch proposed by an AMD engineer to communicate GPU reset events via sysfs. The original idea is to have a sysfs event that notifies user-space of a GPU reset and provides information such as the process ID involved with the reset, the GPU status information, and related attributes. A user-space daemon could then use this event and its information to quit or block the offending process, ensure the process is gracefully restarted, log the DRM GPU reset events, or otherwise keep user-space better informed so that corrective action can be taken to restore the system to an appropriate state.
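To make the daemon idea more concrete, here is a minimal Python sketch of how user-space might interpret such an event once it has been received. It assumes the event arrives as a kernel uevent-style payload of NUL-separated KEY=VALUE fields. The RESET, PID, and VRAM_LOST keys, the ResetEvent type, and the policy in choose_action are hypothetical names chosen for illustration; they are not taken from the proposed patch, whose final format is still under discussion.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ResetEvent:
    pid: Optional[int]  # process blamed for the reset, if the event reports one
    vram_lost: bool     # hypothetical status flag, for illustration only


def parse_uevent(payload: bytes) -> Dict[str, str]:
    """Split a NUL-separated KEY=VALUE payload into a dict.

    Parts without '=' (such as a leading 'action@devpath' header) are skipped.
    """
    fields: Dict[str, str] = {}
    for part in payload.split(b"\0"):
        key, sep, value = part.decode("utf-8", "replace").partition("=")
        if sep:
            fields[key] = value
    return fields


def to_reset_event(fields: Dict[str, str]) -> Optional[ResetEvent]:
    """Return a ResetEvent for DRM reset events, or None for anything else."""
    # RESET, PID and VRAM_LOST are assumed key names, not a confirmed format.
    if fields.get("SUBSYSTEM") != "drm" or fields.get("RESET") != "1":
        return None
    pid = fields.get("PID")
    return ResetEvent(
        pid=int(pid) if pid and pid.isdigit() else None,
        vram_lost=fields.get("VRAM_LOST") == "1",
    )


def choose_action(event: ResetEvent, previous_resets: int) -> str:
    """Pick a corrective action (an example policy, not a recommendation)."""
    if event.pid is None:
        return "log"       # nothing to blame, just record the reset
    if previous_resets >= 2:
        return "block"     # repeat offender: stop it from running again
    return "restart"       # otherwise try a graceful restart
```

A daemon built along these lines would read payloads from a uevent source (for example via udev), pass each one through parse_uevent and to_reset_event, and keep a per-process count of earlier resets to feed into choose_action.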
Some developers have expressed the opinion that a new DRM-specific sysfs event isn't the best approach and that devcoredump could possibly be used instead. However, devcoredump isn't limited to just DRM graphics drivers or reset events, so further user-space filtering would be needed. There is also a difference of opinion over which details, and just how much information, should be reported by a reset event. Whether building upon devcoredump or going with a new sysfs event, there is still the open item of actually writing (or otherwise improving existing) user-space software to leverage the communicated GPU reset event information.
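As a rough illustration of the extra filtering a devcoredump-based approach would need, the following Python sketch walks the devcoredump class directory and keeps only dumps whose failing device is bound to a graphics driver. It assumes the usual layout in which each /sys/class/devcoredump/devcdN entry has a data file and a failing_device symlink. The GPU_DRIVERS set and the gpu_coredumps helper are hypothetical, and matching on driver name is only one possible heuristic; it also cannot tell a reset-related dump from any other dump the same driver produces.

```python
import os
from typing import List, Set

# Illustrative list of graphics drivers to match; adjust as needed.
GPU_DRIVERS: Set[str] = {"amdgpu", "i915"}


def gpu_coredumps(root: str = "/sys/class/devcoredump",
                  drivers: Set[str] = GPU_DRIVERS) -> List[str]:
    """Return paths of devcoredump data files that belong to GPU drivers."""
    matches: List[str] = []
    if not os.path.isdir(root):
        return matches
    for entry in sorted(os.listdir(root)):
        # failing_device points at the device that produced the dump;
        # its 'driver' symlink names the driver bound to that device.
        driver_link = os.path.join(root, entry, "failing_device", "driver")
        if not os.path.islink(driver_link):
            continue
        driver = os.path.basename(os.path.realpath(driver_link))
        if driver in drivers:
            matches.append(os.path.join(root, entry, "data"))
    return matches
```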
Hopefully you don't often experience GPU reset events, where the graphics card hits an awry state and needs to be reset, but if you do, there is at least work underway on reporting the troublesome event up to user-space so the user can be better informed.
The discussion over the AMD-proposed GPU reset event reporting additions/improvements is happening via this dri-devel thread. It will be interesting to see how the discussion pans out and whether it ultimately improves the GPU reset reporting/handling experience under Linux.