AMD Looking To Improve The GPU Reset Experience Under Linux

Written by Michael Larabel in Radeon on 17 March 2022 at 07:03 AM EDT.
AMD's Radeon Linux graphics driver developers are looking at enhancing the GPU reset experience so that more information about the troublesome event can be communicated up the stack, to better inform the user and/or take greater action to ensure the desktop is successfully restored.

Over the past two weeks there has been much discussion among upstream Linux graphics driver developers -- not just AMD but Intel and other developers as well -- over a patch proposed by an AMD engineer to communicate GPU reset events via sysfs. The original idea is to have a sysfs event that notifies user-space of a GPU reset and provides information such as the process ID involved with the reset, the GPU status information, and related attributes. This event and its emitted information could then be used by a user-space daemon for quitting/blocking the offending process or ensuring the process is gracefully restarted, for logging DRM GPU reset events, or for other cases where user-space should be better informed of reset events so corrective action can be taken to restore the system to an appropriate state.
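
To make the daemon idea a bit more concrete, here is a minimal Python sketch of how a user-space handler might consume such an event. It assumes the event arrives as a standard kernel uevent payload (NUL-separated KEY=VALUE strings); the `RESET` and `PID` keys, the `ResetEvent` type, and the restart-then-block policy are hypothetical and are not taken from the proposed patch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_uevent(payload: bytes) -> Dict[str, str]:
    """Parse a kernel uevent payload made of NUL-separated KEY=VALUE strings."""
    fields: Dict[str, str] = {}
    for chunk in payload.split(b"\0"):
        text = chunk.decode("utf-8", errors="replace")
        if "=" not in text:
            continue  # skips the "action@devpath" header and empty chunks
        key, _, value = text.partition("=")
        fields[key] = value
    return fields


@dataclass
class ResetEvent:
    device: str
    pid: Optional[int]


def to_reset_event(fields: Dict[str, str]) -> Optional[ResetEvent]:
    """Return a ResetEvent for DRM reset uevents, or None for anything else.

    "RESET" and "PID" are hypothetical key names used for illustration only.
    """
    if fields.get("SUBSYSTEM") != "drm" or fields.get("RESET") != "1":
        return None
    pid_text = fields.get("PID", "")
    pid = int(pid_text) if pid_text.isdigit() else None
    return ResetEvent(device=fields.get("DEVNAME", "unknown"), pid=pid)


def choose_action(event: ResetEvent, offenders: Dict[int, int], limit: int = 3) -> str:
    """Pick a response: log only, restart the process, or block a repeat offender."""
    if event.pid is None:
        return "log"
    offenders[event.pid] = offenders.get(event.pid, 0) + 1
    return "block" if offenders[event.pid] >= limit else "restart"
```

A real daemon would receive these payloads from a netlink uevent socket or through libudev rather than from a byte string, and whatever fields the kernel ends up exposing would replace the placeholder keys used here.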

Some developers have expressed the opinion that a new DRM-specific sysfs event isn't the best approach and that devcoredump could possibly be used instead. However, devcoredump isn't limited to just DRM graphics drivers or reset events, so further user-space filtering would be needed. There is also a difference of opinion over which details and just how much information should be reported by a reset event. Whether building upon devcoredump or going with a new sysfs event, there is still the open item of actually writing (or otherwise improving existing) user-space software for leveraging the communicated GPU reset event information.
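
As a rough illustration of the user-space filtering that a devcoredump-based approach would need, the Python sketch below walks the devcoredump class directory and keeps only dumps whose failing device appears to be a DRM device. The layout it relies on (`devcdN` entries with a `failing_device` link and a `data` file) reflects the devcoredump sysfs interface as I understand it, while treating a `drm` subdirectory on the device as the marker of a GPU is a heuristic assumed for this example, not something from the mailing list discussion.

```python
import os
from typing import List


def drm_coredumps(class_dir: str = "/sys/class/devcoredump") -> List[str]:
    """Return the data-file paths of device coredumps that belong to DRM devices."""
    found: List[str] = []
    if not os.path.isdir(class_dir):
        return found  # devcoredump not available or no dumps pending
    for name in sorted(os.listdir(class_dir)):
        if not name.startswith("devcd"):
            continue
        # failing_device is a link to the device that produced the dump.
        device = os.path.realpath(os.path.join(class_dir, name, "failing_device"))
        # Heuristic (assumption): DRM devices expose a "drm" subdirectory.
        if os.path.isdir(os.path.join(device, "drm")):
            found.append(os.path.join(class_dir, name, "data"))
    return found
```

Even with this kind of filter, a daemon would still have to tell reset-related dumps apart from other dumps the same driver may generate, which is part of why the extra filtering was raised as a concern.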


Hopefully you don't experience GPU reset events often when the graphics card hits an awry state and needs to be reset, but at least if you do, there is work underway on reporting the troublesome event up to user-space so the user can be better informed.


The discussion over the AMD-proposed GPU reset event reporting additions/improvements is happening via this dri-devel thread. It will be interesting to see how the discussion pans out for ultimately working to improve the GPU reset reporting/handling experience under Linux.
