Intel IDXD Driver To Better Handle Accelerators In Event Of Hardware Errors

Written by Michael Larabel in Intel on 5 July 2024 at 03:31 PM EDT.
Intel's IDXD driver enables the Data Streaming Accelerator (DSA) under Linux. DSA has been part of Intel's accelerator offerings on Xeon processors since Sapphire Rapids. With patches posted today, the IDXD driver will help the hardware recover from errors for a more robust experience.

Patches posted today on the Linux kernel mailing list enable the Intel IDXD driver to perform a PCIe Function Level Reset (FLR) when a Data Streaming Accelerator hits a hardware error. The reset allows for more robust recovery than the status quo of just printing an error message when such a problem occurs.

The "enable FLR for IDXD halt" patch series explains:
"When IDXD device hits hardware errors, it enters halt state and triggers an interrupt to IDXD driver. Currently IDXD driver just prints an error message in the interrupt handler.

A better way to handle the interrupt is to do Function Level Reset (FLR) and recover the device's hardware and software configurations to its previous working state. The device and software can continue to run after the interrupt.

This series enables this FLR handling for IDXD device whose WQs are all user type. FLR handling for IDXD device whose WQs are kernel type will be implemented in a future series."
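The flow described in the cover letter can be illustrated with a small toy model. This is only a sketch of the decision logic as quoted above, not code from the patch series or the kernel: `ToyDevice`, `WqType`, `State`, and `handle_halt` are hypothetical names invented for illustration, and the real driver's save/restore steps are far more involved.

```python
from dataclasses import dataclass, field
from enum import Enum


class WqType(Enum):
    USER = "user"
    KERNEL = "kernel"


class State(Enum):
    ENABLED = "enabled"
    HALTED = "halted"


@dataclass
class ToyDevice:
    """Hypothetical stand-in for an IDXD device; not a kernel structure."""
    wq_types: list
    config: dict = field(default_factory=dict)
    state: State = State.ENABLED
    log: list = field(default_factory=list)

    def function_level_reset(self):
        # In this model an FLR wipes the configuration and clears the halt.
        self.config = {}
        self.state = State.ENABLED


def handle_halt(dev):
    """Toy model of the halt-interrupt path described in the cover letter.

    Returns True if the device was recovered, False if it stays halted.
    """
    dev.state = State.HALTED
    if not all(t is WqType.USER for t in dev.wq_types):
        # Kernel-type WQs are left to a future series: only report the error.
        dev.log.append("halt: error reported, no recovery")
        return False
    saved = dict(dev.config)      # save the working configuration
    dev.function_level_reset()    # reset the function
    dev.config = saved            # restore the saved configuration
    dev.log.append("halt: recovered via FLR")
    return True


if __name__ == "__main__":
    dev = ToyDevice([WqType.USER, WqType.USER], {"wq_size": 16})
    print(handle_halt(dev), dev.state, dev.config)
```

In this model a device with only user-type WQs returns to its enabled state with its configuration intact, while a device with any kernel-type WQ stays halted and only logs the error, mirroring the scope limit stated in the series.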

These IDXD patches are now under review and will hopefully be picked up for a forthcoming kernel series. With the Linux v6.11 merge window just a week or two away, it remains to be seen whether these patches will be deemed ready by then or pushed back to a later kernel version.