Linux 6.6 To Deal With Unresponsive Intel QAT Devices
Linux has supported Intel QuickAssist Technology (QAT) devices from the start, whether as QAT PCIe adapters or as QAT support built into select Atom and Xeon CPUs, including the latest-generation Sapphire Rapids CPUs. Only now, with the upcoming Linux 6.6 kernel, is it adding a heartbeat feature for determining whether a QAT device has become unresponsive so that it can be acted upon.
Intel QuickAssist Technology can be used for accelerating data encryption and compression. The new heartbeat feature builds on the existing Intel QAT Linux kernel driver to deal with potentially unresponsive QAT devices. The patch explains:
Under some circumstances, firmware in the QAT devices could become unresponsive. The Heartbeat feature provides a mechanism to detect unresponsive devices.
The QAT FW periodically writes to memory a set of counters that allow to detect the liveness of a device. This patch adds logic to enable the reporting of those counters, analyze them and report if a device is alive or not.
In particular this adds
(1) heartbeat enabling, reading and detection logic
(2) reporting of heartbeat status and configuration via debugfs
(3) documentation for the newly created sysfs entries
(4) configuration of FW settings related to heartbeat, e.g. tick period
(5) logic to convert time in ms (provided by the user) to clock ticks
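To make the patch description more concrete, here is a minimal Python sketch of two of the ideas it mentions: converting a user-supplied period in milliseconds to clock ticks, and judging liveness from two snapshots of firmware-written counters. This is not the kernel driver's code. The function names, the round-up behavior, and the rule that every counter must advance between snapshots are assumptions made for illustration.

```python
from typing import Sequence


def ms_to_ticks(time_ms: int, clock_hz: int) -> int:
    """Convert a period in milliseconds to clock ticks, rounding up.

    Hypothetical helper: rounding up is an assumption, chosen so a
    non-zero period never collapses to zero ticks.
    """
    if time_ms < 0 or clock_hz <= 0:
        raise ValueError("time_ms must be >= 0 and clock_hz must be > 0")
    return (time_ms * clock_hz + 999) // 1000


def device_is_alive(previous: Sequence[int], current: Sequence[int]) -> bool:
    """Return True if every heartbeat counter changed between snapshots.

    Assumed rule: firmware that is still running keeps updating all of
    its counters, so any counter that did not move marks the device as
    unresponsive. An empty snapshot is treated as not alive.
    """
    if len(previous) != len(current):
        raise ValueError("snapshots must have the same number of counters")
    return bool(current) and all(
        now != before for before, now in zip(previous, current)
    )


if __name__ == "__main__":
    # A 500 ms period on a hypothetical 1 MHz clock.
    print(ms_to_ticks(500, 1_000_000))
    # The second counter did not advance, so the device is reported as dead.
    print(device_is_alive([10, 20, 30], [11, 20, 31]))
```

A real implementation would read the counters from memory shared with the firmware and take the clock frequency from the device, both of which are outside the scope of this sketch.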
The heartbeat work and related QAT updates have been queued into the "cryptodev" Git tree ahead of the Linux 6.6 cycle at the end of summer.
This QAT heartbeat feature is a useful addition, though it is a bit surprising that such functionality wasn't already in place.