Intel Continues Working On IAA Crypto Compression Driver For Linux
Intel's 4th Gen Xeon Scalable "Sapphire Rapids" processors introduced various new accelerators, available on select SKUs or via the Intel On Demand offering. One of the initial challenges, though, is limited early software support: much upstream open-source (or even just widely used) software is not yet enabled to make use of these new accelerators. One improvement on that front is Intel engineers' work on an IAA crypto compression driver for the kernel, so that the In-Memory Analytics Accelerator can be transparently accessible to kernel features that use the crypto API.
Over the past several months the IAA Crypto Compression Driver for the Linux kernel has gone through a half-dozen revisions as it works its way toward mainline. This new driver makes the Intel IAA accelerator available via the kernel crypto API, so it can be used by kernel code targeting that API, such as Zswap and zRAM. The driver provides both sync and async versions of the DEFLATE algorithm implemented by the hardware.
While this driver will open up kernel use-cases for the IAA accelerator, the driver patch notes do acknowledge the initial headaches around setting up the Sapphire Rapids accelerators, i.e. it still won't work out of the box when booting a capable Linux software stack:
"The IAA hardware is fairly complex and generally requires a knowledgeable administrator with sufficiently detailed understanding of the hardware to set it up before it can be used. As mentioned in the Documentation, this typically requires using a special tool called accel-config to enumerate and configure IAA workqueues, engines, etc, although this can also be done using only sysfs files.
The operation of the driver mirrors this requirement and only allows the hardware to be accessed via the crypto layer once the hardware has been configured and bound to the IAA crypto driver. As an IDXD sub-driver, the IAA crypto driver essentially takes ownership of the hardware until it is given up explicitly by the administrator. This occurs automatically when the administrator enables the first IAA workqueue or disables the last one; the iaa_crypto (sync and async) algorithms are registered when the first workqueue is enabled, and deregistered when the last one is disabled.
The normal sequence of operations would normally be:
configure the hardware using accel-config or sysfs
configure the iaa crypto driver (see below)
configure the subsystem e.g. zswap/zram to use the iaa_crypto algo
run the workload"
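Since the patch notes say the driver's algorithms are only registered once the first workqueue is enabled, one way to check that step is to look at what the kernel crypto API currently exposes in /proc/crypto. The following is a minimal Python sketch of that check, not something taken from the patch series. The helper names are made up for illustration, and the exact algorithm and driver names the IAA driver registers are an assumption, so the name to search for is left as a parameter.

```python
from pathlib import Path

# "type" values /proc/crypto uses for compression algorithms
# (legacy comp, synchronous scomp, asynchronous acomp).
COMPRESSION_TYPES = {"compression", "scomp", "acomp"}


def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per registered algorithm."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def compression_algorithms(text):
    """Return (name, driver, type) for every compression-type entry."""
    return [
        (e.get("name", "?"), e.get("driver", "?"), e.get("type", "?"))
        for e in parse_proc_crypto(text)
        if e.get("type") in COMPRESSION_TYPES
    ]


def is_registered(text, fragment):
    """True if a compression entry's name or driver contains `fragment`.

    `fragment` is whatever name the driver registers; "iaa" is only a guess.
    """
    return any(
        fragment in name or fragment in driver
        for name, driver, _ in compression_algorithms(text)
    )


if __name__ == "__main__":
    proc = Path("/proc/crypto")
    if proc.exists():
        text = proc.read_text()
        for name, driver, kind in compression_algorithms(text):
            print(f"{name:20} {driver:30} {kind}")
        print("IAA-looking entry present:", is_registered(text, "iaa"))
```

On a system without the hardware configured, the listing would be expected to show only the software compressors, which matches the patch notes' point that nothing is exposed until an administrator enables a workqueue.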
But once everything is set up and running with this proposed driver, the performance results with IAA are quite dramatic compared to pure software.
Earlier this month a v6 patch series for this kernel driver was sent out for review. Given the timing, and with the series not yet picked up in the cryptodev.git branch, it's unlikely this driver will be ready in time for the upcoming Linux v6.5 cycle. Another potential obstacle was raised last week by Linux crypto subsystem maintainer Herbert Xu:
"So you said that canned is not compatible with the generic deflate algorithm. Does that mean that there is no way for it to decompress something compressed by the generic deflate algorithm, and vice versa its compressed output cannot be decompressed by generic deflate?
We don't add an algorithm to the Crypto API if the only implementation is by hardware. IOW if you are adding a new algorithm, then a software version must be the first patch."
This difference in Intel's deflate implementation does appear to be genuine. ClickHouse developers previously warned in their Intel QPL support that moving Intel IAA-accelerated databases between hosts first requires converting all of the data before taking it off the server. So if this driver is to be mainlined, Intel will also need to provide a software implementation for the kernel.
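The compatibility question Herbert Xu raises can be illustrated with a loose analogy in plain Python. The sketch below does not use IAA or its "canned" mode, whose format is not described here; it only uses zlib's preset-dictionary feature to show how a DEFLATE-based stream that depends on out-of-band state cannot be decoded by a generic decompressor that lacks that state. The payload and dictionary contents are made up for illustration.

```python
import zlib

# Illustrative data only; not related to any real IAA workload.
payload = b"swap page contents " * 64
preset = b"swap page contents "

# A generic zlib stream is self-contained: any decoder can read it.
generic = zlib.compress(payload)

# A stream produced with a preset dictionary depends on state that is
# not carried in the stream itself (an analogy, not IAA canned mode).
comp = zlib.compressobj(zdict=preset)
dependent = comp.compress(payload) + comp.flush()


def decodes_without_preset(data):
    """Try to decode with a plain decompressor that has no extra state."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False


# With the same out-of-band state, the dependent stream decodes fine.
decomp = zlib.decompressobj(zdict=preset)
restored = decomp.decompress(dependent) + decomp.flush()

print("generic stream decodes generically:  ", decodes_without_preset(generic))
print("dependent stream decodes generically:", decodes_without_preset(dependent))
print("dependent stream decodes with preset:", restored == payload)
```

If the hardware's output has a similar dependency, data written by it could only be read back where a matching decoder exists, which is the practical reason a software implementation would be needed alongside the driver.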