Setting Up Intel 4th Gen Xeon Scalable "Sapphire Rapids" For Accelerator Use
With Intel's 4th Gen Xeon Scalable "Sapphire Rapids" processors that launched this week, Intel is betting heavily on the integrated accelerators to offer an advantage over competitors for modern hyperscaler tasks and other workloads able to take advantage of the In-Memory Analytics Accelerator (IAA), Data Streaming Accelerator (DSA), QuickAssist Technology (QAT), and the Dynamic Load Balancer (DLB). But what does the software landscape currently look like and what's needed to actually make use of these accelerators under Linux? Here is a brief how-to guide / overview for making use of the accelerators on your Linux server.
I have been working on some Intel Sapphire Rapids accelerator benchmarks that I hope to begin publishing in the next week, but given all this new IP integrated into 4th Gen Xeon Scalable and lots of reader questions around it, I figured it would be best to start with an article looking specifically at the software support and setup. As well, as I've only had the Sapphire Rapids server for a week now, I'm still working through all the software intricacies of the accelerator setup and use. Making use of the new accelerators isn't as easy as, say, using a new instruction like AVX. Rather, there is the kernel driver as well as a user-space library and related support components. Additionally, beyond just having the user-space library present, the accelerators need to be configured and enabled for use on your system. After getting all of that set up, it's then up to the particular software applications/services and what additional steps may be needed there to actually enjoy the accelerator support for better performance with these new processors.
The Accelerator Engines
As a brief recap from the Sapphire Rapids article earlier this week, the new IP blocks amount to:
Intel QuickAssist Technology (QAT) - QuickAssist Technology has been found within select Intel chipsets and SoCs previously as well as the QuickAssist adapter PCIe add-in cards, but now with Sapphire Rapids is available integrated into the Xeon Scalable CPUs directly. QuickAssist can provide for faster compression and encryption performance over just relying on the CPU cores alone. Given QuickAssist Technology has been around for a while, it currently enjoys the best software support and adoption of these engines and can be easily used by the likes of OpenSSL and other software when activating the QAT engine support.
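For those wanting a quick way to exercise QAT from OpenSSL, a minimal sketch is below, assuming Intel's open-source QAT_Engine (the "qatengine" OpenSSL engine) and the QAT driver/services are already installed on the system -- the flags are illustrative and should be checked against the QAT_Engine documentation:

    # Assumes Intel's QAT_Engine is installed and the QAT services are running.
    openssl engine -t qatengine                              # confirm the engine loads and reports "available"
    openssl speed -engine qatengine -async_jobs 8 rsa2048    # compare against a plain "openssl speed rsa2048" run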
Intel Dynamic Load Balancer (DLB) - While similar functionality is available via DPUs, the Intel DLB with Sapphire Rapids allows for offloading some tasks around load balancing, queue management, packet prioritization, and other similar functionality. Making use of the DLB requires the DLB kernel driver and the DLB poll mode user-space driver. There is also "libdlb" as a library for creating DLB-accelerated software without making use of the Data Plane Development Kit (DPDK) framework. The DLB kernel and user-mode drivers are currently distributed via Intel.com. The DLB kernel driver is open-source and Intel currently tests it against the 5.15 and 5.19 kernels.
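As a quick sanity check that the out-of-tree DLB kernel driver is in place, something along these lines works (a sketch, assuming the driver package from Intel.com has been built against your kernel and that the module is named "dlb2" as in Intel's releases):

    sudo modprobe dlb2          # load the out-of-tree DLB kernel module
    lsmod | grep dlb            # confirm it is loaded
    sudo dmesg | grep -i dlb    # look for the driver probing the DLB device(s)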
Intel In-Memory Analytics Accelerator (IAA) - In-memory databases, big data analytics, and related software can stand to benefit from this accelerator. This accelerator aims to cut down on seek times when accessing data from storage by moving it closer to the CPU, leverages a one-dimensional linear structure for columnar data storage, and supports operations around compression/decompression and encryption/decryption of input data. The IAA accelerator also supports various filtering operations, CRC64, and prefetching address translations.
Making use of the IAA from the software side is done using Intel's Query Processing Library (QPL). The QPL interfaces with the Intel IDXD kernel driver. A compiler from roughly the past two years is also needed for the MOVDIR64B and ENQCMD(S) instruction support. For those making use of the QPL library to tap the IAA accelerator, the library also has software fallback implementations using AVX2/AVX-512 for cases where, at run-time, the software may not be running on a server with an IAA or lacks access to the IAA.
The IAA in many regards is similar to the DSA accelerator but lacks batch processing, is a stateless device, has no inter-domain features, and has other operational differences.
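For those wanting to experiment with QPL, the library is open-source on GitHub and CMake-based; a rough build sketch follows (see the intel/qpl documentation for the authoritative steps and prerequisites):

    # A sketch of building the Query Processing Library from source.
    git clone --recursive https://github.com/intel/qpl.git
    cd qpl && mkdir build && cd build
    cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
    sudo cmake --build . --target install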
Intel Data Streaming Accelerator (DSA) - For data analytics and distributed storage, the DSA accelerators jump into action. The Data Streaming Accelerator is arguably the most exciting and anticipated accelerator block of Sapphire Rapids. The DSA is a high performance data copy and transformation accelerator for moving data to/from volatile memory, persistent memory, and MMIO. The DSA replaces Intel's existing QuickData Technology.
This overview article is mainly focusing on the software support side for the IAA and DSA accelerators.
How Many Accelerators Do You Have?
Depending upon your 4th Gen Xeon Scalable SKU, you may have only select accelerators, a limited quantity present, or none enabled at all. If you are running a SKU that relies upon the new Intel On Demand for activation of the accelerators, there is a new kernel driver for that -- but it's not the focus of today's article as I haven't yet encountered hardware with this limitation. Since Linux 5.18 there has been the Software Defined Silicon (SDSi) driver -- as "On Demand" was originally known -- but only with the forthcoming Linux 6.2 kernel is the Intel On Demand driver now ready for action. The SDSi references now reflect the Intel On Demand branding and there are driver changes around new GUIDs, support for reading On Demand meter certificates, and other changes.
The number of accelerators varies by SKU.
Besides enabling the "INTEL_SDSI" Kconfig option for the Intel On Demand kernel driver, you need to deal with handling of the On Demand certificates from user-space. The SDSi driver exposes the feature state and certificates via sysfs. The user-space interface is documented via intel-sdsi on GitHub. I don't have any more information to add here on the user-space handling of On Demand feature activation for Linux -- or knowledge of any convenient user-space utility to handle the actual feature purchasing/activation -- as it's also up to the OEMs to negotiate/sell On Demand activation to customers, etc. We'll see how this all plays out in time.
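For the curious, here is a rough way to see whether any SDSi/On Demand devices are even exposed on a given system -- the exact sysfs paths and attribute names can differ by kernel version, so treat this as a sketch rather than anything authoritative:

    # SDSi devices are expected to show up via the Intel VSEC auxiliary bus on recent kernels.
    ls /sys/bus/auxiliary/devices/ | grep -i sdsi
    ls /sys/bus/auxiliary/devices/intel_vsec.sdsi.0/    # feature state / certificate attributes, if present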
IDXD Kernel Driver
Intel has been working on the IDXD kernel driver since 2019 as the Data Accelerator Driver for enabling the accelerator support under Linux. This driver was mainlined back in Linux 5.6 but only with Linux 5.18+ does it have software queue support and other features/improvements. So if wanting to use the mainline IDXD driver, it's recommended to use Linux 5.18 or newer.
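A quick way to confirm whether your running kernel was built with the IDXD driver (the config file location here assumes a Debian/Ubuntu style layout):

    grep IDXD /boot/config-$(uname -r)
    # Looking for something along the lines of:
    #   CONFIG_INTEL_IDXD=m
    #   CONFIG_INTEL_IDXD_SVM=y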
In addition to having this driver built via the "INTEL_IDXD*" Kconfig options, the Intel IOMMU must be enabled as well as the Intel IOMMU Scalable Mode support. One issue I ran into this week with the accelerator testing was that when initially using accel-config (documented later in this article) I was getting errors when trying to enable workqueues. After digging through Intel documentation, I discovered the notice that INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON and INTEL_IOMMU_DEFAULT_ON should also be enabled as part of the kernel configuration... My testing was using an Ubuntu Mainline Kernel PPA build where those default-on options are not enabled. But fortunately you don't need to rebuild your kernel: if hitting the accel-config errors or otherwise not finding a working setup, and your kernel is at least built with INTEL_IOMMU/INTEL_IOMMU_SVM, you can boot the kernel with the "intel_iommu=on,sm_on" argument to satisfy the DSA driver requirements around the IOMMU. I expect others may get bitten by this as well, as the accel-config output wasn't helpful and, for example, hadn't proactively scanned the Kconfig configuration to simply let the user know that it may be the culprit - I only realized it after digging through documentation.
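For reference, on an Ubuntu/GRUB style setup that workaround roughly amounts to appending the option to the kernel command line and rebooting -- adjust for your particular boot loader:

    # Add intel_iommu=on,sm_on to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
    #   GRUB_CMDLINE_LINUX_DEFAULT="... intel_iommu=on,sm_on"
    sudoedit /etc/default/grub
    sudo update-grub && sudo reboot
    cat /proc/cmdline    # after rebooting, verify intel_iommu=on,sm_on is present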
With the IDXD kernel driver present on your system, the initial indicator of accelerator support can be found by looking at the dmesg output for lines around "Intel(R) Accelerator Device" or just looking at the "idxd" hits.
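For example, something along these lines:

    sudo dmesg | grep -iE 'idxd|accelerator'    # look for the IDXD driver probing the DSA/IAA devices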
When encountering a Linux server with accelerator support and the IDXD driver loaded, you should find directories under /sys/bus/dsa/devices/dsa* for each of the accelerator devices.
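A quick sketch of poking around those sysfs entries and confirming accel-config can see the hardware (device names such as dsa0 or iax1 will vary depending upon the SKU and how many accelerators are enabled):

    ls /sys/bus/dsa/devices/    # expect entries such as dsa0, dsa2, ... and iax1, iax3, ...
    accel-config list -i        # accel-config's view of the devices/workqueues, including idle (not yet enabled) ones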