Habana Labs For Linux 5.16 To Bring Peer-To-Peer Support With DMA-BUF
The drama around the DMA-BUF code for the Habana Labs AI driver appears to be wrapping up with the upcoming Linux 5.16 cycle.
The Habana Labs driver changes were mailed out today for queuing in char/misc ahead of next month's Linux 5.16 merge window. This driver, which supports the AI inference and training accelerators from the Intel-owned company, has some exciting updates for this next kernel release.
DMA-BUF export support for the driver was attempted this summer for Linux 5.15, but those changes were strongly objected to by the upstream Direct Rendering Manager (DRM) developers. The DMA-BUF issue ultimately stemmed from the Habana Labs kernel driver being open-source while lacking any open-source user-space client to exercise and stress the interface, as is required for DRM GPU drivers.
This issue was ultimately addressed in September when Habana Labs opened up the code to its AI compiler and SynapseAI Core, providing at least some form of a working, open-source "client" in user-space.
With that open-source user-space code now available, the planned Habana Labs driver changes for Linux 5.16 include a new user-space API for the driver to export a DMA-BUF object backed by a memory region in the accelerator's DRAM.
This DMA-BUF support in the Habana Labs AI driver is necessary for peer-to-peer sharing over PCI Express, with the design intention of sharing buffers directly between the Gaudi training accelerator and RDMA adapters. Currently the Mellanox mlx5 and AWS Elastic Fabric Adapter (efa) drivers should be ready for DMA-BUF peer-to-peer sharing with the Habana Labs AI driver.
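For a rough idea of how that peer-to-peer path is meant to be consumed, the sketch below registers an exported DMA-BUF file descriptor with an RDMA NIC through libibverbs' ibv_reg_dmabuf_mr(). This is only an illustration and not code from the pull request: the dmabuf_fd is assumed to have already been obtained from the Habana driver's memory export ioctl, whose exact uapi (in include/uapi/misc/habanalabs.h) is not reproduced here.

```c
/*
 * Hedged sketch: registering a DMA-BUF exported by an accelerator driver
 * as an RDMA memory region with libibverbs (rdma-core), so a NIC such as
 * mlx5 or efa can DMA directly to/from the device's DRAM over PCIe
 * peer-to-peer instead of bouncing through host memory.
 */
#include <stdio.h>
#include <infiniband/verbs.h>

static struct ibv_mr *register_accel_dmabuf(struct ibv_pd *pd,
                                             int dmabuf_fd,
                                             size_t length)
{
        /* Ask the RDMA provider to map the DMA-BUF for device access. */
        struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd,
                                               0,        /* offset into the DMA-BUF */
                                               length,   /* bytes to register */
                                               0,        /* iova as seen by the NIC */
                                               dmabuf_fd,
                                               IBV_ACCESS_LOCAL_WRITE |
                                               IBV_ACCESS_REMOTE_READ |
                                               IBV_ACCESS_REMOTE_WRITE);
        if (!mr)
                perror("ibv_reg_dmabuf_mr");
        return mr;
}
```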
For Linux 5.16 the driver also exposes more power information obtained from the hardware's firmware, and there are several other fixes and improvements.
The list of planned Habana Labs driver improvements for Linux 5.16 can be found via this pull request.