Habana Labs' Gaudi NIC Support Being Worked On For Linux Kernel

Written by Michael Larabel in Intel on 13 September 2020 at 09:41 AM EDT.
Intel-owned AI startup Habana Labs is working on expanding their "Gaudi" support to include the network interfaces (NICs) found on this AI training accelerator hardware.

Gaudi support was added to the Habana Labs accelerator driver back in Linux 5.8. Previously the Habana Labs open-source Linux driver only supported their Goya AI inference accelerator, but the latest stable Linux kernel release adds support for the Gaudi AI training accelerator.

One important piece still missing from the current driver, though, is NIC support for scaling out by connecting multiple accelerators. Gaudi NIC support now exists in patch form and could be mainlined for Linux 5.10.

The 15 patches add NIC support for the scale-out interconnect used in distributed deep learning training. As many as "tens of thousands" of Gaudi accelerators can be connected using RDMA over Converged Ethernet (RoCE).

Upstream driver maintainer Oded Gabbay of Habana Labs explained, "Each GAUDI exposes 10x100GbE ports that are designed to scale-out the inter-GAUDI communication by integrating a complete communication engine on-die. This native integration allows users to use the same scaling technology, both inside the server and rack (termed as scale-up), as well as for scaling across racks (scale-out). The racks can be connected directly between GAUDI processors, or through any number of standard Ethernet switches. The driver exposes the NIC ports to the user as standard Ethernet ports by registering each port to the networking subsystem. This allows the user to manage the ports with standard tools such as ifconfig, ethtool, etc. It also enables us to connect to the Linux networking stack and thus support standard networking protocols, such as IPv4, IPv6, TCP, etc. In addition, we can also leverage protocols such as DCB for dynamically configuring priorities to avoid congestion."
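
Since the ports are registered as ordinary network devices, they should show up wherever any other Ethernet interface does, including sysfs. As a rough illustration only, here is a minimal Python sketch that lists interfaces with their backing driver and reported link speed. It assumes the standard Linux layout under /sys/class/net; the helper names `list_ports` and `aggregate_gbps` are hypothetical, and the driver name to filter on is left as a parameter because the quoted description does not say what name the ports report.

```python
import os
from pathlib import Path


def list_ports(sysfs_net="/sys/class/net", driver_name=None):
    """Return (interface, driver, speed_mbps) tuples found under sysfs_net.

    driver_name is an optional filter; its value is an assumption the
    caller must supply, since it is not given in the article.
    """
    ports = []
    root = Path(sysfs_net)
    if not root.is_dir():
        return ports
    for iface in sorted(root.iterdir()):
        driver_link = iface / "device" / "driver"
        # Virtual interfaces such as lo have no backing device/driver link.
        if driver_link.exists():
            driver = os.path.basename(os.path.realpath(driver_link))
        else:
            driver = None
        if driver_name is not None and driver != driver_name:
            continue
        try:
            # sysfs reports link speed in Mb/s.
            speed = int((iface / "speed").read_text().strip())
        except (OSError, ValueError):
            speed = None  # link down, or speed not reported
        ports.append((iface.name, driver, speed))
    return ports


def aggregate_gbps(ports):
    """Sum the positive reported speeds and convert Mb/s to Gb/s."""
    return sum(s for _, _, s in ports if s and s > 0) / 1000


if __name__ == "__main__":
    found = list_ports()
    for name, driver, speed in found:
        print(name, driver, speed)
    print("aggregate Gb/s:", aggregate_gbps(found))
```

With ten ports each reporting 100000 Mb/s, as in the 10x100GbE configuration quoted above, the aggregate works out to 1000 Gb/s per accelerator.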

Great seeing all of the Habana Labs open-source support work continue.
