Facebook Still Pursuing "NetGPU" - Working On AMD GPU Support In Addition To NVIDIA

Written by Michael Larabel in Linux Networking on 29 August 2020 at 01:53 PM EDT.
The recent Facebook patches implementing NetGPU, and in particular one of the NVIDIA-focused patches, sparked the recent controversy around "GPL condoms" in the kernel and ultimately led to new protections in Linux 5.9. Facebook is still working on the NetGPU code with hopes of upstreaming it, and in addition to the NVIDIA driver support they are now also working on AMD GPU support with the open-source driver.

As a reminder, NetGPU is Facebook's work-in-progress code for supporting zero-copy DMA transfers between the network adapter and graphics processor. This RDMA alternative still does protocol processing on the host CPU but would allow for much faster data processing on the GPU thanks to the zero-copy direct memory access between the NIC and GPU. Facebook is looking to use NetGPU in their machine learning clusters, with plans for 200 Gbps NICs and GPUs attached to a PCI Express switch. The CPU alone can't handle the dataset traffic for their intense machine learning workloads, but NetGPU should make their design feasible.
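
To put the 200 Gbps figure in perspective, here is a rough back-of-the-envelope sketch in Python. It only converts the link rate to bytes per second and compares host memory traffic under two hypothetical data paths. The function names and the "two passes through host memory" figure for a bounce-buffer path are illustrative assumptions, not numbers from Facebook's patches, and header traffic that still reaches the host for protocol processing is ignored.

```python
# Back-of-the-envelope sketch: payload traffic through host memory for a
# bounce-buffer path versus a zero-copy NIC-to-GPU path.
# The pass counts below are illustrative assumptions, not measured values.


def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return gbps / 8.0


def host_memory_traffic(link_gbps: float, host_memory_passes: int) -> float:
    """Host memory traffic in GB/s if every payload byte crosses host
    memory `host_memory_passes` times at full line rate."""
    return gbps_to_gbytes_per_s(link_gbps) * host_memory_passes


LINK_GBPS = 200  # NIC speed mentioned in the article

# Assumption: a bounce-buffer path has the NIC write the payload into host
# memory and then has it read back out on its way to the GPU (two passes).
bounce = host_memory_traffic(LINK_GBPS, host_memory_passes=2)

# Zero-copy: payload moves NIC -> GPU directly, so no payload passes
# through host memory.
zero_copy = host_memory_traffic(LINK_GBPS, host_memory_passes=0)

print(f"Line rate:          {gbps_to_gbytes_per_s(LINK_GBPS):.1f} GB/s")
print(f"Bounce-buffer path: {bounce:.1f} GB/s of host memory traffic")
print(f"Zero-copy path:     {zero_copy:.1f} GB/s of host memory traffic")
```

Under these assumptions a single 200 Gbps NIC at line rate carries 25 GB/s of payload, which is the traffic a zero-copy design aims to keep out of host memory.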

NetGPU itself is quite interesting and will hopefully make it into the mainline Linux kernel. The controversy stemmed from the previously proposed patches and their driver shim depending on the NVIDIA proprietary driver for GPU usage.

The good news is that AMD GPU support for NetGPU is a work in progress. Unfortunately, the Radeon Open eCosystem (ROCm) stack in its current form isn't sufficient: its thunk driver does not currently expose DMA-BUF support, so some changes to the ROCm code are being looked at.

Having the AMD GPU support working off an open-source compute stack will also clear an obstacle for NetGPU getting review and approval from other upstream kernel developers rather than being contingent upon the NVIDIA proprietary driver.

More details on NetGPU via this slide deck by Facebook engineer Jonathan Lemon.