Device Memory TCP Nears The Finish Line For More Efficient Networked Accelerators
A year ago Google engineers posted experimental Linux kernel code for Device Memory TCP to transfer data more efficiently between GPUs/accelerators and network devices without having to go through a host CPU memory buffer. After many rounds of review, Device Memory TCP now appears to be nearing the finish line.
Device Memory TCP "Devmem TCP" is a Linux kernel feature being baked to allow transferring data to and/or from device memory efficiently without having to bounce the data through a host memory buffer. AI training in particular places high memory and network bandwidth demands on many interconnected systems built around TPUs / GPUs / NPUs and other accelerator devices, so the goal has been to avoid memory copies through host system memory when sending or receiving data from those discrete devices across the network.
Device Memory TCP brings socket APIs for sending device memory across the network directly and for receiving incoming network packets directly into device memory. This both avoids host memory bandwidth pressure and reduces PCI Express bandwidth pressure by not having to bounce the data through the PCIe root complex.
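For a rough idea of what that receive path looks like from user-space, below is a minimal C sketch based on the posted patch series. The names used here (MSG_SOCK_DEVMEM, SCM_DEVMEM_DMABUF, SO_DEVMEM_DONTNEED, struct dmabuf_cmsg, struct dmabuf_token) and the placeholder constant values are taken from those patches and could still change before the code is merged, so treat them as assumptions rather than a stable ABI.

```c
/*
 * Sketch of the Devmem TCP receive path as proposed in the patch series.
 * Constants and struct layouts follow the posted patches and may change
 * before merge; the #define fallbacks below are placeholders.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/types.h>

#ifndef MSG_SOCK_DEVMEM
#define MSG_SOCK_DEVMEM   0x2000000  /* recvmsg flag, value from the patches */
#endif
#ifndef SCM_DEVMEM_DMABUF
#define SCM_DEVMEM_DMABUF 79         /* cmsg type, value from the patches */
#endif
#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80        /* setsockopt, value from the patches */
#endif

/* Describes where in the bound dma-buf an incoming fragment landed. */
struct dmabuf_cmsg {
    __u64 frag_offset;  /* offset of the fragment inside the dma-buf */
    __u32 frag_size;    /* length of the fragment in bytes */
    __u32 frag_token;   /* token to hand back once the data is consumed */
    __u32 dmabuf_id;    /* id of the dma-buf this fragment lives in */
    __u32 flags;
};

/* Used to return consumed fragments to the kernel. */
struct dmabuf_token {
    __u32 token_start;
    __u32 token_count;
};

/* Receive one batch of packets whose payload was placed directly into the
 * dma-buf previously bound to the NIC RX queue (binding itself is done via
 * a separate netlink call, not shown here). */
static int recv_into_devmem(int fd)
{
    char ctrl[4096];
    struct msghdr msg = {
        .msg_control = ctrl,
        .msg_controllen = sizeof(ctrl),
    };

    ssize_t ret = recvmsg(fd, &msg, MSG_SOCK_DEVMEM);
    if (ret < 0)
        return -1;

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_DEVMEM_DMABUF)
            continue;

        struct dmabuf_cmsg frag;
        memcpy(&frag, CMSG_DATA(cm), sizeof(frag));

        /* The payload sits at frag_offset inside the bound dma-buf; the
         * application would hand that offset/size to the GPU/accelerator
         * instead of copying the data through host memory. */
        printf("fragment: dmabuf %u offset %llu size %u\n",
               frag.dmabuf_id, (unsigned long long)frag.frag_offset,
               frag.frag_size);

        /* Once consumed, return the buffer to the kernel by token. */
        struct dmabuf_token tok = {
            .token_start = frag.frag_token,
            .token_count = 1,
        };
        setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, &tok, sizeof(tok));
    }
    return 0;
}
```

The key design point the sketch illustrates is that the packet payload never appears in the recvmsg data buffer: the application only receives metadata (offsets and tokens) describing where in device memory the NIC placed the data.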
It appears Device Memory TCP is wrapping up, with the prep patches having been queued last week into the networking subsystem's "net-next.git" tree. Thus the prep patches for Device Memory TCP, at least, will land in Linux 6.11. There's still a week or two to see whether the Device Memory TCP work itself will also be queued into net-next ahead of the v6.11 merge window; otherwise it's looking like it will arrive for v6.12, which should be exciting given that it will likely be the 2024 LTS kernel version.