AMD Kernel Driver Enabling Peer-To-Peer Multi-GPU Compute For Linux
A new patch series posted today by AMD enables peer-to-peer support in the AMDKFD kernel compute driver, allowing multiple AMD GPUs to communicate over the PCIe bus without intermediate copies through system memory. In turn, this should help multi-GPU compute performance for the Radeon ROCm stack.
The patches to the AMDKFD driver, toggled at build time via the proposed "HSA_AMD_P2P" Kconfig switch, let a GPU directly access the video memory of other graphics cards without going through system RAM. The feature works on compatible chipsets and where the PCIe BAR is large enough to expose the card's entire video memory.
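To make the BAR requirement concrete, here is a minimal sketch of how one might compare a card's PCI regions against its video memory size from user-space. It parses the text of a PCI `resource` file from sysfs (one "start end flags" line per region, in hex). The function names are hypothetical and this is not the check the AMDKFD patches themselves perform, which the patch description does not spell out here.

```python
def parse_pci_resource(text):
    """Return the size in bytes of each populated region in a PCI 'resource' file."""
    sizes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "start end flags"
        start, end = int(fields[0], 16), int(fields[1], 16)
        if end > start:  # unused regions are reported as all zeros
            sizes.append(end - start + 1)
    return sizes


def bar_covers_vram(resource_text, vram_bytes):
    """True if the largest PCI region is at least as large as the video memory.

    Assumption: the largest region is the VRAM aperture. A real check would
    identify the specific BAR the driver uses.
    """
    sizes = parse_pci_resource(resource_text)
    return bool(sizes) and max(sizes) >= vram_bytes
```

On a real system the inputs would come from something like `/sys/bus/pci/devices/<address>/resource` and the amdgpu `mem_info_vram_total` sysfs attribute; the PCI address depends on the machine.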
This P2P work also extends the KFD (Kernel Fusion Driver) device topology to surface peer-to-peer links and exposes the layout to user-space via sysfs. Previously the Radeon compute libraries in user-space attempted their own peer-to-peer handling; doing this in the kernel driver should be more robust and reliable.
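As a rough illustration of consuming that topology from user-space, the sketch below walks a KFD-style sysfs tree and lists the links each node advertises. It assumes the existing KFD layout of `nodes/<n>/io_links/<m>/properties` files holding "key value" lines with `node_from` and `node_to` entries. The directory used for the new peer-to-peer links is not named above, so `link_dir` is left as a parameter, and the function names are hypothetical.

```python
import os


def parse_properties(text):
    """Parse 'key value' lines from a KFD topology properties file."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def list_links(topology_root, link_dir="io_links"):
    """Return (node_from, node_to) string pairs for every link under the root.

    Assumption: topology_root looks like /sys/class/kfd/kfd/topology and
    link_dir names the per-node directory holding the links of interest.
    """
    links = []
    nodes_dir = os.path.join(topology_root, "nodes")
    if not os.path.isdir(nodes_dir):
        return links
    for node in sorted(os.listdir(nodes_dir)):
        node_links = os.path.join(nodes_dir, node, link_dir)
        if not os.path.isdir(node_links):
            continue
        for link in sorted(os.listdir(node_links)):
            path = os.path.join(node_links, link, "properties")
            try:
                with open(path) as handle:
                    props = parse_properties(handle.read())
            except OSError:
                continue  # unreadable or missing entry; skip it
            links.append((props.get("node_from"), props.get("node_to")))
    return links
```

Calling `list_links("/sys/class/kfd/kfd/topology")` on a machine with the AMDKFD driver loaded would return whatever links the kernel reports; on a system without KFD it simply returns an empty list.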
Given the timing of these new patches, this HSA_AMD_P2P work won't be merged until at least the Linux 5.20 cycle later this summer. The P2P multi-GPU work applies only to the AMDKFD compute driver code and does not target the graphics side/APIs, which are handled separately and already supported to varying degrees.