NVGRACE-GPU VFIO Driver Preparing For NVIDIA Grace Blackwell

Written by Michael Larabel in NVIDIA on 7 October 2024 at 06:39 AM EDT.
The NVGRACE-GPU VFIO driver was introduced to provide Virtual Function I/O support for the NVIDIA Grace Hopper Superchip so that the GPU device can be assigned to virtual machine guests under KVM/QEMU and similar virtualization stacks. The NVGRACE-GPU driver is now being extended to support the forthcoming NVIDIA Grace Blackwell "GB" designs.

Posted on Sunday was a set of patches extending the NVGRACE-GPU VFIO driver for Grace Blackwell. This work is necessary so that the Blackwell GPU can play nicely within the virtualized environments that are so common these days.

NVIDIA engineer Ankit Agrawal explained the driver changes for accommodating Grace Blackwell VFIO support:
"NVIDIA's recently introduced Grace Blackwell (GB) Superchip in continuation with the Grace Hopper (GH) superchip that provides a cache coherent access to CPU and GPU to each other's memory with an internal proprietary chip-to-chip (C2C) cache coherent interconnect. The in-tree nvgrace-gpu driver manages the GH devices. The intention is to extend the support to the new Grace Blackwell boards.

There is a HW defect on GH to support the Multi-Instance GPU (MIG) feature [1] that necessiated the presence of a 1G carved out from the device memory and mapped uncached. The 1G region is shown as a fake BAR (comprising region 2 and 3) to workaround the issue.

The GB systems differ from GH systems in the following aspects.
1. The aforementioned HW defect is fixed on GB systems.
2. There is a usable BAR1 (region 2 and 3) on GB systems for the GPUdirect RDMA feature.

This patch series accommodate those GB changes by showing the real physical device BAR1 (region2 and 3) to the VM instead of the fake one. This takes care of both the differences."
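
As a rough illustration of the difference described in the quote, the following Python sketch models what a guest would see at regions 2 and 3 on each platform. It is not kernel code and does not reflect how the patches are implemented; the names GuestBar and guest_bar_for are hypothetical, and the only facts it encodes are those stated in the cover letter (a 1G uncached carve-out presented as a fake BAR on GH, the real physical BAR1 on GB).

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1 << 30  # one gibibyte, in bytes


@dataclass(frozen=True)
class GuestBar:
    """Hypothetical model of the BAR a VM sees at regions 2 and 3."""
    regions: tuple          # region indices exposed to the guest
    backing: str            # what actually sits behind the BAR
    fake: bool              # True when the BAR is synthesized by the driver
    size: Optional[int]     # bytes; None when the cover letter gives no size


def guest_bar_for(platform: str) -> GuestBar:
    """Return the BAR layout described in the cover letter for a platform.

    "GH" = Grace Hopper, "GB" = Grace Blackwell. Both strings are labels
    chosen for this sketch, not identifiers taken from the driver.
    """
    if platform == "GH":
        # GH: 1G carved out of device memory, mapped uncached, and shown
        # as a fake BAR to work around the MIG-related hardware defect.
        return GuestBar(
            regions=(2, 3),
            backing="1G uncached carve-out of device memory",
            fake=True,
            size=GIB,
        )
    if platform == "GB":
        # GB: the defect is fixed and BAR1 is usable (GPUDirect RDMA),
        # so the real physical BAR1 is shown instead of the fake one.
        return GuestBar(
            regions=(2, 3),
            backing="physical BAR1",
            fake=False,
            size=None,
        )
    raise ValueError(f"unknown platform label: {platform!r}")


if __name__ == "__main__":
    for label in ("GH", "GB"):
        print(label, guest_bar_for(label))
```

In both cases the guest sees a BAR at the same region indices; what changes on Grace Blackwell is only what backs it.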

These patches are now out for review on the Linux kernel mailing list.
