"GMEM" Proposed To Deal With Memory Management For Accelerators, External Memory Devices
Generalized Memory Management "GMEM" has been proposed as a new solution for the Linux kernel to handle memory management for external memory devices such as the growing number of accelerators coming to market.
Huawei engineer Weixi Zhu announced their work on GMEM on Tuesday in hopes of avoiding all the code duplication and redundant work currently being done when enabling new hardware/drivers that must handle memory management for such external memory hardware. The GMEM proposal sums up the current issue/challenge rather well:
"Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently backup the device local memory.
These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs.
Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workload with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems."
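To illustrate the low-level coupling the proposal is criticizing, below is a minimal sketch of the kind of MMU notifier plumbing an accelerator driver must wire up today to keep its device MMU in sync with Linux MM. The my_dev_* names are hypothetical stand-ins for vendor-specific code; the mmu_notifier API itself is the real kernel interface the proposal references.

#include <linux/mmu_notifier.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>

/* Hypothetical vendor hook: invalidate the device MMU for [start, end). */
static void my_dev_invalidate_tlb(unsigned long start, unsigned long end)
{
        /* Vendor-specific hardware access would go here. */
}

/* Host MM is about to change these mappings; mirror that on the device. */
static int my_dev_invalidate_range_start(struct mmu_notifier *mn,
                                         const struct mmu_notifier_range *range)
{
        my_dev_invalidate_tlb(range->start, range->end);
        return 0;
}

static const struct mmu_notifier_ops my_dev_mn_ops = {
        .invalidate_range_start = my_dev_invalidate_range_start,
};

static struct mmu_notifier my_dev_mn = { .ops = &my_dev_mn_ops };

/* Called when a process opens the device: subscribe to its address space. */
static int my_dev_attach_mm(void)
{
        return mmu_notifier_register(&my_dev_mn, current->mm);
}

GMEM's pitch is that this mirroring logic, along with device page-table and swap handling, moves into core MM so each driver supplies only the hardware-dependent hooks.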
With the proposed GMEM code, the Linux memory management "MM" subsystem is extended to share its machine-independent code while providing only a high-level interface for device drivers. In turn GMEM should allow more re-use by drivers without reinventing the wheel. GMEM has been tested with Huawei's neural processing unit device driver, and switching to GMEM allowed the Huawei NPU driver alone to drop around 26k lines of code. There are other benefits as well, as laid out in the GMEM proposal:
"Using GMEM-based driver, it is possible to write a C-style accelerator code with malloc(), whose underlying mmap() syscall should include MAP_PEER_SHARED according to current GMEM implementation. Importantly, GMEM guarantees a coherent view of memory between the host and all attached devices. This means that any data written by the CPU or any attached accelerator can be seen by the next memory load instruction issued by any attached accelerator or the CPU. Furthermore, the NPU device was able to oversubscribe memory by swapping memory to host DDR. Note that this memory oversubscription mechanism can be universal if the physical memory management is provided by GMEM. Other potential use cases of GMEM could include the IOMMU driver, KVM and RDMA drivers, as long as the device needs to manage external memory resources like VMAs, MMUs or local DRAMs."
The GMEM proposal can be found in full on dri-devel while it awaits review and feedback from other Linux device driver stakeholders.