AMD Posts Latest Linux Patches For Supporting The Frontier Supercomputer
Much of the Frontier bring-up work for the Linux kernel over the past several months has centered on the coherent interconnect between AMD EPYC CPUs and the Instinct "Aldebaran" GPUs/accelerators, which gives the CPUs coherent access to GPU memory. The latest patch series posted today for the Linux kernel is again focused on this GPU device memory handling.
The new patches introduce the notion of "MEMORY_DEVICE_PUBLIC": memory that resides in GPU device memory but is mapped for CPU access, with support for migrating pages to and from those areas. Back in May we wrote about the earlier work on coherent GPU device memory support with migration.
The 14 "v1" patches introduce the MEMORY_DEVICE_PUBLIC type, plumb it into the kernel's memory management code, and make the necessary changes to the AMDKFD kernel driver. This MEMORY_DEVICE_PUBLIC code works with AMD's recently-merged heterogeneous memory management (HMM) support and the Shared Virtual Memory (SVM) capability recently added to AMDKFD.
Long story short, the Linux kernel bring-up for Frontier remains ongoing. The ORNL Frontier site still shows a 2021 delivery for this first US exascale supercomputer.