AMD FRU Memory Poison Manager Makes It In For Linux 6.9
The Linux 6.9 changes for the Error Detection And Correction (EDAC) subsystem are heavy on AMD work.
As discussed a few weeks ago, AMD has been upstreaming the FRU (Field Replaceable Unit) Memory Poison Manager, and this new kernel code has indeed landed for Linux 6.9. The FRU Memory Poison Manager allows information on bad/faulty memory to persist across reboots. It is initially wired up for AMD hardware and can make use of the ACPI Error Record Serialization Table (ERST) to store that memory error information.
The FRU Memory Poison Manager goes along with another new Linux 6.9 EDAC feature: row retirement support for the MI300 series, which makes it possible to retire memory rows on the HBM3 if too many uncorrectable ECC errors are occurring. Row retirement allows problematic memory areas to be avoided, while the FRU Memory Poison Manager allows that information to (optionally) persist across reboots so the same error-prone memory is not hit again.
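To make the relationship between the two features concrete, here is a minimal toy sketch in Python. It is not kernel code and does not reflect the actual EDAC or FMPM implementation: the `RowTracker` class, the `RETIRE_THRESHOLD` value, and the JSON file standing in for persistent storage such as ERST are all assumptions made purely for illustration.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical policy: number of errors on one row before it is retired.
RETIRE_THRESHOLD = 2


class RowTracker:
    """Toy model: count errors per memory row, retire noisy rows, persist them."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.error_counts = Counter()
        self.retired = set()
        # Restore previously retired rows, as a persistent store would after a reboot.
        if os.path.exists(store_path):
            with open(store_path) as f:
                self.retired = set(json.load(f))

    def report_error(self, row):
        """Record one uncorrectable error; return True if the row is (now) retired."""
        if row in self.retired:
            return True
        self.error_counts[row] += 1
        if self.error_counts[row] >= RETIRE_THRESHOLD:
            self.retired.add(row)
            self._save()
            return True
        return False

    def _save(self):
        # The JSON file is only a stand-in for a persistent error record store.
        with open(self.store_path, "w") as f:
            json.dump(sorted(self.retired), f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "retired_rows.json")
    first_boot = RowTracker(path)
    first_boot.report_error(7)
    first_boot.report_error(7)  # second error reaches the threshold
    second_boot = RowTracker(path)  # simulates a reboot
    return sorted(second_boot.retired)
```

In this sketch, retirement handles the "avoid the bad row now" half, while the saved file handles the "remember it after reboot" half, which is the split the article describes between row retirement and the FRU Memory Poison Manager.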
The EDAC code in Linux 6.9 also adds the AMD Address Translation Library, which helps convert the reported addresses of hardware errors into system physical addresses for AMD's accelerator hardware.
Over on the Intel side, the EDAC changes include Alder Lake N SoC support within the iGEN6 driver and Intel Grand Ridge support within the i10nm driver. Last week's EDAC pull has the full list of Error Detection And Correction patches that made it into Linux 6.9.