AMD Continues With MCE/SMCA Linux Driver Changes Ahead Of Zen 4 CPUs
A new patch series posted on Monday for the AMD MCE (Machine Check Exception) driver adds support for two new "syndrome" registers used in "future AMD Scalable MCA systems" and, as part of that, implements a new FRU Text feature. Given the timing of this work and AMD's usual cadence for Linux hardware enablement, this is almost certainly for EPYC 7004 "Genoa" and "Bergamo" server processors.
AMD engineers remain very busy working on Linux support ahead of Zen 4 processors launching later this year.
The new syndrome registers, part of the SMCA (Scalable MCA) IP in future AMD CPUs, are intended to provide supplemental error information. The FRU Text feature exposes a Field Replaceable Unit (FRU) string that is stored in the new syndrome registers. The FRU text string can vary by MCA bank and is populated dynamically for each error state. This FRU string will be included in all AMD MCE reports for hardware errors.
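To make the idea of a FRU string carried in two syndrome registers more concrete, here is a minimal Python sketch. It is not the kernel driver code, and the layout is an assumption made for illustration: two 64-bit register values holding up to 16 ASCII bytes in little-endian order, with the first register supplying the first eight characters and NUL padding at the end. The function names and the sample string "DIMM A1" are hypothetical.

```python
import struct


def decode_fru_text(synd1, synd2):
    """Decode a FRU string from two 64-bit syndrome register values.

    Assumed layout (illustrative, not taken from the patch series):
    16 ASCII bytes, little-endian, first register first, NUL-padded.
    """
    raw = struct.pack("<QQ", synd1, synd2)
    # Keep only the bytes before the first NUL terminator, if any.
    text = raw.split(b"\x00", 1)[0]
    return text.decode("ascii", errors="replace")


def encode_fru_text(text):
    """Pack a short ASCII string into two 64-bit values (test helper only)."""
    raw = text.encode("ascii")[:16].ljust(16, b"\x00")
    return struct.unpack("<QQ", raw)


if __name__ == "__main__":
    # Hypothetical FRU label, round-tripped through the assumed layout.
    synd1, synd2 = encode_fru_text("DIMM A1")
    print(hex(synd1), hex(synd2), repr(decode_fru_text(synd1, synd2)))
```

In this sketch, a report formatter would call `decode_fru_text` once per logged error, since the string can differ per MCA bank and per error.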
The new AMD MCE driver patches are now out for review on the kernel mailing list and, given the timing, could be merged for the v5.19 cycle if no issues turn up. Long story short, this is another patch series pointing to a seemingly larger-than-usual set of hardware error detection/reporting changes coming for next-generation EPYC server processors. Server administrators should welcome these improvements, as they help in diagnosing hardware and system issues.