A set of eight patches was published today for the Linux kernel that implements an extended hardware error log driver providing enhanced Intel MCA event logging. With this driver, certain errors become more useful to users: for example, the kernel can report the particular DIMM where a corrected memory error occurred, along with other detailed information not currently exposed by the Linux kernel.
The patch series by Chen Gong explains, "Certain usages such as Predictive Failure Analysis (PFA) require more information about the error than what can be described in processor machine check banks. Most server processors log additional information about the error in processor uncore registers. Since the addresses and layout of these registers vary widely from one processor to another, system software cannot readily make use of them. To complicate matters further, some of the additional error information cannot be constructed without detailed knowledge about platform topology. This enhanced MCA logging driver allows firmware to provide additional error information to MCE/CMCI handler and thus addresses this gap."
The mailing list message also provides sample dmesg and trace outputs of the newly exposed data when an error takes place.
Intel Enhanced MCA Logging is explained as "improved Firmware First signalling and enhanced error logs that complement machine check bank content." Unfortunately, Enhanced MCA Logging isn't widely supported yet. Intel says in a white paper published this June that the technology will be supported by "future Xeon processors."