Linux 6.10 Preps For "When Things Go Seriously Wrong" On Bigger Servers

Written by Michael Larabel in Linux Kernel on 18 May 2024 at 06:34 AM EDT.
While machine check exception (MCE) events tend to be uncommon, a change from Intel engineers lets the Linux kernel store more machine check records for "when things go seriously wrong" on servers with increasingly high core counts.

Until now the Linux kernel maintained a fixed memory pool able to store 80 machine check exception records, but Intel's Tony Luck has raised that limit to accommodate increasingly large server processors:
"Systems with a large number of CPUs may generate a large number of machine check records when things go seriously wrong. But Linux has a fixed buffer that can only capture a few dozen errors.

Allocate space based on the number of CPUs (with a minimum value based on the historical fixed buffer that could store 80 records)."

The new behavior in Linux 6.10 is to size the pool at 80 records or two records per CPU, whichever is greater. In other words, Linux 6.10+ systems with more than 40 CPUs will see an expanded pool for storing MCE records when the system state goes awry.
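To make the sizing rule concrete, here is a minimal Python sketch of the arithmetic described above. It is only a model of the rule as stated in this article, not the kernel's C code, and the names `MIN_RECORDS`, `RECORDS_PER_CPU`, and `mce_pool_records` are hypothetical ones chosen for illustration.

```python
# Illustrative model of the pool sizing rule described in the article.
# This is not kernel code; all names here are hypothetical.

MIN_RECORDS = 80      # historical fixed pool size, kept as the minimum
RECORDS_PER_CPU = 2   # per-CPU allowance described for Linux 6.10


def mce_pool_records(num_cpus: int) -> int:
    """Return how many MCE records the pool would hold for num_cpus CPUs."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    # The pool never shrinks below the old fixed size of 80 records.
    return max(MIN_RECORDS, RECORDS_PER_CPU * num_cpus)


if __name__ == "__main__":
    for cpus in (8, 40, 41, 128, 512):
        print(f"{cpus:>4} CPUs -> {mce_pool_records(cpus)} records")
```

Under this model a system with exactly 40 CPUs still gets 80 records, so the pool only grows once the CPU count exceeds 40.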

The change was merged as the only RAS update for Linux 6.10.