Linux Preps Hybrid SMP Fix To Avoid Upcoming Laptops Appearing As 11-Socket Monsters
A fix is on its way to the mainline Linux 6.4 kernel, and it is also marked for back-porting to existing stable kernel series, to correct x86 topology reporting on Intel hybrid systems. The topology bug becomes more pronounced on Meteor Lake laptops, where internal Intel test laptops can currently report 11 CPU sockets rather than the proper number of cores all contained within a single CPU socket.
While Intel hybrid CPU designs have been common since late 2021, the Linux kernel has until now not properly handled the "smp_num_siblings" variable, which is in turn propagated to user-space and can end up reporting incorrect information. On an upcoming Intel Meteor Lake P platform, the common lscpu command would report 11 CPU sockets, each with a single core. In reality it's a single-socket laptop with 16 total cores.
Besides the wrong information being propagated to user-space, this improper smp_num_siblings handling could also affect Linux kernel scheduler decisions. As Intel engineer Zhang Rui commented, "This is also expected to make the scheduler do rather wonky things too."
He further explained in the pending patch:
"Traditionally, all CPUs in a system have identical numbers of SMT siblings. That changes with hybrid processors where some logical CPUs have a sibling and others have none.
Today, the CPU boot code sets the global variable smp_num_siblings when every CPU thread is brought up. The last thread to boot will overwrite it with the number of siblings of *that* thread. That last thread to boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it is an Ecore, smp_num_siblings == 1.
smp_num_siblings describes if the *system* supports SMT. It should specify the maximum number of SMT threads among all cores.
Ensure that smp_num_siblings represents the system-wide maximum number of siblings by always increasing its value. Never allow it to decrease.
On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are not updated in any cpu sibling map because the system is treated as an UP system when probing Ecore CPUs."
That patch was picked up yesterday by TIP's x86/urgent branch and will likely be submitted to the Linux 6.4 kernel over the weekend as an urgent fix. The patch is also marked for back-porting to existing Linux stable kernel series so that the SMP sibling count is properly reported on Intel hybrid platforms.