Linux Patches Updated For Better Power Management On AMX "Sapphire Rapids" Servers
While the kernel-side Intel AMX support landed in Linux 5.16 and KVM support for AMX in Linux 5.17, other Linux patches around Advanced Matrix Extensions (AMX) are still outstanding. One important patch-set was updated this week to ensure proper power management on AMX-enabled processors, which arrive with Xeon Scalable "Sapphire Rapids" this year.
It turns out the large register state of AMX on Sapphire Rapids can leave a CPU core restricted to shallower sleep states if that state is not initialized. The Linux patch series, which has now seen four revisions over the past several months, adjusts the kernel code so the AMX state is initialized before a CPU enters idle through the intel_idle driver. With the AMX register state properly initialized, Sapphire Rapids processors can reach low-power idle states that would otherwise be unreachable. Without this, cores are limited to C1E rather than the deeper C6 sleep state.
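For readers who want to check which idle states a core is actually reaching, the kernel's cpuidle sysfs interface exposes per-state entry counts and residency times. The following is only a minimal sketch, not part of the patch series: the helper name `read_idle_states` is made up for illustration, and it assumes the standard `name`, `usage`, and `time` files under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/`. A core stuck at C1E would be expected to show little or no growth in its C6 counters.

```python
from pathlib import Path


def read_idle_states(cpu_dir):
    """Return [(name, usage, time_us), ...] for one CPU's cpuidle directory.

    cpu_dir is a path such as /sys/devices/system/cpu/cpu0 (hypothetical
    helper; the sysfs layout is assumed to be the standard cpuidle one).
    """
    cpuidle = Path(cpu_dir) / "cpuidle"
    # State directories are named state0, state1, ...; sort them numerically
    # so that state10 comes after state2.
    state_dirs = sorted(cpuidle.glob("state[0-9]*"),
                        key=lambda p: int(p.name[len("state"):]))
    states = []
    for state in state_dirs:
        name = (state / "name").read_text().strip()    # e.g. "C1E" or "C6"
        usage = int((state / "usage").read_text())     # times the state was entered
        time_us = int((state / "time").read_text())    # total residency in microseconds
        states.append((name, usage, time_us))
    return states


if __name__ == "__main__":
    cpu0 = Path("/sys/devices/system/cpu/cpu0")
    if (cpu0 / "cpuidle").is_dir():
        for name, usage, time_us in read_idle_states(cpu0):
            print(f"{name:10s} entries={usage:<12d} residency_us={time_us}")
    else:
        print("cpuidle sysfs interface not available on this system")
```

Running it twice a few seconds apart on an otherwise idle machine and comparing the counters gives a rough picture of which states are in use; the exact state names depend on the platform and idle driver.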
This handling is crucial not only for the overall power savings of these servers but also for giving other CPU cores a better chance of hitting their higher turbo frequency levels, since the idling/inactive cores take less of the power/frequency budget. The need to initialize the AMX register state in order to reach deeper sleep states is being treated as implementation-specific, so it may not apply beyond Sapphire Rapids, but for this year's Xeon Scalable "SPR" servers this patch series will play an important role.
At the moment the patch series is under review on the kernel mailing list. As of writing it still hasn't been queued into the power management subsystem's "-next" branch, so it remains to be seen whether it will land for the imminent v5.19 kernel cycle -- or it may also be attempted as a "fix" for existing kernels. It's a bit surprising, though, that this rather significant patch series wasn't mainlined already as part of the earlier AMX enablement.