Linux 5.12 Will Avoid Prematurely Shutting Down Intel Mobile Systems When Running Hot
Linux 5.12, with its queued thermal changes, will avoid prematurely shutting down Intel mobile workstations when a "critical" thermal trip point is reached that is not actually critical.
The thermal patches for the Linux 5.12 merge window were sent in on Friday. Catching my attention within that assortment were two patches by Canonical's Kai-Heng Feng, who is part of Ubuntu's Linux kernel team.
The patches to Intel's int340x and Intel PCH drivers fix systems unexpectedly shutting down upon hitting a "critical" temperature. The reported temperatures are not inaccurate; they are just not critical enough for the kernel to force the entire system off, especially when Intel's thermald daemon or some other user-space system management software is running and responsible for deciding when to power off the system due to excessive heat.
Kai-Heng Feng explained the situation in the int340x patch:
We are seeing thermal shutdown on Intel based mobile workstations, the shutdown happens during the first trip handle in thermal_zone_device_register():
kernel: thermal thermal_zone15: critical temperature reached (101 C), shutting down
However, we shouldn't do a thermal shutdown here, since
1) We may want to use a dedicated daemon, Intel's thermald in this case, to handle thermal shutdown.
2) For ACPI based system, _CRT doesn't mean shutdown unless it's inside ThermalZone namespace. ACPI Spec, 11.4.4 _CRT (Critical Temperature): "... If this object is present under a device, the device’s driver evaluates this object to determine the device’s critical cooling temperature trip point. This value may then be used by the device’s driver to program an internal device temperature sensor trip point."
So a "critical trip" here merely means we should take a more aggressive cooling method.
As int340x device isn't present under ACPI ThermalZone, override the default .critical callback to prevent surprising thermal shutdown.
So if your modern Intel mobile workstation has been unexpectedly powering off, this may be the culprit, and the change in behavior is coming with Linux 5.12.
Canonical has already been carrying this patch since January in its Ubuntu kernel builds and OEM partner kernels.