Oh GuC: Intel ADL-P Graphics On Linux 5.19 Will Break Unless You Also Upgrade Firmware
After completing my recent Intel Core i7 1280P "Alder Lake P" Linux benchmarks on Linux 5.18 stable, I moved on to testing the Linux 5.19 kernel to see how it performs. After all, some systems see very nice gains with v5.19 Git over 5.18 and prior.
When jumping onto the latest Linux 5.19 Git using the convenient Ubuntu Mainline Kernel PPA daily builds, I was surprised to find accelerated graphics not working: the shiny new Intel Evo Alder Lake P laptop was back to using LLVMpipe. I initially suspected a random hardware issue or an uncaught bug at the 5.19-rc6 stage, but it quickly proved to be an intentional Intel change instead. The dmesg output showed that the Alder Lake P graphics failed to initialize due to missing firmware.
Upgrading to Linux 5.19 on this Intel ADL-P laptop left the Xe Graphics not working because the kernel now requires newer firmware.
With Alder Lake P, the firmware for the GuC micro-controller and its usage are now mandatory. On prior generations of Intel graphics, going back years to Gen9 graphics with Skylake, GuC usage has been optional. This "graphics micro-controller" is used for offloading some tasks from the driver and can handle low-level graphics context scheduling, authentication of the HEVC/H.264 (HuC) micro-controller, and, more recently, even power management. On prior hardware the GuC wasn't used by default; using it required setting the GuC module parameter for the i915 driver (i915.enable_guc=1).
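For reference, the i915 enable_guc parameter is, as far as I understand its module description, a bitmask rather than a plain on/off switch: -1 means auto (the default), 0 disables, the value 1 requests GuC submission and the value 2 requests HuC loading. The following is a minimal Python sketch decoding a value under that assumption; the helper name describe_enable_guc is hypothetical and not part of any kernel tool, and the bit meanings should be checked against `modinfo i915` for the kernel you actually run.

```python
# Bit meanings as described in the i915 module parameter help text
# (assumption: this matches the kernel version you are running).
GUC_SUBMISSION = 1 << 0  # value 1: use the GuC for context submission/scheduling
HUC_LOAD = 1 << 1        # value 2: load (and authenticate) the HuC firmware via the GuC
KNOWN_BITS = GUC_SUBMISSION | HUC_LOAD


def describe_enable_guc(value: int) -> list[str]:
    """Return the features requested by an i915.enable_guc value (hypothetical helper)."""
    if value == -1:
        return ["auto (driver default for this platform)"]
    if value < -1 or value & ~KNOWN_BITS:
        raise ValueError(f"unsupported enable_guc value: {value}")
    features = []
    if value & GUC_SUBMISSION:
        features.append("GuC submission")
    if value & HUC_LOAD:
        features.append("HuC load")
    return features or ["disabled"]


for v in (-1, 0, 1, 2, 3):
    print(f"i915.enable_guc={v}: {', '.join(describe_enable_guc(v))}")
```

Under this reading, the i915.enable_guc=1 setting mentioned above enables GuC submission only, while 3 would request both GuC submission and HuC loading.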
Now with Alder Lake P and all future Intel platforms (including DG2/Alchemist discrete graphics), the GuC firmware and its usage are mandatory since power management is offloaded to this micro-controller. And it's not simply a matter of ensuring the GuC firmware is present: the correct version of the GuC firmware must be present for your given kernel.
This became apparent with Linux 5.19: after upgrading the kernel on this Core i7 1280P laptop, which had been happily running Linux 5.18, i915 initialization failed. GuC firmware 69 had been present on the system and in use, but the Linux 5.19 driver expects GuC firmware 70 and doesn't fall back to loading the prior firmware for backwards compatibility. The GuC firmware/driver interface doesn't appear to be stable, with its version now up to 70.
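To illustrate why the upgrade broke, here is a minimal Python sketch contrasting a strict major-version check, which mirrors the behavior described above (GuC 69 installed, 70 required, no fallback), with a hypothetical loader that also accepts an older major version it still knows how to talk to. The function names, the tuple-based version representation, the minor/patch numbers and the fallback policy are all assumptions for illustration, not the i915 driver's actual code.

```python
from typing import Optional

# Firmware versions as (major, minor, patch) tuples. The minor/patch numbers
# used below are made up purely for illustration.
Version = tuple[int, int, int]


def pick_strict(available: list[Version], required_major: int) -> Optional[Version]:
    """Accept only firmware whose major version equals the driver's (no fallback)."""
    matches = [v for v in available if v[0] == required_major]
    return max(matches) if matches else None


def pick_with_fallback(available: list[Version], supported_majors: list[int]) -> Optional[Version]:
    """Try each supported major version in order of preference (hypothetical policy)."""
    for major in supported_majors:
        chosen = pick_strict(available, major)
        if chosen is not None:
            return chosen
    return None


installed = [(69, 0, 0)]  # hypothetical: only a GuC 69 blob on disk

print(pick_strict(installed, 70))               # None -> GPU initialization fails
print(pick_with_fallback(installed, [70, 69]))  # (69, 0, 0) -> older firmware still usable
```

The fallback variant only works if the driver keeps code for the older interface around, which is exactly the cost the GuC 70 commit avoids by dropping GuC 69 support.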
The breakage with Linux 5.19 comes via this commit, which removes GuC 69 firmware support and moves to GuC 70. That commit message also reaffirms the continued churn of GuC, with updates requiring driver changes:
The latest GuC firmware drops the context descriptor pool in favour of passing all creation data in the create H2G. It also greatly simplifies the work queue and removes the process descriptor used for multi-LRC submission. So, remove all mention of LRC and process descriptors and update the registration code accordingly.
Unfortunately, the new API also removes the ability to set default values for the scheduling policies at context registration time. Instead, a follow up H2G must be sent. The individual scheduling policy update H2G commands are also dropped in favour of a single KLV based H2G. So, change the update wrappers accordingly and call this during context registration....
This isn't the first time Intel has required new GuC firmware, but previously those changes affected either hardware that hadn't yet been released or prior hardware generations where GuC usage wasn't mandatory. Now with Linux 5.19, the Alder Lake P graphics in shipping laptops can break unless the firmware is upgraded in tandem. At least the GuC version doesn't change every kernel cycle.
The ADL-P Xe Graphics worked fine once I downloaded the latest "GuC 70" firmware binaries.
With GuC now mandatory for future hardware too, this is something for Intel users to keep in mind when upgrading the kernel. Intel added the GuC 70 firmware binaries to the linux-firmware.git tree back in April, but Ubuntu 22.04 LTS, at least, isn't shipping that firmware since its older Linux 5.15 kernel targets GuC 69. Older GuC firmware versions for various Intel generations continue to be carried in the linux-firmware.git tree. Fortunately, new major GuC versions don't appear to be introduced too often, but this is still a change that ultimately breaks user-space.
I recall Linus Torvalds ranting some years ago against similar behavior (with a WiFi chipset, if memory serves), arguing that newer kernels cannot mandate newer firmware while breaking backwards compatibility, but I can't find the precise message. Still, given his general stance that the kernel shouldn't break user-space and should maintain a consistent user experience, the Intel GuC firmware versioning would seem to go against that ideal. At the very least, you are now aware of this possible issue when upgrading your kernel on Alder Lake P and newer platforms whenever the GuC firmware version changes.
Update: It looks like this change slipped under the radar of DRM subsystem maintainer David Airlie. He is now asking Intel to revert or fix the firmware handling during the Linux 5.19 release candidate cycle so as not to break backwards compatibility with existing firmware. Stay tuned...