Intel Workaround For Graphics Driver Regression: "The Platform Problem Going Crazy"
A patch series for the Intel Linux kernel graphics driver entitled "Time, where did it go?" was sent out over the weekend. This set of 42 patches aims to provide incremental improvements to the driver to offset a performance regression relative to Linux 5.7 that Intel hasn't been able to track down. Adding complexity to the driver to offset the regression, rather than fixing its cause, is now under the microscope.
The set of 42 patches by longtime Intel open-source developer Chris Wilson provides incremental improvements to reduce execution latency. He was upfront that the series "basically offsets the small regressions incurred when compared to [Linux kernel] 5.7."
In response to the patch series, DRM subsystem maintainer David Airlie of Red Hat asked what introduced the regressions since Linux 5.7 and whether they are documented. He also asked whether the regression is noticeable only in benchmarks or in real applications as well.
Chris Wilson replied that the DRM merge into Linux 5.8 prior to RC1 introduced the performance fluctuations, but the developers haven't been able to figure out the exact commit or cause of the problem. The regression was spotted in simulated transcode workloads.
Chris then admitted that the reasoning behind this set of execution latency improvements is to offset the impact of the regression: "Entirely motivated by not wanting to have to explain why there's even a 1% regression in their client metrics. They wouldn't even notice for a few releases by which point the problem is likely compounded and we suddenly have crisis meetings."
Airlie responded in turn:
I don't think this sort of thing is acceptable for upstream. This is the platform problem going crazy. Something regresses in the kernel core, and you refactor the i915 driver to get horribly more complicated to avoid fixing the core kernel regressions?
This has to stop, if Intel can't stop it internally, i.e. the GEM kernel team hasn't got the sort of power, then it has to stop upstream.
This is a hard NAK for this sort of refactoring, now and in the future.
Hopefully the Intel open-source driver developers manage to uncover the root cause of this regression appearing in Linux 5.8 in a timely manner.