Intel Iris Gallium3D Driver Overhauls Its Buffer Allocation Code

Written by Michael Larabel in Intel on 26 November 2023 at 06:40 AM EST.
While much of the modern graphics world is focused on the Vulkan API these days, there are no signs of Intel's open-source graphics driver engineers losing their optimization focus on OpenGL for Linux by way of the Iris Gallium3D driver. Merged this holiday week was a rather significant rework of its buffer object allocation system.

Kenneth Graunke, one of the original developers of the Iris Gallium3D driver, has been working to improve the bucket cache and sub-allocator. In a set of nine patches, he made a number of improvements to the buffer object (BO) code for Iris. Ken explained in the merge request:
- It ties the bucket-cache system to the IRIS_HEAP_* enums, allowing us to clean up a bunch of copy-pasted code.

- It adds a separate heap for explicitly-coherent system memory (BO_ALLOC_COHERENT). While this isn't necessary for LLC systems, it will be very helpful on non-LLC systems (such as Meteorlake).

- It enables the bucket-cache system for explicitly-coherent BOs on non-LLC systems. Previously, we just skipped the BO cache altogether for coherent resources, which is especially unfortunate since we also mark staging resources as coherent. So those were resulting in fresh allocations every time on non-LLC systems, which is terribly inefficient.

- It enables slab-allocation for explicitly-coherent BOs, on all platforms. We were actually skipping out on the slab allocators even on LLC systems, where there's no reason not to. I tried to fix this in !14763 (closed) previously, but we saw a small performance degradation. One difference in the new MR is that, by having a separate heap, different slab allocators will be used for coherent vs. non-coherent data. Since staging resources are marked coherent currently, this means they won't be allocated out of the same slabs as permanent data. Maybe it'll help?

- It increases the shader uploader BO size so we have fewer BOs to manage.

- It streamlines the bucket sizes. Now that we have suballocation, many of the buckets are unused. We now have 25 instead of 55. (I looked at Unigine Superposition's usage when tuning.)

- It makes better use of 64K pages. When allocating memory in contiguous 2MB chunks, i915 is able to optimize TLB access by setting the PS64 page table bit, letting the TLB know that it can essentially treat them as 64K pages even if they're only 4K pages. This requires both the virtual addresses and physical addresses to be 64K aligned. We remove cache bucket sizes near 2MB (1.75MB, 2.5MB, 3MB, 3.5MB) in favor of 2MB/4MB buckets. We also round to the nearest multiple of 2MB for large allocations.
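
As a rough illustration of the size rounding and alignment described in the last item, here is a minimal Python sketch. It is not the Mesa code, which is written in C, and the function names are hypothetical. It also assumes that "round to the nearest multiple of 2MB" means rounding up, since an allocation cannot be smaller than the request.

```python
KIB = 1024
MIB = 1024 * KIB

PAGE_64K = 64 * KIB   # alignment needed for both virtual and physical addresses
CHUNK_2M = 2 * MIB    # contiguous chunk size mentioned in the merge request


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (must be a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)


def round_large_allocation(size: int) -> int:
    """Hypothetical helper: round a request up to a whole number of 2MB chunks."""
    return align_up(size, CHUNK_2M)


def is_64k_aligned(address: int) -> bool:
    """Check whether an address sits on a 64K boundary."""
    return address % PAGE_64K == 0
```

Under these assumptions, requests that would have landed in the removed 2.5MB, 3MB, and 3.5MB buckets all round to 4MB, and a 1.75MB request rounds to 2MB, which is consistent with keeping only the 2MB/4MB buckets in that range.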

These BO improvements appear set to benefit the forthcoming Meteor Lake platforms the most. The timing is important, with Intel Core Ultra "Meteor Lake" chips set to be formally released in December. Linux 6.7 is also the release in which the kernel graphics driver declares stable support for Meteor Lake integrated graphics. I'll be working on procuring a Meteor Lake laptop for Linux testing after launch.

After nearly two months of review, these Iris Gallium3D driver improvements were merged for Mesa 24.0-devel. No performance or benchmark numbers were provided to quantify the impact of the changes on any Intel graphics platform.