Oracle Working On Multi-Threaded VFIO Page Pinning For ~10x Faster QEMU Initialization
Oracle engineers have been working on multi-threaded VFIO page pinning to speed up guest initialization, which can have a quite noticeable impact for large guest VMs. The patch series providing this multi-threaded VFIO page pinning is currently out as a "request for comments" (RFC), and the patch cover letter explains the motivation and benefits:
Assigning a VFIO device to a guest requires pinning each and every page of the guest's memory, which gets expensive for large guests even if the memory has already been faulted in and cleared with something like qemu prealloc.
Some recent optimizations have brought the cost down, but it's still a significant bottleneck for guest initialization time. Parallelize with padata to take proper advantage of memory bandwidth, yielding up to 12x speedups for VFIO page pinning and 10x speedups for overall qemu guest initialization. Detailed performance results are in patch 8.
Phase one of multithreaded jobs made deferred struct page init use all the CPUs on x86. That's a special case because it happens during boot when the machine is waiting on page init to finish and there are generally no resource controls to violate.
Page pinning, on the other hand, can be done by a user task (the "main thread" in a job), so helper threads should honor the main thread's resource controls that are relevant for pinning (CPU, memory) and give priority to other tasks on the system. This RFC has some but not all of the pieces to do that.
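To give a rough feel for the parallelization idea described in the cover letter, here is a minimal userspace sketch in Python. It is not the kernel's padata API and not code from the patch series: the function names (`split_range`, `pin_chunk`, `pin_parallel`), the minimum chunk size, and the worker count are all hypothetical, and the "pinning" is simulated. It only shows how a range of guest pages might be divided into contiguous chunks and handed to helper threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start_page, num_pages, min_chunk, max_workers):
    """Split [start_page, start_page + num_pages) into contiguous chunks.

    Returns a list of (first_page, page_count) tuples. No more workers are
    used than there are min_chunk-sized pieces of work, so small ranges
    stay on a single thread.
    """
    if num_pages <= 0:
        return []
    workers = max(1, min(max_workers, num_pages // min_chunk))
    base, extra = divmod(num_pages, workers)
    chunks = []
    page = start_page
    for i in range(workers):
        # Spread the remainder over the first `extra` chunks.
        size = base + (1 if i < extra else 0)
        chunks.append((page, size))
        page += size
    return chunks


def pin_chunk(chunk):
    """Hypothetical stand-in for pinning: reports the pages it covered."""
    _first_page, page_count = chunk
    return page_count


def pin_parallel(start_page, num_pages, min_chunk=512, max_workers=4):
    """Process every chunk on its own helper thread; return pages handled."""
    chunks = split_range(start_page, num_pages, min_chunk, max_workers)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return sum(pool.map(pin_chunk, chunks))


if __name__ == "__main__":
    print(split_range(0, 10, 2, 3))
    print(pin_parallel(0, 1 << 20))
```

Unlike this toy, the RFC's helper threads are also meant to honor the main thread's CPU and memory resource controls, which the cover letter notes is only partly implemented so far.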
An up to 12x speed-up for VFIO page pinning thanks to multi-threading is quite a difference, especially as it translates into a ~10x speed-up for overall QEMU guest initialization. This patch message has more of the AMD and Intel server performance test details.
Large servers with lots of RAM obviously stand to benefit the most from this VFIO multi-threaded page pinning.
Oracle has been carrying some of these patches in their downstream kernel builds for Oracle Enterprise Linux for about three years. See this patch series for the initial 16 RFC patches.