Intel Continues With More Big-Time Optimizations To The Linux Kernel
I love Linux kernel patches that mention "massively", use exclamation points when talking about performance, and/or simply mention big speed-ups. Quite often such patches come out of Intel and last week they sent out another great performance optimization patch series to improve additional low-level bits of the kernel.
Intel engineers have been sorting through a bottleneck within the Linux kernel networking code and discovered a performance issue around concurrency with the dst_entry data structure. To get right to the point, some key takeaways from the patch comments:
Wangyang and Arjan reported a bottleneck in the networking code related to struct dst_entry::__refcnt. Performance tanks massively when concurrency on a dst_entry increases.
This happens when there are a large amount of connections to or from the same IP address. The memtier benchmark when run on the same host as memcached amplifies this massively. But even over real network connections this issue can be observed at an obviously smaller scale (due to the network bandwidth limitations in my setup, i.e. 1Gb).
...
The combination of these two changes results in performance gains in micro benchmarks and also localhost and networked memtier benchmarks talking to memcached. It's hard to quantify the benchmark results as they depend heavily on the micro-architecture and the number of concurrent operations.
The overall gain of both changes for localhost memtier ranges from 1.2X to 3.2X, and from +2% to +5% for networked operations on a 1Gb connection.
A micro benchmark which enforces maximized concurrency shows a gain between 1.2X and 4.7X!!!
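Since those figures mix multipliers ("3.2X") with percentages ("+5%"), here is a quick Python sketch for putting them on the same scale. The function names and labels are my own, made up purely for illustration and not part of the patch series; only the numbers come from the patch comments quoted above.

```python
def factor_to_percent(factor: float) -> float:
    """Convert a speed-up factor (e.g. 3.2 for "3.2X") into a percentage gain."""
    return (factor - 1.0) * 100.0


def percent_to_factor(percent: float) -> float:
    """Convert a percentage gain (e.g. 5 for "+5%") into a speed-up factor."""
    return 1.0 + percent / 100.0


if __name__ == "__main__":
    # Figures quoted in the patch comments; the labels are illustrative only.
    quoted_factors = {
        "localhost memtier (low end)": 1.2,
        "localhost memtier (high end)": 3.2,
        "micro benchmark (low end)": 1.2,
        "micro benchmark (high end)": 4.7,
    }
    for label, factor in quoted_factors.items():
        print(f"{label}: {factor}X = +{factor_to_percent(factor):.0f}%")

    # Networked results were quoted as percentages rather than multipliers.
    for percent in (2, 5):
        print(f"networked memtier: +{percent}% = {percent_to_factor(percent):.2f}X")
```

Put that way, the 3.2X localhost result is a +220% gain, while the +5% networked result is only a 1.05X multiplier, which shows how much the 1Gb link masks the contention.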
The patches are currently being reviewed and discussed via this LKML thread.
This weekend I couldn't help but try out the patches myself. Running the "rcuref" branch against mainline Linux 6.2, I indeed saw striking improvements in memcached performance on the local host.
For some quick tests I was using a dual Intel Xeon Platinum 8490H "Sapphire Rapids" server with its combined 240 threads while running the Ubuntu 23.04 development release and comparing stock Linux 6.2 against v6.2 with these rcuref patches applied:
Indeed some massive speed-ups for memcached! This optimization work also isn't limited to Intel hardware and should benefit other processors too.
CockroachDB also seemed to benefit from this rcuref kernel work, though to a smaller extent, among the limited set of other workloads I've had time to try so far.
In the other workloads tested, the results were largely flat. But that was just from some weekend benchmarking... I'll have more tests in the coming days. Intel's Clear kernel has also already picked up these kernel optimizations among others, so I'll be working on a scaling look at the Linux performance between Xeon Sapphire Rapids and EPYC Genoa on a few Linux distributions as well in the next week or two.
In any event it's wonderful seeing all of the Linux kernel optimizations that continue to be pursued by Intel engineers, as well as their many optimizations and enhancements throughout the rest of the stack, from compilers to other key libraries. These Linux kernel scalability optimizations will also become all the more important and beneficial with the higher core count Sierra Forest processors next year.