Big Throughput Boost & Lower Latency With New Patch For Linux Checksum Function

Written by Michael Larabel in Linux Kernel on 26 May 2023 at 06:21 AM EDT.
Queued ahead of the Linux 6.5 cycle, which kicks off in about one month, is a new x86 optimization patch that further tunes csum_partial, the kernel function for calculating 32-bit checksums on blocks of data. The newly optimized csum_partial shows much lower latency and higher throughput on the latest Intel/AMD processors.

The csum_partial function is used throughout the kernel, from networking to file-systems, for checksumming purposes. A new patch now queued in tip/tip.git improves the performance of the x86/x86_64 csum_partial implementation. Developer Noah Goldstein noted in the patch:
x86/csum: Improve performance of `csum_partial`

1) Add special case for len == 40 as that is the hottest value. This nets a ~8-9% latency improvement and a ~30% throughput improvement in the len == 40 case.

2) Use multiple accumulators in the 64-byte loop. This dramatically improves ILP and results in up to a 40% latency/throughput improvement (better for more iterations).
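To illustrate the second point, here is a minimal portable C sketch of the multiple-accumulator idea. It is not the kernel's code: the function names (csum_one_acc, csum_two_acc) and helpers are hypothetical, the carry handling is written in plain C rather than x86 add-with-carry instructions, and trailing bytes are simply zero-padded. With a single accumulator every addition has to wait for the previous one, while two accumulators form two independent dependency chains that the CPU can execute in parallel. Because ones' complement addition is associative and commutative, both versions produce the same folded result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement add: a carry out of bit 63 is wrapped back into bit 0. */
static inline uint64_t add_wrap(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);
}

/* Load up to 8 bytes; missing trailing bytes are treated as zero. */
static inline uint64_t load_bytes(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    memcpy(&v, p, n);
    return v;
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static inline uint16_t fold16(uint64_t s)
{
    s = (s & 0xffffffffu) + (s >> 32);
    s = (s & 0xffffffffu) + (s >> 32);
    s = (s & 0xffffu) + (s >> 16);
    s = (s & 0xffffu) + (s >> 16);
    return (uint16_t)s;
}

/* Baseline: one accumulator, so each add depends on the previous add. */
uint16_t csum_one_acc(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    for (; len >= 8; buf += 8, len -= 8)
        sum = add_wrap(sum, load_bytes(buf, 8));
    if (len)
        sum = add_wrap(sum, load_bytes(buf, len));
    return fold16(sum);
}

/* Two accumulators in the 64-byte loop: chains a and b are independent. */
uint16_t csum_two_acc(const unsigned char *buf, size_t len)
{
    uint64_t a = 0, b = 0, sum;

    for (; len >= 64; buf += 64, len -= 64) {
        a = add_wrap(a, load_bytes(buf +  0, 8));
        b = add_wrap(b, load_bytes(buf + 32, 8));
        a = add_wrap(a, load_bytes(buf +  8, 8));
        b = add_wrap(b, load_bytes(buf + 40, 8));
        a = add_wrap(a, load_bytes(buf + 16, 8));
        b = add_wrap(b, load_bytes(buf + 48, 8));
        a = add_wrap(a, load_bytes(buf + 24, 8));
        b = add_wrap(b, load_bytes(buf + 56, 8));
    }

    /* Merge the two chains, then handle the remaining 8-byte words and tail. */
    sum = add_wrap(a, b);
    for (; len >= 8; buf += 8, len -= 8)
        sum = add_wrap(sum, load_bytes(buf, 8));
    if (len)
        sum = add_wrap(sum, load_bytes(buf, len));
    return fold16(sum);
}
```

Calling both functions on the same buffer should return the same 16-bit value for any length. Any speedup from the second version depends on the compiler and CPU, so it would need to be measured rather than assumed.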

The patch is queued up into TIP's x86/misc branch until the Linux 6.5 merge window gets underway. It's always a joy to see the never-ending performance optimizations to the Linux kernel.

