An Early Performance Regression Hitting Highly Threaded Workloads On Linux 6.14-rc1

Written by Michael Larabel in Software on 4 February 2025 at 03:20 PM EST.

With Linux 6.14-rc1 released, I have begun trying out the new development kernel on a few local systems. At least on the high core count hardware tested so far, Linux 6.14 in this early testing phase is showing performance regressions in some multi-threaded workloads.

AMD Volcano server

One of the first systems I tried with Linux 6.14-rc1 was the AMD Volcano reference server for 5th Gen EPYC "Turin", loaded with dual EPYC 9755 processors for a combined 256 cores / 512 threads of compute power. Linux 6.12, Linux 6.13, and Linux 6.14-rc1 were tested on this high-end AMD EPYC Turin server using the same Kconfig configuration each time.

Linux Kernel Comparison EPYC 9755 AMD

No hardware changes were made during the testing process, and nothing else was changed besides swapping out the kernel version in use. The reported CPU frequency difference on Linux 6.13+ comes down to that kernel now defaulting to the AMD P-State driver rather than ACPI CPUFreq, which in turn changes whether the base or boost clock is reported in sysfs. This AMD EPYC Zen 5 server made for speedy work when beginning to test out the new Linux kernel.
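For readers who want to confirm which frequency scaling driver a given kernel ended up using, the following is a minimal sketch, not part of the original testing. It assumes the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function name read_cpufreq_info and its sysfs_root parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def read_cpufreq_info(cpu: int = 0, sysfs_root: str = "/sys/devices/system/cpu") -> dict:
    """Return the cpufreq driver and reported frequency limits for one CPU.

    Hypothetical helper: values are returned as the raw strings found in
    sysfs (frequencies are in kHz), or None when a file cannot be read.
    """
    policy_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in ("scaling_driver", "cpuinfo_min_freq", "cpuinfo_max_freq"):
        try:
            info[name] = (policy_dir / name).read_text().strip()
        except OSError:
            # File missing or unreadable, e.g. no cpufreq driver is loaded.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_cpufreq_info().items():
        print(f"{key}: {value}")
```

Running this under each kernel and comparing the scaling_driver and cpuinfo_max_freq values should show whether a difference in reported clock speed comes from the driver switch rather than from any hardware change.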

OpenFOAM benchmark with settings of Input: drivaerFastback, Medium Mesh Size, Mesh Time. v6.13 was the fastest.
RELION benchmark with settings of Test: Basic, Device: CPU. v6.13 was the fastest.

But right off the bat, slowdowns were observed in some of the HPC workloads with Linux 6.14-rc1 relative to Linux 6.13. In some cases the regressions wiped away gains made with Linux 6.13 over Linux 6.12, while in other cases performance was simply worse than on either earlier kernel:

Liquid-DSP benchmark with settings of Threads: 128, Buffer Length: 256, Filter Length: 512. v6.12 was the fastest.
Liquid-DSP benchmark with settings of Threads: 256, Buffer Length: 256, Filter Length: 32. v6.12 was the fastest.
Liquid-DSP benchmark with settings of Threads: 256, Buffer Length: 256, Filter Length: 57. v6.13 was the fastest.

For example, with the Liquid-DSP digital signal processing software at high core/thread counts, there were big regressions relative to Linux 6.13 and 6.12.

Liquid-DSP benchmark with settings of Threads: 256, Buffer Length: 256, Filter Length: 512. v6.12 was the fastest.
Liquid-DSP benchmark with settings of Threads: 512, Buffer Length: 256, Filter Length: 512. v6.12 was the fastest.

The Liquid-DSP performance was moving in the wrong direction with Linux 6.14.

Liquid-DSP benchmark with settings of Threads: 32, Buffer Length: 256, Filter Length: 57. v6.14-rc1 was the fastest.
Liquid-DSP benchmark with settings of Threads: 64, Buffer Length: 256, Filter Length: 57. v6.14-rc1 was the fastest.
Liquid-DSP benchmark with settings of Threads: 128, Buffer Length: 256, Filter Length: 57. v6.14-rc1 was the fastest.

An important observation, though, is that these regressions were happening only at the higher thread counts. When running with "only" 32 to 64 threads on this 256-core / 512-thread server, the Linux 6.14 kernel was showing a nice performance improvement.
