Linux 5.11 Is Looking Like A Wild And Bumpy Ride On AMD CPUs So Far
A few days ago I noted nice AMD EPYC performance improvements with PostgreSQL when running on Linux 5.11 compared to prior kernels. I've now confirmed on even more AMD EPYC servers that the PostgreSQL uplift is there, but other workloads are unfortunately regressing on both Ryzen and EPYC. Here's the start of an exciting Christmas benchmarking adventure looking at this change with Linux 5.11...
That earlier article noted some very healthy PostgreSQL database server performance improvements to be found with Linux 5.11. The servers tested then were EPYC Rome 2P, but I have now confirmed that EPYC Rome 1P sees gains too, on completely different systems. Here is a look at the EPYC 7702 1P performance between Linux 5.10 and the current Linux 5.11 development code:
There are significant and reproducible improvements in both throughput and latency for AMD EPYC 7002 series PostgreSQL performance on Linux 5.11, now across all the EPYC hardware I have tried so far. So all is well? Yes and no. Unfortunately, with this ongoing testing I am also seeing some performance regressions on Linux 5.11:
What the heck?! A number of the regressing workloads are mostly user-space applications that don't interact with the kernel much or do much in the way of I/O... But that ended up jibing with my hypothesis regarding the Linux 5.11 improvement for PostgreSQL: it's about the CPU frequency invariance support in Linux 5.11 for AMD Zen 2 and newer.
This comparison was done with a stock kernel each time and using the default CPUFreq scaling governor, Schedutil, which has been the default since the switch from ondemand a few cycles ago. With Linux 5.11 comes AMD frequency invariance support for Zen 2 CPUs and newer. The Schedutil (scheduler utilization) governor making use of that frequency invariance could explain the PostgreSQL improvement through better frequency decisions there, but also worse decisions in other workloads such as those noted above.
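To make the hypothesis a bit more concrete, here is a minimal Python sketch of the frequency selection rule documented in the comments of the kernel's Schedutil source (kernel/sched/cpufreq_schedutil.c): with frequency-invariant utilization the next frequency is scaled from the maximum frequency, otherwise from the current frequency, with 1.25x headroom in both cases. The function name, the simple clamp to the maximum frequency, and the example numbers are illustrative assumptions, not the kernel's actual code path and not measurements from these systems.

```python
def next_freq_khz(util, max_capacity, cur_freq_khz, max_freq_khz, invariant):
    """Sketch of Schedutil's documented rule (hypothetical helper).

    invariant:     next = 1.25 * max_freq * util / max_capacity
    non-invariant: next = 1.25 * cur_freq * util / max_capacity
    """
    # Pick the reference frequency depending on frequency invariance.
    base = max_freq_khz if invariant else cur_freq_khz
    # 1.25x headroom expressed with integer math: base + base / 4.
    requested = (base + (base >> 2)) * util // max_capacity
    # Simplification: clamp to the maximum frequency only.
    return min(requested, max_freq_khz)


if __name__ == "__main__":
    # Hypothetical CPU: currently at 2.0 GHz, 3.0 GHz maximum, 50% utilization.
    for invariant in (False, True):
        freq = next_freq_khz(512, 1024, 2_000_000, 3_000_000, invariant)
        print(f"invariant={invariant}: {freq} kHz")
```

For the same utilization input, the invariant path requests a higher clock whenever the CPU is currently running below its maximum. In practice the utilization signal itself also changes once it is frequency-invariant, so this only illustrates the formula and says nothing about the real-world outcome on 5.11.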
So my Christmas benchmarking adventure this year is looking closer at the Linux 5.11 CPUFreq performance on AMD EPYC relative to Linux 5.10. I now have benchmarks going for the schedutil, performance, and ondemand governors on both Linux 5.10 and 5.11 to see how the performance changes and to confirm the theory that this change is what's responsible for the shift in AMD performance. Additionally, the Phoronix Test Suite is monitoring the peak CPU frequency, CPU temperature, and CPU power consumption on a per-test basis for all the tests being run across the different kernels and governors. These numbers should be complete within the next couple of days for seeing how Linux 5.11 ultimately is looking for AMD EPYC.
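For anyone wanting to check which governor their own system is using before comparing kernels, here is a small sketch that assumes the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu. The helper name governor_summary is hypothetical, and this is not how the Phoronix Test Suite collects its data.

```python
from collections import Counter
from pathlib import Path


def governor_summary(cpu_root="/sys/devices/system/cpu"):
    """Count how many CPUs use each cpufreq governor (hypothetical helper)."""
    root = Path(cpu_root)
    if not root.is_dir():
        # No cpufreq sysfs tree here (non-Linux system, container, etc.).
        return {}
    counts = Counter()
    # cpu[0-9]* matches cpu0, cpu1, ... but skips cpufreq/ and cpuidle/.
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return dict(counts)


if __name__ == "__main__":
    # Prints a mapping of governor name to CPU count.
    print(governor_summary())
```

Switching governors means writing a name such as performance into the same scaling_governor files as root, which is worth doing on a test box rather than a production server.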
At least from my close monitoring of the changes each merge window, the AMD frequency invariance support is the main alteration in Linux 5.11 that comes to mind that could explain the performance changes, especially for the workloads that are regressing.
While that large CPUFreq governor comparison with thermal/power/frequency impact is ongoing, here is a look at the EPYC 7702 1P results between Linux 5.10 and 5.11 for dozens of different workloads:
There is a fair amount of change with Linux 5.11. There were 148 benchmarks run for that initial Linux 5.10 vs. 5.11 testing on the EPYC 7702 - see all the data points in full via this OpenBenchmarking.org result file. The much larger follow-up look at the Linux 5.11 performance is in the works for the next few days.
But what about on consumer Ryzen CPUs? I have Christmas benchmarking underway there too... On the Ryzen 9 5900X box here are some preliminary side-by-side numbers:
The Ryzen 9 5900X was seeing many workloads now running slower on Linux 5.11 than 5.10, including PostgreSQL in that case, should you be wanting to run a SOHO database server on consumer hardware or the like. Those initial Ryzen 9 5900X data points are available via this OpenBenchmarking.org result file.
Given such broad changes on the Ryzen front as well with Linux 5.11, CPUFreq behaving less than optimally seems like the leading contender...
Yes, Linux 5.11 doing poorly does seem to be an "AMD thing" at this point. I fired up some benchmarks on the Dell XPS Tiger Lake notebook with various tests and there Linux 5.11 is overall an improvement -- especially with I/O workloads given the io_uring improvements to be found and more.
Those numbers are available via this result file, and while the workloads differ somewhat given the differing focus, there are no big scary drops seen there yet.
Long story short, lots of AMD Ryzen and EPYC benchmarks are now running this Christmas to further investigate this matter... Those wanting to support this timely benchmark investigation can join Phoronix Premium this weekend or at the very least turn off your damn ad-blocker(s). Stay tuned.
UPDATE: See Linux 5.11 Is Regressing Hard For AMD Performance With Schedutil