CentOS ISA SIG Experimenting With New x86-64 Baseline For Better Performance
The CentOS ISA special interest group (SIG) has been evaluating how CentOS Stream would perform if its x86_64 baseline were raised from x86-64-v2 to x86-64-v3. CentOS Stream 9 currently targets x86-64-v2. Raising the requirement to x86-64-v3 would allow AVX/AVX2 and other newer instruction set extensions to be used by default. The x86-64-v3 baseline roughly corresponds to Intel and AMD CPUs from 2015 onward.
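To check which of these levels a given Linux machine meets, one rough approach is to compare the CPU flags the kernel reports in /proc/cpuinfo against each level's required features. The sketch below is a minimal illustration under that assumption. The flag names follow the Linux kernel's naming ("pni" is SSE3, "abm" is reported on CPUs with LZCNT, and "xsave" stands in for OSXSAVE), and the helpers read_cpu_flags, x86_64_level and march_flag are hypothetical, not part of any CentOS tooling. It also maps the detected level to the matching compiler flag; -march=x86-64-v2 and -march=x86-64-v3 are accepted by GCC 11 and Clang 12 or newer.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping of x86-64 levels to the flag names Linux prints in
# /proc/cpuinfo. This is an approximation: "pni" is SSE3, "abm" is reported
# on CPUs with LZCNT, and "xsave" stands in for OSXSAVE.
LEVEL_FLAGS = {
    1: {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"},
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
}

# -march values understood by GCC 11+ and Clang 12+ for each level.
MARCH = {1: "-march=x86-64", 2: "-march=x86-64-v2", 3: "-march=x86-64-v3"}


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Return the CPU flags from the first 'flags' line, or an empty set."""
    try:
        text = Path(path).read_text()
    except OSError:
        return set()  # not Linux, or /proc unavailable
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def x86_64_level(flags: set[str]) -> int:
    """Highest level (0-3) whose required flags are all present.

    Levels are cumulative, so a CPU missing any v2 feature is reported as
    v1 even if it happens to have some v3 features.
    """
    level = 0
    for lvl in sorted(LEVEL_FLAGS):
        if not LEVEL_FLAGS[lvl] <= flags:
            break
        level = lvl
    return level


def march_flag(level: int) -> str | None:
    """Compiler flag targeting the given level, or None below v1."""
    return MARCH.get(level)


if __name__ == "__main__":
    level = x86_64_level(read_cpu_flags())
    print(f"level={level} flag={march_flag(level)}")
```

Checking levels cumulatively mirrors how the baselines are defined: an x86-64-v3 build assumes every v2 feature too, so a CPU that lacks one v2 feature cannot run it even if AVX2 is present.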
The CentOS ISA SIG experimented with building x86-64-v2 and x86-64-v3 packages, rebuilding the vast majority of the CentOS Stream 9 archive with each of these defaults. The x86-64-v3-optimized repository is publicly available, but it is not maintained and receives no updates beyond the package versions from when this testing began.
The SIG ran a variety of benchmarks to gauge the performance differences. Some of its key takeaways included:
"Some of the benchmarks, such as the glibc_bench, Mocassin, and John the Ripper (md5crypt) benchmarks saw significant results. Given that we changed both the compiler version and the baseline, we dug into which of those variables contributed the most impactful change to the results. For the latter two benchmarks, we saw a 2.2x speed up. Mocassin seems to benefit the most from the auto-vectorization that GCC12 does. Using GCC 11 with appropriate compiler flags produced similar results. The md5crypt benchmark benefits from the increased parallelism and double register width that AVX introduces over SSE. That was encouraging as it further confirms the hypothesis that workloads which lend themselves to vectorization can benefit greatly. Interestingly, we discovered that some of the upstream glibc math library functions lacked optimized IFUNC implementations. This is currently being investigated upstream to see if further IFUNC additions will result in similar gains for those areas.
...
With our preliminary results showing gains in the areas we expect but not overwhelming performance across the board, one might wonder why that is and what comes next.
Taking a step back and thinking about this at a higher level, performance is always going to be dictated by the predominant code paths that the CPUs are executing. That code is going to be highly dependent on the workloads being run. For example, if a machine is primarily used as a database server, one would expect the DB code itself to be the main factor in where performance can be gained. The underlying OS will of course be important, but the CPU is likely spending most of the time executing DB code, and a highly tuned lower-level library isn’t going to sway the results much unless the DB is calling into that library frequently. Similar for something like a webserver. For application level workloads we provide as part of the OS, we can of course build them with the newer baseline and perhaps see gains. However, most of the applications that are run on top of CentOS Stream are built by the end user or an ISV.
This observation leaves us with a bit of a conundrum. We can tune the OS userspace, but if the applications aren’t tuned then does it matter? We think it can but it would be really interesting to see what other workloads lend themselves to optimization that we don’t really have insight into. One of the ideas we have is to provide some documentation on how to build end user applications with the newer baselines, or to provide some packages that a user could install and tweak the compiler defaults to take advantage of newer instructions."
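The md5crypt explanation in the quote comes down to simple arithmetic. MD5 operates on 32-bit words, so a SIMD implementation can process one independent hash state per 32-bit lane. A 128-bit SSE register holds 4 such lanes, while a 256-bit register holds 8; 256-bit integer operations require AVX2, which is part of x86-64-v3. The short sketch below only computes that ratio; the names lanes and ideal_lane_speedup are illustrative.

```python
# Register widths (bits) for the two SIMD generations discussed.
SSE_BITS = 128      # XMM registers, usable under the x86-64-v2 baseline
AVX2_BITS = 256     # YMM registers with 256-bit integer ops (AVX2, x86-64-v3)
MD5_WORD_BITS = 32  # MD5 works on 32-bit words


def lanes(register_bits: int, word_bits: int = MD5_WORD_BITS) -> int:
    """How many independent 32-bit MD5 states fit in one register."""
    return register_bits // word_bits


def ideal_lane_speedup(old_bits: int, new_bits: int) -> float:
    """Idealized ratio from lane count alone; ignores every other factor."""
    return lanes(new_bits) / lanes(old_bits)


if __name__ == "__main__":
    print(lanes(SSE_BITS), lanes(AVX2_BITS), ideal_lane_speedup(SSE_BITS, AVX2_BITS))
    # prints: 4 8 2.0
```

The idealized lane ratio of 2x is in the same range as the reported 2.2x; the excerpt does not quantify how much of the remainder comes from other factors, such as the compiler change the SIG mentions.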
Read more about their investigation into x86-64-v2 vs. x86-64-v3 performance on the CentOS Blog.
Hopefully we'll see more investigations from the CentOS ISA SIG, as well as broader consideration by Linux distribution communities of raising their x86-64 baselines.