OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

Written by Michael Larabel in Phoronix Test Suite on 1 February 2021 at 01:35 AM EST. 9 Comments

For those wondering how AVX-heavy a particular benchmarked program is, or whether a given program/benchmark makes use of new instruction set extensions such as Vector AES or the forthcoming AVX VNNI or AMX, the Phoronix Test Suite and OpenBenchmarking.org can now provide that insight on a per-test basis for common CPU instruction set extensions.

Following the re-architected OpenBenchmarking.org roll-out from last year, new and exciting features continue to be enabled, especially for its analytics engine and for providing more insight into test profile (benchmark) capabilities. The latest work automatically evaluates which prominent CPU instructions are used by a given test profile.

An example showing the new CPU instruction set usage reporting on a test profile page. Now users can have an immediate idea of whether a given test is making use of certain performance-sensitive instructions or not.

About nine years ago I wrote an initial CPU instruction analysis for OpenBenchmarking.org, albeit it was in rough shape and not much of a priority. After last year's overhaul to OpenBenchmarking.org, I began toying with it again and rewrote the implementation, which is now much more capable. These days it's more interesting as well in an AVX-512 era, where instruction use can have significant implications on per-core clock speeds. Plus, with the forthcoming Advanced Matrix Extensions (AMX) among other notable recent extensions, the feature makes more sense and is more useful these days.

As of this weekend, the functionality is now restored on OpenBenchmarking.org. Test profiles will begin displaying what notable CPU instructions are used by a given test/benchmark. All flavors of AVX, AMX, AES, VAES, SERIALIZE, ENQCMD, MOVDIRx, FMA, and BMI2 are among the instructions being reported on the web interface.

Searching tests by instruction set usage, among other features, will be worked on next, building off this new functionality.

Even for binary-only/proprietary benchmarks, the instruction use is analyzed for such notable instructions. In the case of open-source programs, the instruction use is analyzed using the default/out-of-the-box compiler configuration for a given program and then again when specifying various CPU targets to the compiler. This in turn lets the user know whether manually overriding the CPU target when building the program is of use in enabling any of these newer instructions. The Phoronix Test Suite analyzes the generated binary for the instructions regardless of whether they come from hand-tuned Assembly, intrinsics, or are inserted by the compiler in use. This doesn't add any overhead when benchmarking with the Phoronix Test Suite: the analysis pass is run separately for those interested, and in the case of OpenBenchmarking.org it is run within our lab and then reported to the OpenBenchmarking.org analytics engine. The binary of the program being tested is analyzed, as well as any local libraries / shared objects that are part of that test/benchmark.
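To make the idea concrete, here is a minimal Python sketch of this kind of analysis. It is not the Phoronix Test Suite's actual implementation, which the article does not show. It assumes the binary has already been disassembled to text in the tab-separated AT&T layout that GNU `objdump -d` prints, and the rule table, `count_extensions`, and `newly_enabled` are hypothetical names with deliberately incomplete, heuristic rules (for example, anything touching a `zmm` register is counted as AVX-512).

```python
import re
from collections import Counter

# Hypothetical, incomplete rule table: (label, mnemonic pattern, operand pattern).
# The first matching rule wins, so more specific rules come first.
RULES = [
    ("VAES", re.compile(r"^vaes"), re.compile(r"[yz]mm")),
    ("AVX-512", re.compile(r"^v"), re.compile(r"zmm")),
    ("FMA", re.compile(r"^vfn?m(?:add|sub|addsub|subadd)(?:132|213|231)"), None),
    ("AVX", re.compile(r"^v"), re.compile(r"[xy]mm")),
    ("AES", re.compile(r"^aes(?:enc|enclast|dec|declast|imc|keygenassist)$"), None),
    ("BMI2", re.compile(r"^(?:bzhi|mulx|pdep|pext|rorx|sarx|shlx|shrx)[lq]?$"), None),
    ("MOVDIRx", re.compile(r"^movdir(?:i|64b)$"), None),
    ("ENQCMD", re.compile(r"^enqcmds?$"), None),
    ("SERIALIZE", re.compile(r"^serialize$"), None),
]


def count_extensions(disassembly):
    """Count instructions per extension in objdump-style disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        # Assumed layout: "address:<TAB>raw bytes<TAB>mnemonic operands".
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # labels, blank lines, byte-continuation lines
        parts = fields[2].split(None, 1)
        if not parts:
            continue
        mnemonic = parts[0].lower()
        operands = parts[1] if len(parts) > 1 else ""
        for label, mnemonic_re, operand_re in RULES:
            if mnemonic_re.search(mnemonic) and (
                operand_re is None or operand_re.search(operands)
            ):
                counts[label] += 1
                break
    return counts


def newly_enabled(default_disassembly, targeted_disassembly):
    """Extensions seen only in the build made with an explicit CPU target."""
    default = set(count_extensions(default_disassembly))
    targeted = set(count_extensions(targeted_disassembly))
    return targeted - default


# Tiny hand-written samples standing in for two builds of the same program.
DEFAULT_SAMPLE = "  401000:\t0f 58 c1\taddps  %xmm1,%xmm0\n"
TARGETED_SAMPLE = (
    "  401000:\tc5 fc 58 c1\tvaddps %ymm1,%ymm0,%ymm0\n"
    "  401004:\tc4 e2 75 b8 c2\tvfmadd231ps %ymm2,%ymm1,%ymm0\n"
)

if __name__ == "__main__":
    print(sorted(newly_enabled(DEFAULT_SAMPLE, TARGETED_SAMPLE)))
```

Comparing the default build against a CPU-targeted build as a set difference mirrors the question raised above: does overriding the compiler target unlock any instructions that the out-of-the-box build does not use? The same counting could be repeated over each shared object shipped with a test.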

All of this, of course, happens automatically. The only manual aspect is supplying "interesting" CPU instructions to monitor. OpenBenchmarking.org then handles the rest by analyzing 10+ years of OpenBenchmarking.org data to figure out when that CPU instruction first appeared (if it has yet) and, in turn, in which CPU families and year that first happened. The generated information also yields convenient listings of supported CPUs for a given instruction set extension and other convenient helpers.
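As a rough illustration of that lookup, the sketch below derives the first appearance of a CPU flag and a listing of supporting CPU families from result records. The record layout, the function names, and the sample data are all hypothetical placeholders; the article does not describe how OpenBenchmarking.org actually stores or queries its data.

```python
def first_seen(records, flag):
    """Return (year, [families]) for the earliest records reporting `flag`.

    `records` is an iterable of (year, cpu_family, flags) tuples, where
    `flags` is a set of CPU feature names. Returns None if the flag has
    not appeared in any record yet.
    """
    hits = [(year, family) for year, family, flags in records if flag in flags]
    if not hits:
        return None
    first_year = min(year for year, _ in hits)
    families = sorted({family for year, family in hits if year == first_year})
    return first_year, families


def supported_cpus(records, flag):
    """List every CPU family that has ever reported `flag`."""
    return sorted({family for _, family, flags in records if flag in flags})


# Made-up placeholder records, not real OpenBenchmarking.org data.
SAMPLE_RECORDS = [
    (2011, "family-a", {"avx"}),
    (2013, "family-b", {"avx", "avx2", "fma"}),
    (2014, "family-c", {"avx", "avx2", "fma"}),
]

if __name__ == "__main__":
    print(first_seen(SAMPLE_RECORDS, "avx2"))
    print(supported_cpus(SAMPLE_RECORDS, "avx2"))
    print(first_seen(SAMPLE_RECORDS, "amx_tile"))
```

Returning `None` for a flag with no matches covers the "if it has yet" case, where an instruction is being monitored before any submitted result has reported hardware supporting it.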

All automated, of course.

This functionality will continue to be built on as time allows. In its current form it should help those wondering whether certain performance-sensitive instructions are used by a given program/benchmark, whether it's proprietary/binary-only or open-source, and in some cases whether CPU compiler targeting helps in opening up other instruction usage. If you have any other related feature requests and the like, feel free to voice them via the forums, Twitter, etc.

The functionality for analyzed test profiles (most of them already; the remaining ones will be generated in the coming days) can be found by navigating to a given test profile page, where it sits underneath the composite performance listings and the automated CPU core scaling analysis. Examples include Blender, Etcpak, Cryptopp, C-Ray, and the hundreds of other test profiles available via our open-source, fully automated benchmarking software. Go explore on OpenBenchmarking.org and share your feedback.
