The Bizarre Case Of Zstd's Very Slow Performance On Arch Linux
The performance of the zstd binary shipped by Arch Linux was found to be painfully slow compared with the other Linux distributions tested on this same laptop... Way slower, on the exact same hardware, with every distribution using its own system-supplied Zstd binary for those benchmarks.
But the results were reproducible, repeated multiple times, and I stand behind the benchmarks. Stretched thin as I am just to make ends meet (consider turning off your ad-blocker! Or join Phoronix Premium for ad-free viewing and other benefits), I didn't have the time or resources to dig further into that particular issue, especially since I routinely (pretty much daily) come across similar performance peculiarities with different hardware and software on Linux.
Fortunately, some eager Arch Linux developers looked into it after that article and found the rather surprising reason their Zstd performance was so outrageously slow:
The build system (sort of). Huh? No, it's not a difference in compiler flags tuning the optimization level... Arch Linux builds its Zstd package with Zstd's CMake build system, while Ubuntu and other Linux distributions often use the plain Makefile build shipped by Zstd. Zstd also ships Meson build system support. The Arch Linux developers found that building Zstd with CMake led to the slower compression speed, while the conventional make build performed as expected (the Meson build also regressed).
So how the heck is the build system interfering with the resulting binary's performance if it's not an optimization level difference? This is where it gets even weirder, and it's another example of the impact that supporting multiple build systems can have on a project...
The CMake build system for Zstd ends up adding the "-std=c99" flag, whereas the other build systems do not specify the C99 standard at all. Rather surprisingly, having that C99 standard specified is what ultimately was found to cause this large performance difference... As for why specifying C99 causes such a large performance difference, it's likely due to some threading issue/difference, but at the moment there doesn't appear to be a solid explanation. In any case, it's a Zstd bug that the fundamental behavior differs depending on which of its supported build systems is used.
The developers did test the CMake-based build without specifying "-std=c99", and it then yielded performance similar to the Zstd binaries produced by the alternative build systems. Setting C11 instead also worked fine.
The great work by the Arch Linux developers/contributors (Arvid Norlander, Antonio Rojas, etc.) is laid out in their Arch Linux bug report, and they have also reported it to Zstd as an upstream issue. Hopefully Zstd in turn will change the C language standard used by its CMake build to match the behavior of its other build systems, or better unify the handling of such settings.
Arch Linux at least has a simple workaround, so it should be able to ship a fix to its users soon that provides much better performance.