Extensive Benchmarks Looking At AMD Znver1 GCC 9 Performance, EPYC Compiler Tuning
Rather than just comparing -march=x86-64 and -march=znver1, this second part of the AMD EPYC compiler testing looks at the various optimization levels when using the GCC 9 snapshot on this AMD EPYC 7601 2P server.
The FFTW benchmark shows the common case where reaching at least -O1, or even -Og, yields much of the performance gain available from GCC's compiler optimizations. But it also shows that link-time optimizations can pay off, delivering an 11% increase in performance over just the "-O3 -march=znver1" run.
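For readers who want to try a link-time optimized build themselves, here is a minimal sketch of the GCC command lines involved. It is not the build setup used for these benchmarks: the helper name `lto_commands`, the source file names, and the output name are all hypothetical. The one point it illustrates is that -flto is passed both when compiling each object and again when linking.

```python
# Hypothetical helper: build GCC command lines for a link-time optimized build.
# File names and the output name are made up for illustration.

def lto_commands(sources, output, base_flags=("-O3", "-march=znver1")):
    """Return (compile_cmds, link_cmd) for a GCC build with -flto."""
    flags = list(base_flags) + ["-flto"]
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    # -flto at compile time stores GCC's intermediate representation in each object.
    compile_cmds = [
        ["gcc", *flags, "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    # -flto again at link time lets GCC optimize across all objects together.
    link_cmd = ["gcc", *flags, *objects, "-o", output]
    return compile_cmds, link_cmd


if __name__ == "__main__":
    compiles, link = lto_commands(["main.c", "fft.c"], "bench")
    for cmd in compiles + [link]:
        print(" ".join(cmd))
```

The commands are returned as argument lists so they could be handed to `subprocess.run` unchanged; printing them first makes it easy to check the flags before building anything.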
The HMMer sequence analysis program, meanwhile, is one of the cases where the (potentially unsafe) -Ofast optimization level pays off, but aside from that there is not much of a difference between -O1 and -O3.
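As for why -Ofast is "potentially unsafe": it turns on -ffast-math, which among other things lets GCC reorder floating-point arithmetic as if it were associative. The short Python sketch below does not invoke GCC and says nothing about HMMer itself; it only shows, with made-up values, that two algebraically identical orderings of the same additions can give different double-precision results.

```python
# Illustration only: this does not invoke GCC. It shows why reordering
# floating-point additions, which -ffast-math permits, can change a result.

big = 2.0 ** 60   # at this magnitude, adjacent doubles are 256 apart
small = 1.0

left_to_right = (big + small) - big   # small is lost to rounding in the first add
reassociated = (big - big) + small    # algebraically equal, different rounding

print(left_to_right)   # prints 0.0
print(reassociated)    # prints 1.0
```

Code that depends on a particular evaluation order, or on strict IEEE handling of NaN and infinity, is where -Ofast is most likely to change results rather than just run time.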
The SciMark2 micro-benchmarks nicely show the progression of compiler optimization levels and their impact on performance, though in this case -Ofast was slower than -O3. Link-time optimizations don't pay off here since SciMark2 is a single source file anyhow.
If you give John The Ripper any level of optimization, it's happy enough.
It's a similar story with x264, whose performance-sensitive paths are hand-tuned assembly.