GCC 11 Compiler Performance Benchmarks With Various Optimization Levels, LTO
As has been the case for many years, the impact of the different optimization levels comes down to the particular software/code-base. Usually at least a majority of the achievable performance is reached by -O1 or -O2. In some cases the "-march=native" targeting can help a lot by catering the generated code/instructions to the CPU/family in use, while in other cases less so, such as with Crypto++. With "-flto" for link-time optimizations, the Crypto++ benchmark actually showed lower performance on this Rocket Lake system.
MrBayes benefits a lot from the very aggressive "-Ofast" level, though that level can yield potentially unsafe math operations.
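To illustrate what "potentially unsafe math" can mean in practice, here is a minimal C sketch, not taken from MrBayes or any of the benchmarked code. In GCC, -Ofast turns on -ffast-math, which lets the compiler reorder floating-point arithmetic as if it were exact. Kahan compensated summation is a common example of code that relies on the rounding behaviour being preserved: its correction term is algebraically zero, so a fast-math build may simplify it away and fall back to the accuracy of a plain loop. The function names below are hypothetical, and whether a given GCC build actually removes the correction depends on the version and target.

```c
#include <stddef.h>

/* Plain left-to-right summation: small terms can be lost entirely
 * once the running total is much larger than they are. */
float naive_sum(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Kahan compensated summation: c carries the rounding error of each
 * addition forward so that it is not lost. */
float kahan_sum(const float *x, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;              /* running compensation */
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;      /* apply the previous correction */
        float t = sum + y;       /* low-order bits of y may be lost here */
        /* Algebraically (t - sum) - y is zero. Under strict IEEE rules it
         * is the rounding error; with reassociation allowed (as under
         * -ffast-math) the compiler may fold it to 0. */
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

Called on an array holding 1.0f followed by 100,000 copies of 1e-8f and built without fast-math, kahan_sum should land close to 1.001 while naive_sum stays near 1.0. Comparing an -O3 build against an -Ofast build of such a test is one way to check whether a code-base is sensitive to these relaxations.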
In HMMer the -Ofast level still squeezed out slightly better performance than -O3, but the gain was not nearly as significant as that seen with MrBayes.