These results are quite surprising as such.
Yet another update:
Open64 came out as leader in the "benchmarks without flags" category, followed by TCC.
From now on the comparisons will be with flags, and we will see how that changes the results compared to the baseline shown here for each compiler.
I'd suggest -O2, which for GCC and ICC is the most commonly used. As for GCC arch flags, I guess what distros use is a good idea? That would be "-march=i686 -mtune=generic" for 32bit. Not sure what is used for 64bit though.
Thanks for your input. I will try to see which flags may be most relevant. I might have to set different ones for different compilers too, I guess (need to read some manpages).
Originally Posted by RealNC
Feel free to suggest improvements. The more measurement points the clearer the picture (hopefully).
updated with optimizations -GCC
A small update with GCC-optimizations
Not much changed when optimizations were used (I suppose there are default settings to start with). Surprisingly, -O2 often performed better than -O3. I am considering trying some LTO later.
Next up will however be similar analyses of optimization levels for ICC, Clang and other compilers that have such options.
After that I think I will move on to 32-bit benchmarks, where there are a number of other interesting compilers to test...
Yet another update with icc optimizations
@Michael: sorry if I am spamming phoronix global. I am just of the philosophy "release early, release often" so that feedback can be given as soon as possible.
Next up will be Clang optimizations. Hopefully it will then regain some of its luster, like icc did with -O2 flags. In general, -O3 seems to be a bad performance choice.
Since you'll be running more tests, could you add -Os (optimise for size) to the gcc optimisation options tested? Also for icc, clang and pcc if they have a similar option.
It is well known that -O3 leads to better performance than -O2 only in very specific cases. Part of the reason is that -O3 binaries are larger, which makes me suspect that -Os could perform better than -O2 in some cases.
Nice results. Thanks for sharing them. I usually use ICC to compile mplayer, which gives around 10% more speed than GCC. If you don't mind, could you try these flags on ICC:
-xSSSE3 -fast -fp-model fast=1 -unroll-aggressive
-xSSSE3: sets your processor type to Core 2
-fast: enables the major speed optimization options: -ip -O3 -static
-unroll-aggressive: unrolls loops
-fp-model fast=1: enables floating-point optimizations (-fp-model fast=2 enables more floating-point optimizations but gives less accurate results)
It's true, and some people have already measured this. Since -Os produces small executables, your CPU does not waste as much time moving data in and out of cache, and in some cases this performs better than -O2 and -O3. This matters even more on CPUs with small caches. Some kernel devs recommend the -Os flag for compiling the kernel.
Originally Posted by Mo6eB
Sure, I will try that after I have tried -O2 and -O3 for Clang and Open64, along with the -Os tests for the 4 compilers supporting it (ICC, GCC, Clang, Open64).
Originally Posted by Jimbo
If anyone knows what flags are recommended for tcc and pcc I am all ears.
In addition, if anyone knows how to "unclutter" a big result file on phoronix global, that would be appreciated.
I still want all data in one graph since that actually gives additional value (comparisons between compilers X different optimization levels).
One pattern that seems to be emerging, for example, is that a longer compile time does not necessarily go together with a faster final binary (which is often assumed in interpretations of compiler comparisons).
Unfortunately binary size is not part of the current compiler benchmark suite. It would have been nice if the suite stored binary sizes for each compilation...
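In the meantime, something like the rough Python sketch below could record compile time and binary size side by side for each compiler and optimization level. It is not part of the benchmark suite. The helper names (`build_command`, `measure`, `run_matrix`) and the output file naming are made up for illustration, and it assumes each compiler accepts the plain `<compiler> <level> <source> -o <output>` form.

```python
import os
import shutil
import subprocess
import time

# Optimization levels discussed in this thread.
LEVELS = ["-O2", "-O3", "-Os"]


def build_command(compiler, level, source, output):
    """Return the argv list for one compile, e.g. gcc -O2 test.c -o out."""
    return [compiler, level, source, "-o", output]


def measure(compiler, level, source, output):
    """Compile once and return (compile seconds, binary size in bytes)."""
    start = time.perf_counter()
    subprocess.run(build_command(compiler, level, source, output), check=True)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(output)


def run_matrix(compilers, source, outdir="."):
    """Measure every installed compiler at every level; skip missing ones."""
    results = {}
    for compiler in compilers:
        if shutil.which(compiler) is None:
            continue  # compiler not on PATH, nothing to measure
        for level in LEVELS:
            # Hypothetical naming scheme, e.g. bench_gcc-O2
            output = os.path.join(outdir, f"bench_{compiler}{level}")
            results[(compiler, level)] = measure(compiler, level, source, output)
    return results
```

Calling `run_matrix(["gcc", "clang"], "test.c")` would return a dict keyed by `(compiler, level)`, which makes it easy to check whether -Os really gives the smallest binary in each case. A single compile is a noisy timing, so repeated runs would be needed for anything publishable.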
a good choice of -march might be 'native'. also see http://en.gentoo-wiki.com/wiki/Safe_Cflags
in my own tests with a fortran simulation code -O3 beats -Os (though this is probably not generally true)
with GCC you might want to look at LTO. -O3 + LTO can make smaller binaries than -Os
Also I remember reading an article about how big caches and clever precaching on modern CPUs meant that -O3 was better than -Os now. I think it was a report by Intel, but I can't find it.