GCC vs. LLVM/Clang On The AMD Richland APU
Phoronix: GCC vs. LLVM/Clang On The AMD Richland APU
Along with benchmarking the AMD A10-6800K "Richland" APU on Linux and its Radeon HD 8670D graphics, I provided some GCC compiler tuning benchmarks for this AMD APU with Piledriver cores. The latest Linux testing from the A10-6800K is a comparison of GCC 4.8.1 to LLVM/Clang 3.3 on this latest-generation AMD low-power system.
I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins and just about any time it was behind (minus where OpenMP plays a huge factor) it wasn't by much.
The gap is only going to widen with LLVM/Clang 3.4 pushing ahead in large areas of performance and scalability.
And there I thought, the language had gotten better…
“Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.”
ugh… you couldn’t have found a better way to say that those tests are completely useless?
Please have a look at
Originally Posted by MWisBest
* Timed MAFFT alignment,
* Botan MAC,
* Himeno and
* C-Ray (please call this one “not by much” again…)
LLVM seems to be very fast at matrix multiplication, though.
So my summary of the results would be:
* LLVM is great at Successive Jacobi Relaxation.
* GCC is great at C-Ray.
* LLVM has no OpenMP support, so don’t even try to use it for scientific code, except if you want to go all the way and use explicit MPI (which makes the SciMark test somewhat less useful).
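To make the OpenMP point concrete, here is a minimal sketch of the kind of loop those tests rely on. The function name `sum_squares` is made up for illustration and is not taken from any of the benchmarks. Built with GCC and `-fopenmp`, the loop is split across threads; a compiler without OpenMP support will typically just ignore the pragma and run the same loop on one core, which is why such tests favor GCC here.

```c
#include <assert.h>

/* Hypothetical example, not taken from any benchmark in the article.
 * Returns 0^2 + 1^2 + ... + (n-1)^2.
 *
 * Build with OpenMP:    gcc -O3 -fopenmp file.c
 * Build without OpenMP: gcc -O3 file.c   (the pragma is then ignored)
 */
double sum_squares(long n)
{
    double total = 0.0;
    long i;

    assert(n >= 0);

    /* With OpenMP enabled, each thread accumulates a private partial sum
     * and the partial sums are combined when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}
```

The result is the same either way; only the number of cores doing the work changes.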
Well, there were some impressive results from Clang/LLVM here. That said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing: there's no 'standard' between compilers on which optimizations should be enabled at the -O2 level.
If Clang/LLVM or GCC add more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when they are set to generate the fastest code they can, which is at -O3.
As such the Botan benchmarks are pointless in this context.
This is why, if you are measuring the performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder if Michael is deliberately using these flawed settings in order to sway results to his liking.
"Those tests are useless"
Originally Posted by ArneBab
now switch perspective to someone who needs OpenMP
"That compiler is useless"
Funny, isn't it.
-O3 does not necessarily generate the fastest code. It enables the most optimization but is intended for smaller segments of code and inner loops. If used for entire applications it may cause slowdown due to a larger memory footprint and more cache misses.
Originally Posted by XorEaxEax
Yes, sometimes -O2 actually beats -O3, but that is because the optimizer sometimes fails in its job of accurately weighing things like increased cache use against the improved performance of a larger code segment (through inlining, unrolling, etc.). Also, -O3 is not specifically intended for 'smaller segments of code'. The compiler heuristics typically do a good job of deciding which code benefits from unrolling and inlining, and which code paths are hot and cold. Just because an optimization is enabled doesn't mean it will end up being used on all segments of code. So yes, you can use -O3 on entire applications just fine, and most CPU-intensive ones default to -O3 in their configurations.
Originally Posted by carewolf
Of course if you want to give the compiler the best help, you can always use profile guided optimization where you let the compiler gather runtime data which it can then use to better optimize the code.
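A rough sketch of what that looks like with GCC, assuming a made-up kernel called `count_above` in a file called `bench.c` (both names are hypothetical). How often the branch in the loop is taken depends entirely on the input data, which is exactly the kind of thing the compiler can only guess at without a profile.

```c
#include <assert.h>

/* Hypothetical kernel, used only to illustrate profile guided optimization.
 *
 * One possible GCC workflow:
 *   gcc -O3 -fprofile-generate bench.c -o bench   (instrumented build)
 *   ./bench                                       (run on typical input; writes profile data)
 *   gcc -O3 -fprofile-use bench.c -o bench        (rebuild using the recorded profile)
 */
long count_above(const int *values, long n, int threshold)
{
    long hits = 0;
    long i;

    assert(n >= 0);

    for (i = 0; i < n; i++) {
        /* How often this branch is taken is only known at run time. */
        if (values[i] > threshold) {
            hits++;
        }
    }
    return hits;
}
```

The profile is only as good as the training run, so the input used for the instrumented run should resemble the real workload.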
But despite the fact that -O2 sometimes beats -O3 due to failed compiler heuristics, if you only test ONE optimization level then of course it must be -O3. Again, there is no 'standard' between compilers on which optimizations are enabled per 'level'. The ONLY standard is that -O3 is supposed to generate the _fastest_ code.
So unless you know beforehand that -O2 in a particular test generates the fastest code for BOTH compilers on a particular benchmark, using -O2 means nothing in a benchmark where you want to see which compiler generates the _fastest_ code, as that is what -O3 is supposed to do and also does in the vast majority of cases.
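If you want to check this for a particular piece of code rather than argue about it, something like the following would do as a starting point. It is only a sketch: `saxpy` and `time_saxpy` are made-up names, the kernel is not one of the benchmarks from the article, and `clock()` gives CPU time at fairly coarse resolution. The idea is to build the same file with each compiler at each level and compare the timings.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical kernel: y[i] += a * x[i]. A simple loop of the kind
 * that unrolling and vectorization can affect.
 *
 * Build the same source four ways and compare:
 *   gcc -O2 ...    gcc -O3 ...    clang -O2 ...    clang -O3 ...
 */
void saxpy(float a, const float *x, float *y, long n)
{
    long i;
    for (i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}

/* Returns the CPU time in seconds spent running saxpy `reps` times. */
double time_saxpy(float a, const float *x, float *y, long n, int reps)
{
    clock_t start, end;
    int r;

    assert(n >= 0 && reps > 0);

    start = clock();
    for (r = 0; r < reps; r++) {
        saxpy(a, x, y, n);
    }
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For real measurements the arrays need to be large and the repetition count high enough that the run takes well over a second; otherwise timer resolution dominates.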
Actually that’s what I’m talking about: The tests are useless, because their result is useless. If you need OpenMP, you don’t need to look at the results. The compiler is not for you. And if you don’t need OpenMP you don’t need the results either: They have no meaning for you.
Originally Posted by curaga