GCC vs. LLVM Clang On NVIDIA's Tegra K1 Quad-Core Cortex-A15
Phoronix: GCC vs. LLVM Clang On NVIDIA's Tegra K1 Quad-Core Cortex-A15
Recently I posted new benchmarks showing LLVM's Clang compiler performing well against GCC on AMD's x86-based Athlon APUs, with the resulting binaries being quite fast but not without some blemishes for both of these open-source compilers. To see how the compiler race is going in the ARM space, where many ARM vendors are taking an interest in LLVM/Clang, here are some fresh benchmarks of both compilers on NVIDIA's Tegra K1 SoC, as found on the Jetson TK1 development board.
x86 vs ARM on GCC vs LLVM
It's interesting to see how the situation changes when going from x86 to ARM when comparing GCC and LLVM. On x86, GCC pretty much wins in all but a few tests. Move to ARM and the situation flips. ARM is the future for much of the stuff I'll be doing, so this is very good to know.
Throw in MP and LLVM still gets beaten quite badly. Once that support makes it into LLVM, GCC is going to take quite a beating on ARM, if MP is anything like single-threaded performance.
Michael seems to favor floating point benchmarks once again. Regardless of whether floating point performance matters much for anything other than scientific computations (which are hardly a typical workload for ARM devices), the important factor affecting the results is the "--with-fpu=vfpv3-d16" configure option used for GCC. For the ARM Cortex-A15 it would definitely be more correct to set it to "--with-fpu=neon-vfpv4". Basically, the benchmarks were only using half of the floating point registers in the case of GCC. I don't know what was used for Clang, but it could have had an unfair advantage just from using better floating point options. The integer workloads were seriously underrepresented. And the compilation speed tests are comparing apples with oranges (the amount of work done by the compilers is different).
TL;DR: The article appears to be extremely biased and tries very hard to showcase the good sides of Clang.
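One way to take the toolchain default out of the picture is to pass the CPU and FPU explicitly to both compilers. Here is a rough Python sketch of that idea. The helper name `build_command`, the file names, and the choice of `-mfloat-abi=hard` are my own assumptions for illustration, not something taken from the article.

```python
def build_command(compiler, source, output, fpu="neon-vfpv4", opt="-O2"):
    """Return an argv list that targets the Cortex-A15 with an explicit FPU.

    Hypothetical helper: the point is only that both compilers receive
    the same target flags instead of whatever they were configured with.
    """
    return [
        compiler,
        opt,
        "-mcpu=cortex-a15",   # the CPU core in the Tegra K1
        f"-mfpu={fpu}",       # explicit FPU instead of the configured default
        "-mfloat-abi=hard",   # assumption: a hard-float userland
        "-o", output,
        source,
    ]


# Print the command lines that would be used for each compiler.
for cc in ("gcc", "clang"):
    print(" ".join(build_command(cc, "bench.c", "bench-" + cc)))
```

With something like this, any remaining difference between the two result sets is at least not caused by one compiler targeting vfpv3-d16 and the other neon-vfpv4.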
Did I miss the compiler settings, or weren't any posted? Even so, gcc -O2 is different from clang -O2, so it's rather useless to compare the same "option strings". Finding the best options for each compiler and test would be more useful.
And it's no surprise to me that Clang compares a lot better on ARM CPUs. On x86 there have been decades of adjusting code to the strengths and shortcomings of GCC and to the complex x86 quirks. On ARM the field is a lot more even and the architecture has a lot fewer quirks.
I have to agree with the complaint here.
Originally Posted by discordian
There are multiple issues of interest.
For CODE GENERATION QUALITY, the only setting that makes sense is to run both compilers at their maximum speed settings (-O4 if they support that, using LTO, etc.), plus -ffast-math IF the benchmarks are such that fast math makes sense. This will depend on exactly what the benchmark is doing: obviously there is plenty of FP code that runs just fine with fast math, and a small fraction for which fast math is completely unacceptable.
Running two compilers at -O2, which means different things for each compiler, makes no sense.
And if one or the other compiler crashes, or results in code that crashes/doesn't work at -Omax, that should also be pointed out, not brushed over by falling back to -O2.
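As a minimal sketch of "report the failure instead of falling back", something like the following Python would do. Everything here is hypothetical: the names `build_at_max` and `summarize` are mine, and "-O3 -flto" is only my stand-in for "maximum speed settings" (LTO with Clang may need extra linker setup).

```python
import subprocess

# Assumption: "-O3 -flto" stands in for each compiler's maximum speed settings.
MAX_FLAGS = ("-O3", "-flto")


def build_at_max(compiler, source, output, flags=MAX_FLAGS, run=subprocess.run):
    """Try one build at the highest settings and record what happened.

    `run` is injectable so the logic can be exercised without a compiler.
    """
    cmd = [compiler, *flags, "-o", output, source]
    result = run(cmd, capture_output=True, text=True)
    return {
        "compiler": compiler,
        "flags": list(flags),
        "ok": result.returncode == 0,  # False means the build itself failed
        "stderr": result.stderr,
    }


def summarize(report):
    """One line per build, with failures spelled out rather than hidden."""
    status = "built" if report["ok"] else "FAILED TO BUILD"
    return f'{report["compiler"]} {" ".join(report["flags"])}: {status}'
```

A results table produced this way would show a "FAILED TO BUILD" row for the affected compiler rather than a quiet -O2 number.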
For COMPILE TIME tests, things are a little more complicated because the issue is: why do you care? Presumably the general reason you care is that you want the write/compile/debug cycle to be as fast as possible. In that case, the settings should be the ones that would be used for the write/compile/debug cycle: obviously -g, and whatever "most" people use as the optimization setting. Personally I'm happy to debug at -O2 or -O3, but there appears to be a large crowd of people who cannot handle the lack of a one-to-one mapping between each C line/variable and an identical asm instruction, and who only debug at -O0. So maybe compromise and run the tests at -O1?
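For what it's worth, timing a compile under debug-cycle settings is easy to sketch in Python. The function name `time_command` and the example gcc command in the comment are assumptions for illustration; the runnable part times a trivial stand-in command so it works without a compiler installed.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# A debug-cycle compile might look like this (hypothetical file names):
#   time_command(["gcc", "-g", "-O1", "-c", "foo.c", "-o", "foo.o"])

if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere Python does.
    print(time_command([sys.executable, "-c", "pass"]))
```

Taking the fastest of a few runs is one common way to reduce noise from caches and background load; a mean with spread would be just as defensible.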
Expanding on that: The compile time test is especially useless on ARM: Everybody working with ARM is going to cross-compile on a fast multicore x64 machine anyway!
Originally Posted by name99
The list of individual benchmarks chosen really looks like someone is trying to make Clang look shiny. GCC is better on the 2-3 benchmarks which handle real-world stuff; on the rest (compile time and synthetic benchmarks) Clang wins. So the result of the article could also be: GCC good for real-world stuff, Clang good in artificial and nonsensical benchmarks.
compare object code size as well please
Oh and while I'm at it:
How about you also compare the code size of the output the compilers produce?
(I've been playing with llvm-svn on MIPS a bit, and so far it consistently produces larger object files than GCC)
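A crude way to do that comparison is to add up the sizes of the .o files from each build. Here is a small Python sketch; the function names are made up, and note that raw file size also counts symbols and any debug info, so a tool that reports section sizes would give a fairer picture of the generated code itself.

```python
import os


def total_object_size(paths, getsize=os.path.getsize):
    """Sum the sizes in bytes of all paths that end in '.o'.

    `getsize` is injectable so the arithmetic can be checked without files.
    """
    return sum(getsize(p) for p in paths if p.endswith(".o"))


def percent_larger(size_a, size_b):
    """How much larger size_a is than size_b, as a percentage of size_b."""
    return 100.0 * (size_a - size_b) / size_b


# Hypothetical usage, with one build directory per compiler:
#   import glob
#   gcc_total = total_object_size(glob.glob("build-gcc/**/*.o", recursive=True))
#   clang_total = total_object_size(glob.glob("build-clang/**/*.o", recursive=True))
#   print(percent_larger(clang_total, gcc_total))
```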
When GCC beats Clang, it has always been fair and proof that it generates "vastly superior binaries". Now that Clang is catching up and even leading in some cases, benchmarks are just "useless". Zealots are disgusting.
I'd love to have the drugs you're smoking.
Originally Posted by willmore