GCC vs. LLVM Clang On NVIDIA's Tegra K1 Quad-Core Cortex-A15


  • GCC vs. LLVM Clang On NVIDIA's Tegra K1 Quad-Core Cortex-A15

    Phoronix: GCC vs. LLVM Clang On NVIDIA's Tegra K1 Quad-Core Cortex-A15

    Recently I posted new benchmarks showing LLVM's Clang compiler performing well against GCC on AMD's x86-based Athlon APUs, with the resulting binaries being quite fast but not without some blemishes for both of these open-source compilers. To see how the compiler race is going in the ARM space, where many ARM vendors are taking an interest in LLVM/Clang, here are some fresh benchmarks of both compilers on NVIDIA's Tegra K1 SoC, as found on the Jetson TK1 development board.

    http://www.phoronix.com/vr.php?view=20386

  • #2
    x86 vs ARM on GCC vs LLVM

    It's interesting to see how the situation seems to be different when going from x86 to ARM when comparing GCC and LLVM. On x86, GCC pretty much wins in all but a few tests. Move to ARM and the situation flips. ARM is the future for much of the stuff I'll be doing, so this is very good to know.

    Throw in MP and LLVM still gets beaten quite badly. Once that support makes it into LLVM, GCC is going to take quite a beating on ARM, if MP is anything like single-threaded performance.



    • #3
      Michael seems to favor floating point benchmarks once again. Regardless of whether floating point performance matters much for anything other than scientific computations (which are hardly a typical workload for ARM devices), the important factor affecting the results is the "--with-fpu=vfpv3-d16" configure option used for GCC. For ARM Cortex-A15 it would definitely be more correct to set it to "--with-fpu=neon-vfpv4". Basically, the benchmarks were only using half of the floating point registers in the case of GCC. I don't know what was used for Clang, but it could have had an unfair advantage just because of using better floating point options. The integer workloads were seriously underrepresented. And the compilation speed tests are comparing apples with oranges (the amount of work done by the compilers is different).

      TL;DR: The article appears to be extremely biased and tries very hard to showcase the good sides of Clang.
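To illustrate the FPU point, here is a minimal sketch of giving both compilers the same explicit floating point target instead of relying on each toolchain's configured default. The helper names, the hard-float ABI, and the file name `bench.c` are assumptions made for illustration; nothing here is taken from the article's actual test setup.

```python
# Sketch only: the dictionary and helper names are illustrative and are
# not part of any real benchmark harness.

# Number of 64-bit (double-precision) VFP registers exposed by each -mfpu value.
D_REGISTERS = {
    "vfpv3-d16": 16,   # the GCC default mentioned in the post
    "neon-vfpv4": 32,  # VFPv4 with NEON, the setting suggested in the post
}


def arm_fp_flags(fpu="neon-vfpv4", cpu="cortex-a15", float_abi="hard"):
    """Return explicit FP-related flags to pass to both gcc and clang.

    float_abi="hard" is an assumption about the target system.
    """
    if fpu not in D_REGISTERS:
        raise ValueError(f"unknown FPU name: {fpu}")
    return [f"-mcpu={cpu}", f"-mfpu={fpu}", f"-mfloat-abi={float_abi}"]


def register_ratio(fpu_a, fpu_b):
    """Fraction of fpu_b's double-precision registers that fpu_a provides."""
    return D_REGISTERS[fpu_a] / D_REGISTERS[fpu_b]


if __name__ == "__main__":
    flags = arm_fp_flags()
    for compiler in ("gcc", "clang"):
        # "bench.c" is a hypothetical source file.
        print(compiler, " ".join(flags + ["-O2", "-c", "bench.c"]))
    print("vfpv3-d16 / neon-vfpv4:", register_ratio("vfpv3-d16", "neon-vfpv4"))
```

Passing the same `-mcpu`, `-mfpu` and `-mfloat-abi` values to both compilers removes one source of difference, though it does not make the two optimizers equivalent.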



      • #4
        Did I miss the compiler settings, or weren't any posted? Even so, gcc -O2 is different from clang -O2, so it's rather useless comparing the same "option strings". Finding the best options for each compiler and test would be more useful.
        And it's no surprise to me that clang compares a lot better with ARM CPUs: on x86 there are some decades of adjusting code to the strengths and shortcomings of gcc and the complex x86 quirks. On ARM the field is a lot more even and there are a lot fewer quirks in the architecture.



        • #5
          The list of individual benchmarks chosen really looks like someone is trying to make clang look shiny. GCC is better on the 2-3 benchmarks which handle real world stuff; clang wins the rest (compile time and synthetic benchmarks). So the result of the article could also be: gcc good for real world stuff, clang good in artificial and nonsensical benchmarks.



          • #6
            compare object code size as well please

            Oh and while I'm at it:
            How about you also compare the code size of the output the compilers produce?

            (I've been playing with llvm-svn on MIPS a bit, and so far it consistently produces larger object files than GCC)
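A rough sketch of how such a size comparison could be scripted, assuming the same sources were built once with each compiler into two separate directories. The directory names `build-gcc` and `build-clang` and both function names are hypothetical. Raw file size also counts symbols and any debug info, so a section-level tool would give a finer picture.

```python
import os


def collect_sizes(build_dir):
    """Map each .o file (path relative to build_dir) to its size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if name.endswith(".o"):
                path = os.path.join(root, name)
                sizes[os.path.relpath(path, build_dir)] = os.path.getsize(path)
    return sizes


def compare_sizes(gcc_sizes, clang_sizes):
    """Return {file: clang_size / gcc_size} for objects present in both builds."""
    common = sorted(set(gcc_sizes) & set(clang_sizes))
    return {
        name: clang_sizes[name] / gcc_sizes[name]
        for name in common
        if gcc_sizes[name] > 0
    }


if __name__ == "__main__":
    # "build-gcc" and "build-clang" are hypothetical directory names.
    ratios = compare_sizes(collect_sizes("build-gcc"), collect_sizes("build-clang"))
    for name, ratio in ratios.items():
        print(f"{name}: clang/gcc = {ratio:.2f}")
```

A ratio above 1.0 means the Clang-built object is larger than the GCC-built one for that file.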



            • #7
              When GCC beats Clang, it has always been fair and proof that it generates "vastly superior binaries". Now that Clang is catching up and even leading in some cases, benchmarks are just "useless". Zealots are disgusting.



              • #8
                Originally posted by Sergio
                When GCC beats Clang, it has always been fair and proof that it generates "vastly superior binaries". Now that Clang is catching up and even leading in some cases, benchmarks are just "useless". Zealots are disgusting.
                For a start, please try using GCC and Clang yourself. It is really not difficult to get a little bit of first hand experience with both and see whether Clang is really catching up or not, using the applications and workloads you care about.



                • #9
                  Originally posted by willmore
                  It's interesting to see how the situation seems to be different when going from x86 to ARM when comparing GCC and LLVM. On x86, GCC pretty much wins in all but a few tests.
                  I'd love to have the drugs you're smoking.



                  • #10
                    When OpenMP 4.x lands in LLVM proper

                    I fully expect Michael to stop running OpenMP enabled tests when comparing GCC vs. LLVM/Clang.

                    Soon, very soon, not a single test [other than the highly tailored C-Ray for GCC] will show GCC remotely near its competitor in results. The more LLVM/Clang evolves, the less relevant GCC becomes.



                    • #11
                      GPU tests?

                      Hey Michael. All I see is CPU benchmarks. Have you published benchmarks that show the performance of the K1 GPU?



                      • #12
                        Originally posted by discordian
                        Did I miss the compiler settings, or weren't any posted? Even so, gcc -O2 is different from clang -O2, so it's rather useless comparing the same "option strings". Finding the best options for each compiler and test would be more useful.
                        And it's no surprise to me that clang compares a lot better with ARM CPUs: on x86 there are some decades of adjusting code to the strengths and shortcomings of gcc and the complex x86 quirks. On ARM the field is a lot more even and there are a lot fewer quirks in the architecture.
                        I have to agree with the complaint here.
                        There are multiple issues of interest.

                        For CODE GENERATION QUALITY, the only setting that makes sense is to run both compilers at their maximum speed settings (-O4 if they support that, using LTO, etc.), plus -ffast-math IF the benchmarks are such that fast math makes sense. This will depend on exactly what the benchmark is doing: obviously there is plenty of FP code that runs just fine with fast math, and a small fraction for which fast math is completely unacceptable.
                        Running two compilers at -O2, which means different things for each compiler, makes no sense.

                        And if one or the other compiler crashes, or results in code that crashes/doesn't work at -Omax, that should also be pointed out, not brushed over by falling back to -O2.

                        For COMPILE TIME tests, things are a little more complicated because the issue is: why do you care? Presumably the general reason you care is that you want the write/compile/debug cycle to be as fast as possible. In which case, the settings should be the settings that would be used for the write/compile/debug cycle. Obviously -g, and whatever "most" people use as the optimization setting. Personally I'm happy to debug at -O2 or -O3, but there appears to be a large crowd of people who cannot handle the lack of a one-to-one mapping between each C line/variable and an identical asm instruction, and who only debug at -O0. So maybe compromise and run the tests at -O1?
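The two kinds of test described above could be kept apart with something like the following sketch. The function name, the purpose labels, and the specific choice of `-O3 -flto` as the "maximum speed" set are assumptions for illustration; the `-O1 -g` set is the compromise suggested in the post.

```python
# Sketch only: flag sets are illustrative choices, not settings from the article.
QUALITY_FLAGS = ["-O3", "-flto"]      # assumed "maximum speed" set for both compilers
DEBUG_CYCLE_FLAGS = ["-O1", "-g"]     # compromise for compile-time tests


def build_command(compiler, purpose, source, fast_math=False):
    """Build a compile command for one compiler and one kind of test.

    purpose is "quality" (code generation quality) or "compile-time"
    (write/compile/debug cycle). fast_math only applies to "quality" and
    should be enabled only for benchmarks that tolerate it.
    """
    if compiler not in ("gcc", "clang"):
        raise ValueError(f"unsupported compiler: {compiler}")
    if purpose == "quality":
        flags = list(QUALITY_FLAGS)
        if fast_math:
            flags.append("-ffast-math")
    elif purpose == "compile-time":
        flags = list(DEBUG_CYCLE_FLAGS)
    else:
        raise ValueError(f"unknown purpose: {purpose}")
    return [compiler] + flags + ["-c", source]


if __name__ == "__main__":
    # "bench.c" is a hypothetical source file.
    for compiler in ("gcc", "clang"):
        print(" ".join(build_command(compiler, "quality", "bench.c", fast_math=True)))
        print(" ".join(build_command(compiler, "compile-time", "bench.c")))
```

Keeping the flag sets in one place makes it obvious when a run silently fell back to a lower optimization level, which is the failure the post says should be reported rather than brushed over.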



                        • #13
                          Originally posted by name99

                          For COMPILE TIME tests, things are a little more complicated because the issue is: why do you care?
                          Expanding on that: the compile time test is especially useless on ARM. Everybody working with ARM is going to cross-compile on a fast multicore x64 machine anyway!
