LLVM Clang 12 Leading Over GCC 11 Compiler Performance On Intel Xeon Scalable Ice Lake


  • LLVM Clang 12 Leading Over GCC 11 Compiler Performance On Intel Xeon Scalable Ice Lake

    Phoronix: LLVM Clang 12 Leading Over GCC 11 Compiler Performance On Intel Xeon Scalable Ice Lake

    We have recently been running a number of compiler benchmarks of the newly released LLVM Clang 12 and GCC 11 open-source compilers. The competition between GCC and Clang is as healthy as ever: the mainline Linux kernel now works well under Clang, more software projects are shifting to Clang by default, and performance of compiled C/C++ code on x86_64 and AArch64 is as tight as ever between the two. Today's article has benchmarks of Clang 12 vs. GCC 11 on the dual Intel Xeon Platinum 8380 Ice Lake server.

    https://www.phoronix.com/vr.php?view=30234

  • #2
    The friendly "competition" is good for both projects, and, as they say, "A rising tide lifts all boats", so we (the users of the compilers and the resulting binaries) are the real winners.
    Last edited by CommunityMember; 04 June 2021, 11:44 AM.


    • #3
      Is -O3 actually faster than -O2? Every time I test this, it's even or slower. (But I actually test this on power-efficient cores like Arm and Apollo Lake.)


      • #4
        Originally posted by discordian
        Is -O3 actually faster than -O2? Every time I test this, it's even or slower.
        As is always true, YMWV. And the only way to know for sure for your specific app is, as you say, test it. Same with LTO (and probably PGO, too, although I do not recall any benchmarks showing such for PGO).
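For anyone who wants to "test it" on their own code, here is a minimal timing sketch, not a rigorous benchmark. The names `sum_squares` and `time_workload` are hypothetical stand-ins for your own hot path; the idea is to build the same source once with -O2 and once with -O3 and compare the reported times.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: replace with the hot path of your own application. */
static double sum_squares(const double *data, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += data[i] * data[i];
    return acc;
}

/* Runs the workload `reps` times and returns the elapsed CPU time in seconds.
 * One element is changed before every repetition and the result is stored
 * through a volatile pointer, so the optimizer cannot compute the sum once
 * and reuse it for all repetitions. */
static double time_workload(double *data, size_t n, int reps, volatile double *out)
{
    assert(n > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        data[(size_t)r % n] += 1.0;
        *out = sum_squares(data, n);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `time_workload` from a small `main` with a buffer large enough to run for at least a second or so, repeat the run several times, and compare the -O2 and -O3 builds on the hardware you actually deploy on.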


        • #5
          Originally posted by CommunityMember

          As is always true, YMWV. And the only way to know for sure for your specific app is, as you say, test it. Same with LTO (and probably PGO, too, although I do not recall any benchmarks showing such for PGO).
          Yeah, sure. My experience is just so resoundingly consistent that I wonder whether -O3 is anything more than a stress test for the optimizer.


          • #6
            Originally posted by discordian
            Is -O3 actually faster than -O2? Every time I test this, it's even or slower. (But I actually test this on power-efficient cores like Arm and Apollo Lake.)
            No, of course not. It depends very much on the code itself. What makes this into yet another stupid Phoronix article, however, is the statement that "-O3 -flto" is the option to use for performance-sensitive code, when many past benchmarks, made by Phoronix itself, have shown that this is not the case at all. Compilers know a vast number of optimisations, and not all of them always produce an improvement. The -O switches merely select a set of these optimisations; anyone who is serious about optimisation knows this and knows to select them carefully. So this is an attempt at gas-lighting the readers, and it is probably based on a simpleton's idea of a silver bullet: a switch that would optimise any code under any condition and always produce the best result (".. a switch to rule them all!"). Such a switch would be -O or -O2. However, -O3, -Ofast and other, more aggressive optimisations can, but will not always, give a better result.

            However, the article does highlight once more that GCC still has an issue when LTO is used without PGO, and it needs to be looked at. GCC tends to optimise better with only -O3, or when LTO is used together with PGO. When -O3 is combined with -flto, GCC tends to produce worse code than with just -O3. Phoronix itself has reported this a couple of times.
            Last edited by sdack; 04 June 2021, 01:41 PM.


            • #7
              Originally posted by discordian

              Yeah, sure. My experience is just so resoundingly consistent that I wonder whether -O3 is anything more than a stress test for the optimizer.
              Basically, it depends. Compilers aren't magical entities capable of breaking the rules of the universe, and optimization levels aren't as smart as you might think. It goes like this:

              1.) If the target code is mainly single-threaded, with lots of branching, containers, and very few loops (or loops with branching), -O2 is the best you're going to get; see ***

              2.) If you have code with unrollable loops, threads, vectors, clean unbranched loops, etc., -O3+ will make a difference, simply because -O2 just doesn't optimize any of those cases.

              Why doesn't it look better in most benchmarks? Basically because writing code that -O3+ can optimize is very hard and usually very architecture-dependent. Most FOSS projects don't have the time (or expertise) to focus on those cases, or simply lack the manpower, and adding more complex code would set the bar too high for new contributors. That is why it is very common for projects to take the "Good Enough" approach instead of the "Optimal" approach.

              ***
              https://gcc.gnu.org/onlinedocs/gcc/O...e-Options.html (some optimizations are added or missing depending on the compiler version)
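To make the two cases above concrete, here is a rough sketch of each kind of code. The function names `saxpy` and `count_above` and the `node` type are made up for illustration, and how much any given compiler version gains from -O3 on either one is something to measure, not assume.

```c
#include <stddef.h>

/* Case 2: a clean, branch-free loop over contiguous arrays. `restrict`
 * promises the arrays do not overlap, which makes this a natural candidate
 * for unrolling and vectorization. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Case 1: pointer chasing with a data-dependent branch. Each iteration has
 * to wait for the previous load of `next`, so there is little for loop
 * optimizations to work with. */
struct node {
    int value;
    struct node *next;
};

int count_above(const struct node *head, int threshold)
{
    int count = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        if (p->value > threshold)
            count++;
    }
    return count;
}
```

To see what the compiler actually did with a loop, GCC can report vectorized loops with -fopt-info-vec and Clang with -Rpass=loop-vectorize.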


              • #8
                Holy sh*t! I saw it coming: In the long term, Clang/LLVM wins over GCC. There is only one thing left: platform support. This is still the realm of GCC.
                I hope that GCC will still be relevant in the future...


                • #9
                  Okay, so you added a new benchmark that heavily favors clang and the balance changed? How are new benchmarks chosen for the list?


                  • #10
                    I love competition in this space. Glad to see we have multiple options. I'm still on GCC and probably won't move away from it anytime soon. But good for the Clang guys!
