LLVM Clang 3.9 Mostly Trails GCC In Compiler Performance


  • #11
    I am really surprised that the compile times of Clang are not better. This is the area where Clang had the real lead over GCC.
    And still has, at least for unoptimized debug builds. All my projects build faster on clang. I personally don't care much about the compile times of optimized builds: if GCC produces faster code, then I will use GCC, no matter how fast or slow it compiles (within reason, of course).

    Also, in my experience, clang often seems to generate more reasonable SIMD code compared to gcc (still not on par with hand-written assembly, though), but gcc seems to be quite a bit better at handling integer-only stuff. I'm working on a Mahjong game right now, and the hand evaluator is ~15% faster when built with g++ -O3 compared to clang++ -O3.

    Comment


    • #12
      Originally posted by mlau View Post

      me too, actually, although it shows that more optimization passes don't come for free.
      And the passes that squeeze out the final few percent are the most expensive. Getting 80-90% of the way there is fast, which is why we have -O1 and -O2.

      Comment


      • #13
        Originally posted by s_j_newbury View Post

        Can you look at the generated code in each case and work out why?
        gcc emits 1x vfmadd per iteration, while clang emits 4x vmul+vadd.
        gcc's version is only half the size, but clang's is apparently 4x as fast.

        Comment


        • #14
          Originally posted by mlau View Post

          Oh, I agree that it's an almost meaningless benchmark, but it at least shows where both compilers excel at code generation.
          I ran the scimark2 suite with clang HEAD from a few hours ago, and it absolutely destroys gcc in the sparse matmult benchmark:

          Sparse matmult Mflops: 12550.17 (N=1000, nz=5000) (clang git head)
          Sparse matmult Mflops: 3118.18 (N=1000, nz=5000) (gcc-6.2)

          There's no meaningful difference in the other scores.
          I have seen significant performance gains on a number of synthetic benchmarks with clang git (4.0) when -ffast-math is enabled. -flto can also help significantly on clang, whereas on gcc I rarely see it have a significant impact. I believe with -ffast-math clang is willing to do some vectorization that is not considered safe by default.

          Comment


          • #15
            I would be interested to see icc included in these benchmarks as a sort of "reference" compiler. It virtually always produces faster code than gcc or clang, and shows what the hardware is capable of. I would consider "% of Intel Compiler Performance" a reasonable metric for evaluating the speed of code produced by any open source compiler.

            Comment


            • #16
              I wonder why nobody else seems to be talking about this, but I can't help but notice all the performance regressions in 3.9 compared to 3.8.

              I understand when a new compiler version might produce slightly slower code in a couple of benchmarks here and there (and, if performance has improved in general (net gain), it is justified), but in this case, it seems like the *majority* of the benchmarks have regressed. There are very few benchmarks in this article in which 3.9 is faster, and even then, just slightly. On most of them, it is worse than 3.8, sometimes by a lot. This is *not* acceptable IMO. I wonder what is causing this.

              Furthermore, according to the compile times benchmark, compilation times are much slower with 3.9, too (more in line with GCC), and that used to be one of clang's biggest selling points. These benchmarks, along with the fact that GCC has significantly improved error/warning messages and diagnostics, make virtually all of the advantages that clang used to have, obsolete. Clang seems fairly useless now.

              Comment


              • #17
                Originally posted by tajjada View Post
                I wonder why nobody else seems to be talking about this, but I can't help but notice all the performance regressions in 3.9 compared to 3.8.

                I understand when a new compiler version might produce slightly slower code in a couple of benchmarks here and there (and, if performance has improved in general (net gain), it is justified), but in this case, it seems like the *majority* of the benchmarks have regressed. There are very few benchmarks in this article in which 3.9 is faster, and even then, just slightly. On most of them, it is worse than 3.8, sometimes by a lot. This is *not* acceptable IMO. I wonder what is causing this.

                Furthermore, according to the compile times benchmark, compilation times are much slower with 3.9, too (more in line with GCC), and that used to be one of clang's biggest selling points. These benchmarks, along with the fact that GCC has significantly improved error/warning messages and diagnostics, make virtually all of the advantages that clang used to have, obsolete. Clang seems fairly useless now.
                Wow. A litany of unsubstantiated proclamations must make you feel self-important.

                Comment


                • #18
                  Originally posted by mlau View Post

                  I did, and there's no difference in generated code at all, with both gcc and clang.
                  I checked it with both gcc-6.2 and clang-4: what 'discordian' reported is true. gcc optimizes away the second call to sin(), while clang keeps it:
                  Code:
                  $ cat test.c
                  #include <math.h>
                  
                  double test(double x, double y)
                  {
                  
                          double a = sin(x); /* this could set errno */
                          double b = log(y); /* this could set errno */
                          double c = sin(x) + a; /* this could set errno */
                  
                          return a*b*c;
                  }
                  Code:
                  $ gcc-6  -O2 -g -Wall -pedantic  -c test.c  && objdump -Sr test.o
                  
                  test.o:     file format elf64-x86-64
                  
                  
                  Disassembly of section .text:
                  
                  0000000000000000 <test>:
                  #include <math.h>
                  
                  double test(double x, double y)
                  {
                     0:   48 83 ec 18             sub    $0x18,%rsp
                     4:   f2 0f 11 4c 24 08       movsd  %xmm1,0x8(%rsp)
                  
                          double a = sin(x); /* this could set errno */
                     a:   e8 00 00 00 00          callq  f <test+0xf>
                                          b: R_X86_64_PC32        sin-0x4
                          double b = log(y); /* this could set errno */
                     f:   f2 0f 10 4c 24 08       movsd  0x8(%rsp),%xmm1
                          double a = sin(x); /* this could set errno */
                    15:   f2 0f 11 04 24          movsd  %xmm0,(%rsp)
                          double b = log(y); /* this could set errno */
                    1a:   66 0f 28 c1             movapd %xmm1,%xmm0
                    1e:   e8 00 00 00 00          callq  23 <test+0x23>
                                          1f: R_X86_64_PC32       log-0x4
                          double c = sin(x) + a; /* this could set errno */
                    23:   f2 0f 10 14 24          movsd  (%rsp),%xmm2
                  
                          return a*b*c;
                  }
                    28:   48 83 c4 18             add    $0x18,%rsp
                          double c = sin(x) + a; /* this could set errno */
                    2c:   66 0f 28 ca             movapd %xmm2,%xmm1
                    30:   f2 0f 58 ca             addsd  %xmm2,%xmm1
                          return a*b*c;
                    34:   f2 0f 59 d0             mulsd  %xmm0,%xmm2
                    38:   f2 0f 59 ca             mulsd  %xmm2,%xmm1
                    3c:   66 0f 28 c1             movapd %xmm1,%xmm0
                  }
                    40:   c3                      retq
                  Code:
                  $ clang-4.0  -O2 -g -Wall -pedantic  -c test.c  && objdump -Sr test.o
                  
                  test.o:     file format elf64-x86-64
                  
                  
                  Disassembly of section .text:
                  
                  0000000000000000 <test>:
                  #include <math.h>
                  
                  double test(double x, double y)
                  {
                     0:   48 83 ec 18             sub    $0x18,%rsp
                     4:   f2 0f 11 0c 24          movsd  %xmm1,(%rsp)
                  
                          double a = sin(x); /* this could set errno */
                     9:   f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)
                     f:   e8 00 00 00 00          callq  14 <test+0x14>
                                          10: R_X86_64_PC32       sin-0x4
                    14:   f2 0f 11 44 24 10       movsd  %xmm0,0x10(%rsp)
                          double b = log(y); /* this could set errno */
                    1a:   f2 0f 10 04 24          movsd  (%rsp),%xmm0
                    1f:   e8 00 00 00 00          callq  24 <test+0x24>
                                          20: R_X86_64_PC32       log-0x4
                    24:   f2 0f 11 04 24          movsd  %xmm0,(%rsp)
                          double c = sin(x) + a; /* this could set errno */
                    29:   f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
                    2f:   e8 00 00 00 00          callq  34 <test+0x34>
                                          30: R_X86_64_PC32       sin-0x4
                    34:   f2 0f 10 4c 24 10       movsd  0x10(%rsp),%xmm1
                    3a:   f2 0f 58 c1             addsd  %xmm1,%xmm0
                    3e:   f2 0f 10 14 24          movsd  (%rsp),%xmm2
                  
                          return a*b*c;
                    43:   f2 0f 59 d1             mulsd  %xmm1,%xmm2
                    47:   f2 0f 59 d0             mulsd  %xmm0,%xmm2
                    4b:   66 0f 28 c2             movapd %xmm2,%xmm0
                    4f:   48 83 c4 18             add    $0x18,%rsp
                    53:   c3                      retq

                  Comment


                  • #19
                    Originally posted by tajjada View Post
                    I wonder why nobody else seems to be talking about this, but I can't help but notice all the performance regressions in 3.9 compared to 3.8.

                    I understand when a new compiler version might produce slightly slower code in a couple of benchmarks here and there (and, if performance has improved in general (net gain), it is justified), but in this case, it seems like the *majority* of the benchmarks have regressed. There are very few benchmarks in this article in which 3.9 is faster, and even then, just slightly. On most of them, it is worse than 3.8, sometimes by a lot. This is *not* acceptable IMO. I wonder what is causing this.

                    Furthermore, according to the compile times benchmark, compilation times are much slower with 3.9, too (more in line with GCC), and that used to be one of clang's biggest selling points. These benchmarks, along with the fact that GCC has significantly improved error/warning messages and diagnostics, make virtually all of the advantages that clang used to have, obsolete. Clang seems fairly useless now.
                    I think everybody who read it noticed, but the "regressions" were generally minor, and it is very hard to say how representative this selection of benchmarks is. I know LLVM people read the site, so if there is something worth looking into and fixing, I am sure they will.

                    Comment


                    • #20
                      Originally posted by discordian View Post
                      It would be interesting to run these tests on clang with the -fno-math-errno flag.
                      clang prohibits several optimizations by default (which could result in visible differences), whereas gcc doesn't.

                      Code:
                      double a = sin(x); /* this could set errno */
                      double b = log(y); /* this could set errno */
                      double c = sin(x) + a; /* this could set errno */
                      gcc will only call sin(x) once and double the value; clang will not change the order, since the functions could modify global state (errno) and reordering them could break programs that depend on this behavior.

                      Whetstone for example is notoriously affected by such flags.
                      I don't see the problem with reordering in the above scenario as long as there is no code that reads errno between those calls and as long as the last executed call is sin(x).

                      Comment
