**VikingGe** · 14 September 2016, 09:22 AM

I am really surprised that the compile times of Clang are not better. This is the area where Clang had the real lead over GCC.

And still has, at least for unoptimized debug builds. All my projects build faster on clang. I personally don't care much about the compile times of optimized builds: if GCC produces faster code, then I will use GCC, no matter how fast or slow it may be (within reason, of course).

Also, in my experience, clang often seems to generate more reasonable SIMD code compared to gcc (still not on par with hand-written assembly, though), but gcc seems to be quite a bit better at handling integer-only stuff. I'm working on a Mahjong game right now, and the hand evaluator is ~15% faster when built with g++ -O3 compared to clang++ -O3.

**carewolf** · 14 September 2016, 09:59 AM

Originally posted by mlau

me too, actually, although it shows that more optimization passes don't come for free.

And the ones that give the final percentages are the most expensive. Getting 80-90% of the way there is fast, which is why we have -O1 and -O2.

**mlau** · 14 September 2016, 10:08 AM

Originally posted by s_j_newbury

Can you look at the generated code in each case and work out why?

Per loop iteration, gcc emits 1x vfmadd while clang emits 4x vmul+vadd.
gcc's version is only half the size, but clang's is apparently 4x as fast.
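
For context, a sparse matrix-vector product in compressed-row storage looks roughly like the sketch below. This is not the scimark2 source; the name `sparse_matvec` and its parameter layout are assumptions made for illustration. The inner loop is a chain of multiply-accumulates, which is where a compiler can pick either a fused multiply-add or separate multiply and add instructions, and decide how far to unroll.

```c
#include <stddef.h>

/* y = A * x for a sparse matrix A stored in compressed-row form.
 * The nonzeros of row r live at indices row[r] .. row[r + 1] - 1
 * of val[] (values) and col[] (column numbers).
 * Hypothetical helper, not taken from any benchmark suite. */
void sparse_matvec(size_t nrows, double *y, const double *val,
                   const int *row, const int *col, const double *x)
{
    for (size_t r = 0; r < nrows; r++) {
        double sum = 0.0;
        for (int i = row[r]; i < row[r + 1]; i++)
            sum += x[col[i]] * val[i];  /* one multiply-accumulate per nonzero */
        y[r] = sum;
    }
}
```

Comparing the output of `objdump -d` for this one function at the same optimization level is usually enough to see which instruction mix each compiler picked.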

**strtj** · 14 September 2016, 10:41 AM

Originally posted by mlau

Oh, I agree that it's an almost meaningless benchmark, but it at least shows where both compilers excel at code generation.
I ran the scimark2 suite with clang HEAD from a few hours ago, and it absolutely destroys gcc in the sparse matmult benchmark:

Sparse matmult Mflops: 12550.17 (N=1000, nz=5000) (clang git head)
Sparse matmult Mflops: 3118.18 (N=1000, nz=5000) (gcc-6.2)

There's no meaningful difference in the other scores.

I have seen significant performance gains on a number of synthetic benchmarks with clang git (4.0) when -ffast-math is enabled. -flto can also help significantly on clang, whereas on gcc I rarely see it have a significant impact. I believe with -ffast-math clang is willing to do some vectorization that is not considered safe by default.
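
One reason -ffast-math can unlock vectorization: floating-point addition is not associative, so under default semantics the compiler has to keep the additions of a reduction in source order. A minimal sketch of the effect, where the function name `sum_in_order` is hypothetical:

```c
#include <stddef.h>

/* Adds the elements strictly left to right. Without permission to
 * reassociate, a compiler must preserve this order of additions. */
double sum_in_order(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

With IEEE double arithmetic, summing {1e17, 1.0, -1e17} in this order gives 0.0, because 1e17 + 1.0 rounds back to 1e17, while summing {1e17, -1e17, 1.0} gives 1.0. A vectorized reduction adds the elements in a different order, so it is only legal once reassociation is allowed.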

**strtj** · 14 September 2016, 10:49 AM

I would be interested to see icc included in these benchmarks as a sort of "reference" compiler. It virtually always produces faster code than gcc or clang, and shows what the hardware is capable of. I would consider "% of Intel Compiler Performance" a reasonable metric for evaluating the speed of code produced by any open source compiler.

**tajjada** · 14 September 2016, 11:52 AM

I wonder why nobody else seems to be talking about this, but I can't help but notice all the performance regressions in 3.9 compared to 3.8.

I understand when a new compiler version might produce slightly slower code in a couple of benchmarks here and there (and, if performance has improved in general (net gain), it is justified), but in this case, it seems like the *majority* of the benchmarks have regressed. There are very few benchmarks in this article in which 3.9 is faster, and even then, just slightly. On most of them, it is worse than 3.8, sometimes by a lot. This is *not* acceptable IMO. I wonder what is causing this.

Furthermore, according to the compile times benchmark, compilation times are much slower with 3.9, too (more in line with GCC), and that used to be one of clang's biggest selling points. These benchmarks, along with the fact that GCC has significantly improved error/warning messages and diagnostics, make virtually all of the advantages that clang used to have, obsolete. Clang seems fairly useless now.

**Marc Driftmeyer** · 14 September 2016, 12:16 PM

Originally posted by tajjada

I wonder why nobody else seems to be talking about this, but I can't help but notice all the performance regressions in 3.9 compared to 3.8.

I understand when a new compiler version might produce slightly slower code in a couple of benchmarks here and there (and, if performance has improved in general (net gain), it is justified), but in this case, it seems like the *majority* of the benchmarks have regressed. There are very few benchmarks in this article in which 3.9 is faster, and even then, just slightly. On most of them, it is worse than 3.8, sometimes by a lot. This is *not* acceptable IMO. I wonder what is causing this.

Furthermore, according to the compile times benchmark, compilation times are much slower with 3.9, too (more in line with GCC), and that used to be one of clang's biggest selling points. These benchmarks, along with the fact that GCC has significantly improved error/warning messages and diagnostics, make virtually all of the advantages that clang used to have, obsolete. Clang seems fairly useless now.

Wow. A litany of unsubstantiated proclamations must make you feel self-important.

**kreijack** · 14 September 2016, 04:27 PM

Originally posted by mlau

I did, and there's no difference in generated code at all, with both gcc and clang.

I checked it with both gcc-6.2 and clang-4: what 'discordian' reported is true. gcc optimizes away the second call to sin(), clang does not:

Code:

$ cat test.c
#include <math.h>

double test(double x, double y)
{

        double a = sin(x); /* this could set errno */
        double b = log(y); /* this could set errno */
        double c = sin(x) + a; /* this could set errno */

        return a*b*c;
}

Code:

$ gcc-6  -O2 -g -Wall -pedantic  -c test.c  && objdump -Sr test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <test>:
#include <math.h>

double test(double x, double y)
{
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   f2 0f 11 4c 24 08       movsd  %xmm1,0x8(%rsp)

        double a = sin(x); /* this could set errno */
   a:   e8 00 00 00 00          callq  f <test+0xf>
                        b: R_X86_64_PC32        sin-0x4
        double b = log(y); /* this could set errno */
   f:   f2 0f 10 4c 24 08       movsd  0x8(%rsp),%xmm1
        double a = sin(x); /* this could set errno */
  15:   f2 0f 11 04 24          movsd  %xmm0,(%rsp)
        double b = log(y); /* this could set errno */
  1a:   66 0f 28 c1             movapd %xmm1,%xmm0
  1e:   e8 00 00 00 00          callq  23 <test+0x23>
                        1f: R_X86_64_PC32       log-0x4
        double c = sin(x) + a; /* this could set errno */
  23:   f2 0f 10 14 24          movsd  (%rsp),%xmm2

        return a*b*c;
}
  28:   48 83 c4 18             add    $0x18,%rsp
        double c = sin(x) + a; /* this could set errno */
  2c:   66 0f 28 ca             movapd %xmm2,%xmm1
  30:   f2 0f 58 ca             addsd  %xmm2,%xmm1
        return a*b*c;
  34:   f2 0f 59 d0             mulsd  %xmm0,%xmm2
  38:   f2 0f 59 ca             mulsd  %xmm2,%xmm1
  3c:   66 0f 28 c1             movapd %xmm1,%xmm0
}
  40:   c3                      retq

Code:

$ clang-4.0  -O2 -g -Wall -pedantic  -c test.c  && objdump -Sr test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <test>:
#include <math.h>

double test(double x, double y)
{
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   f2 0f 11 0c 24          movsd  %xmm1,(%rsp)

        double a = sin(x); /* this could set errno */
   9:   f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)
   f:   e8 00 00 00 00          callq  14 <test+0x14>
                        10: R_X86_64_PC32       sin-0x4
  14:   f2 0f 11 44 24 10       movsd  %xmm0,0x10(%rsp)
        double b = log(y); /* this could set errno */
  1a:   f2 0f 10 04 24          movsd  (%rsp),%xmm0
  1f:   e8 00 00 00 00          callq  24 <test+0x24>
                        20: R_X86_64_PC32       log-0x4
  24:   f2 0f 11 04 24          movsd  %xmm0,(%rsp)
        double c = sin(x) + a; /* this could set errno */
  29:   f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
  2f:   e8 00 00 00 00          callq  34 <test+0x34>
                        30: R_X86_64_PC32       sin-0x4
  34:   f2 0f 10 4c 24 10       movsd  0x10(%rsp),%xmm1
  3a:   f2 0f 58 c1             addsd  %xmm1,%xmm0
  3e:   f2 0f 10 14 24          movsd  (%rsp),%xmm2

        return a*b*c;
  43:   f2 0f 59 d1             mulsd  %xmm1,%xmm2
  47:   f2 0f 59 d0             mulsd  %xmm0,%xmm2
  4b:   66 0f 28 c2             movapd %xmm2,%xmm0
  4f:   48 83 c4 18             add    $0x18,%rsp
  53:   c3                      retq

**carewolf** · 14 September 2016, 07:29 PM

Originally posted by tajjada

I wonder why nobody else seems to be talking about this, but I can't help but notice all the performance regressions in 3.9 compared to 3.8.

I understand when a new compiler version might produce slightly slower code in a couple of benchmarks here and there (and, if performance has improved in general (net gain), it is justified), but in this case, it seems like the *majority* of the benchmarks have regressed. There are very few benchmarks in this article in which 3.9 is faster, and even then, just slightly. On most of them, it is worse than 3.8, sometimes by a lot. This is *not* acceptable IMO. I wonder what is causing this.

Furthermore, according to the compile times benchmark, compilation times are much slower with 3.9, too (more in line with GCC), and that used to be one of clang's biggest selling points. These benchmarks, along with the fact that GCC has significantly improved error/warning messages and diagnostics, make virtually all of the advantages that clang used to have, obsolete. Clang seems fairly useless now.

I think everybody who read it noticed, but the "regressions" were generally minor, and it is very hard to say how representative this set of selected benchmarks is. I know LLVM people read the site, so if there is something worth looking into and fixing, I am sure they will.

**Ansla** · 16 September 2016, 06:18 AM

Originally posted by discordian

It would be interesting to run these tests on clang with the -fno-math-errno flag.
By default clang refuses to perform several optimizations that could result in visible differences, whereas gcc doesn't.

Code:

double a = sin(x); /* this could set errno */
double b = log(y); /* this could set errno */
double c = sin(x) + a; /* this could set errno */

gcc will only call sin(x) once and double the value. clang will not change the order, since the functions could modify global state (errno) and reordering them could break programs that depend on this behavior.

Whetstone for example is notoriously affected by such flags.

I don't see the problem with reordering in the above scenario as long as there is no code that reads errno between those calls and as long as the last executed call is sin(x).
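
A small sketch of why "which call runs last" matters for errno. The two `fake_*` functions are hypothetical stand-ins, not libm calls: they set errno the way the C standard describes for a domain error (EDOM) and a pole error (ERANGE) when the implementation reports math errors through errno. Using stand-ins keeps the result independent of how a particular libm behaves.

```c
#include <errno.h>

/* Hypothetical stand-ins for sin() and log() called with bad arguments. */
static double fake_sin_bad(void) { errno = EDOM;   return 0.0; }
static double fake_log_bad(void) { errno = ERANGE; return 0.0; }

/* Source order: sin, log, sin. The last call to run is sin. */
int errno_after_source_order(void)
{
    errno = 0;
    fake_sin_bad();
    fake_log_bad();
    fake_sin_bad();
    return errno;
}

/* Second sin call merged into the first. The last call to run is log. */
int errno_after_merged_sin(void)
{
    errno = 0;
    fake_sin_bad();
    fake_log_bad();
    return errno;
}
```

The first function returns EDOM and the second returns ERANGE, so a caller that inspects errno after the whole expression can tell the two orderings apart. Code that never reads errno sees no difference.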

LLVM Clang 3.9 Mostly Trails GCC In Compiler Performance
