GCC 6.1 vs. LLVM Clang 3.9 Compiler Performance


  • GCC 6.1 vs. LLVM Clang 3.9 Compiler Performance

    Phoronix: GCC 6.1 vs. LLVM Clang 3.9 Compiler Performance

    After carrying out the recent GCC 4.9 vs. 5.3 vs. 6.1 compiler benchmarks for looking at the GNU Compiler Collection performance over the past three years on the same Linux x86_64 system, I then loaded up a development snapshot of the LLVM 3.9 SVN compiler to see how these two dominant compilers are competing on the performance front for C/C++ programs.


  • #2
I will keep using Clang as my compiler.

Waiting for zapcc to become open source.



    • #3
Surprising that there are still benchmarks where one compiler can more than double the performance of the other. I wonder how much of this comes down to compiling slightly different variants of the code, e.g. enabling SSE or other specialised instruction sets.



      • #4
        Originally posted by chrisb View Post
Surprising that there are still benchmarks where one compiler can more than double the performance of the other. I wonder how much of this comes down to compiling slightly different variants of the code, e.g. enabling SSE or other specialised instruction sets.

        I was recently working on an optimised math (matrix+vector) library for OpenGL.

        There is a lot of code that GCC simply failed to optimise, which IMO should have been fairly trivial. I am talking about loops that iterate over fixed-length arrays and do simple arithmetic on each element.

        I looked at the disassembly for each function, and wrote tests and benchmarks to verify the correctness and performance/speed of each function in my library.

Even with -Ofast, GCC unrolled the loops but failed to vectorise most of the code, even though vectorisation options were clearly enabled. It vectorised a few things here and there, but missed most opportunities that IMO should have been trivial to detect, performing the operations on each element one by one. My manually-optimised SSE/AVX versions were often 3-5 times faster, and I am not even finished optimising this stuff (I have ideas for more).



        • #5
          Originally posted by tajjada View Post


          I was recently working on an optimised math (matrix+vector) library for OpenGL.

          There is a lot of code that GCC simply failed to optimise, which IMO should have been fairly trivial. I am talking about loops that iterate over fixed-length arrays and do simple arithmetic on each element.

          I looked at the disassembly for each function, and wrote tests and benchmarks to verify the correctness and performance/speed of each function in my library.

          Even with -Ofast, GCC unrolled the loops but failed to vectorise most of the code, even though vectorisation options were clearly enabled. It vectorised a few things here and there, but missed most opportunities that IMO should have been trivial to detect, performing the operations on each element one by one. My manually-optimised SSE/AVX versions were often 3-5 times faster, and I am not even finished optimising this stuff (I have ideas for more).
          Did you report the issue to GCC upstream? I'm sure the information you could provide would be helpful to them.



          • #6
            Originally posted by tajjada View Post


            I was recently working on an optimised math (matrix+vector) library for OpenGL.

            There is a lot of code that GCC simply failed to optimise, which IMO should have been fairly trivial. I am talking about loops that iterate over fixed-length arrays and do simple arithmetic on each element.

            I looked at the disassembly for each function, and wrote tests and benchmarks to verify the correctness and performance/speed of each function in my library.

            Even with -Ofast, GCC unrolled the loops but failed to vectorise most of the code, even though vectorisation options were clearly enabled. It vectorised a few things here and there, but missed most opportunities that IMO should have been trivial to detect, performing the operations on each element one by one. My manually-optimised SSE/AVX versions were often 3-5 times faster, and I am not even finished optimising this stuff (I have ideas for more).
            You might consider turning those into stand-alone test cases and filing a GCC bug (enhancement request).



            • #7
              I ran across an interesting project that shows the assembly output of C++ code you write in-browser for different compiler versions, flags, etc.
              I was really surprised that GCC would not optimize virtual methods that could otherwise have been inlined. Clang had no problem.
              After finding that out I had to redesign the classes using templates... something easier for GCC to understand.
              I was really surprised how well Clang did against what I thought was supposed to be a well-optimizing compiler, GCC.



              • #8
                To be blunt, Clang is better.
                Compiles faster and generally results in faster code.
                Produces more warnings about bad code.
                And both are free to use.



                • #9
                  Originally posted by bpetty View Post
                  I ran across an interesting project that shows the assembly output of C++ code you write in-browser for different compiler versions, flags, etc.
                  I was really surprised that GCC would not optimize virtual methods that could otherwise have been inlined. Clang had no problem.
                  After finding that out I had to redesign the classes using templates... something easier for GCC to understand.
                  I was really surprised how well Clang did against what I thought was supposed to be a well-optimizing compiler, GCC.
                  Filing enhancement requests for GCC would help. GCC has a quite involved devirtualization infrastructure https://hubicka.blogspot.cz/2014/09/...enforcing.html
                  and I am not aware of test cases where Clang would devirtualize and GCC would not, so I would be curious to see them.



                  • #10
                    Originally posted by chrisb View Post
                    Surprising that there are still benchmarks where one compiler can more than double the performance of the other. I wonder how much of this comes down to compiling slightly different variants of the code, e.g. enabling SSE or other specialised instruction sets.
                    It is quite common to see such swings in micro-benchmarks with a small inner loop and data set. For example, x86 chips are very sensitive to code layout because of their decoder throughput. Scimark in particular is a very small benchmark and thus fits entirely in cache on modern CPUs (it is not a very serious benchmark for C compilers and was developed to track Java JIT implementations http://math.nist.gov/scimark2/).

                    Neither GCC nor LLVM closely models the decoder pipeline and the other architectural details that play a role here. Consequently the final performance more or less depends on luck. I have tried to reproduce and analyze the scimark results reported here a few times in the past, and it really depends on the particular setup. Often I get completely opposite scores.

                    One reproducible issue found so far is tracked at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

