GCC 6.1 vs. LLVM Clang 3.9 Compiler Performance


  • GCC 6.1 vs. LLVM Clang 3.9 Compiler Performance

    Phoronix: GCC 6.1 vs. LLVM Clang 3.9 Compiler Performance

    After carrying out the recent GCC 4.9 vs. 5.3 vs. 6.1 compiler benchmarks for looking at the GNU Compiler Collection performance over the past three years on the same Linux x86_64 system, I then loaded up a development snapshot of the LLVM 3.9 SVN compiler to see how these two dominant compilers are competing on the performance front for C/C++ programs.


  • #2
I will keep using Clang as my compiler.

Waiting for zapcc to become open source.



    • #3
Surprising that there are still benchmarks where one compiler can more than double the performance of the other. I wonder how much of this comes down to compiling slightly different variants of the code, e.g. enabling SSE or other specialised instruction sets.



      • #4
        Originally posted by chrisb View Post
Surprising that there are still benchmarks where one compiler can more than double the performance of the other. I wonder how much of this comes down to compiling slightly different variants of the code, e.g. enabling SSE or other specialised instruction sets.

        I was recently working on an optimised math (matrix+vector) library for OpenGL.

        There is a lot of code that GCC simply failed to optimise, which IMO should have been fairly trivial. I am talking about loops that iterate over fixed-length arrays and do simple arithmetic on each element.

        I looked at the disassembly for each function, and wrote tests and benchmarks to verify the correctness and performance/speed of each function in my library.

Even with -Ofast, GCC unrolled the loops but failed to vectorise most of the code, even though vectorisation options were clearly enabled. It vectorised a few things here and there, but missed most opportunities that IMO should have been trivial to detect, performing the operations on each element one by one. My manually-optimised SSE/AVX versions were often 3-5 times faster, and I am not even finished optimising this stuff (I have ideas for more).



        • #5
          Originally posted by tajjada View Post


          I was recently working on an optimised math (matrix+vector) library for OpenGL.

          There is a lot of code that GCC simply failed to optimise, which IMO should have been fairly trivial. I am talking about loops that iterate over fixed-length arrays and do simple arithmetic on each element.

          I looked at the disassembly for each function, and wrote tests and benchmarks to verify the correctness and performance/speed of each function in my library.

          Even with -Ofast, GCC unrolled the loops but failed to vectorise most of the code, even though vectorisation options were clearly enabled. It vectorised a few things here and there, but missed most opportunities that IMO should have been trivial to detect, performing the operations on each element one by one. My manually-optimised SSE/AVX versions were often 3-5 times faster, and I am not even finished optimising this stuff (I have ideas for more).
          Did you report the issue to GCC upstream? I'm sure the information you could provide would be helpful to them.



          • #6
            Originally posted by tajjada View Post


            I was recently working on an optimised math (matrix+vector) library for OpenGL.

            There is a lot of code that GCC simply failed to optimise, which IMO should have been fairly trivial. I am talking about loops that iterate over fixed-length arrays and do simple arithmetic on each element.

            I looked at the disassembly for each function, and wrote tests and benchmarks to verify the correctness and performance/speed of each function in my library.

            Even with -Ofast, GCC unrolled the loops but failed to vectorise most of the code, even though vectorisation options were clearly enabled. It vectorised a few things here and there, but missed most opportunities that IMO should have been trivial to detect, performing the operations on each element one by one. My manually-optimised SSE/AVX versions were often 3-5 times faster, and I am not even finished optimising this stuff (I have ideas for more).
            You might consider turning those into stand-alone test cases and filing a GCC bug (enhancement request).



            • #7
              I ran across an interesting project that shows the assembly output of C++ code you write in-browser for different compiler versions, flags, etc.
              I was really surprised that GCC would not optimize virtual methods that could otherwise have been inlined. Clang had no problem.
              After finding that out I had to redesign the classes using templates... something easier for GCC to understand.
              I was really surprised how well Clang did against what I thought was supposed to be a well-optimizing compiler, GCC.



              • #8
                To be blunt, Clang is better.
                Compiles faster and generally results in faster code.
                Produces more warnings about bad code.
                And both are free to use.



                • #9
                  Originally posted by bpetty View Post
                  I ran across an interesting project that shows the assembly output of C++ code you write in-browser for different compiler versions, flags, etc.
                  I was really surprised that GCC would not optimize virtual methods that could otherwise have been inlined. Clang had no problem.
                  After finding that out I had to redesign the classes using templates... something easier for GCC to understand.
                  I was really surprised how well Clang did against what I thought was supposed to be a well-optimizing compiler, GCC.
                  Filing enhancement requests for GCC would help. GCC has a quite involved devirtualization infrastructure https://hubicka.blogspot.cz/2014/09/...enforcing.html
                  and I am not aware of test cases where Clang would devirtualize and GCC would not, so I would be curious to see them.



                  • #10
                    Originally posted by chrisb View Post
                    Surprising that there are still benchmarks where one compiler can more than double the performance of the other. I wonder how much of this comes down to compiling slightly different variants of the code, e.g. enabling SSE or other specialised instruction sets.
                    It is quite common to see such swings in micro-benchmarks with a small inner loop and data set. For example, x86 chips are very sensitive to code layout because of their decoder throughput. Scimark in particular is a very small benchmark and thus fits entirely in cache on modern CPUs (it is not a very serious benchmark for C compilers and was developed to track Java JIT implementations http://math.nist.gov/scimark2/).

                    Neither GCC nor LLVM closely models the decoder pipeline and the other architectural details that play a role here. Consequently the final performance more or less depends on luck. I have tried to reproduce and analyze the scimark results reported here a few times in the past, and it really depends on the particular setup. Often I get completely opposite scores.

                    One reproducible issue found so far is tracked at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

