GCC 4.8.0 vs. LLVM Clang 3.3 Compiler Performance

  • GCC 4.8.0 vs. LLVM Clang 3.3 Compiler Performance

    Phoronix: GCC 4.8.0 vs. LLVM Clang 3.3 Compiler Performance

    In preparation for the upcoming release of LLVM 3.3, here is an extensive round of C/C++ benchmarks from GCC 4.8.0, LLVM Clang 3.2, and LLVM Clang 3.3-rc1 to look at Linux compiler performance. The benchmarks were run on three different systems with Intel Core i7 3960X, AMD FX-8350, and Intel Core i3 3217U processors for a diverse look at the performance.

    http://www.phoronix.com/vr.php?view=18737

  • #2
    Interesting, I remember the last time clang was tested using an AMD CPU the performance was abysmal compared to GCC.

    edit:
    and for being ~$800 cheaper, the FX-8350 performs pretty well in general.
    Last edited by peppercats; 05-25-2013, 01:42 AM.


    • #3
      Originally posted by peppercats
      the FX-8350 performs pretty well in general.
      I'm very surprised about FX performance in some tests.

      Clang: seems like the next release will finally beat GCC and be a good replacement for it (except for OpenMP).


      • #4
        Originally posted by leonmaxx
        I'm very surprised about FX performance in some tests.

        Clang: seems like the next release will finally beat GCC and be a good replacement for it (except for OpenMP).
        I'm not surprised at all. Anything that is heavily parallel strongly favours the Bulldozer/Piledriver architecture, such that in some benchmarks on other sites it trounced the i7 3990X.


        • #5
          Originally posted by Luke_Wolf
          I'm not surprised at all. Anything that is heavily parallel strongly favours the Bulldozer/Piledriver architecture, such that in some benchmarks on other sites it trounced the i7 3990X.
          Yeah, the issue is just finding the right benchmark to fully exploit the hardware. Your typical desktop workload is going to be a lot slower since it's mostly single threaded.


          • #6
            Originally posted by smitty3268
            Yeah, the issue is just finding the right benchmark to fully exploit the hardware. Your typical desktop workload is going to be a lot slower since it's mostly single threaded.
            Well, here's the thing: that's not really true, for one simple reason. A desktop workload, unlike a benchmark (outside of gaming), doesn't consist of running a single application at a time. As an example from my own setup, I've got Firefox open with a number of tabs, my IDE, a Konsole, my chat clients and Dolphin, as well as stuff like Dropbox running in the background. Any one of these (ignoring the coding and thus compiling) is not particularly heavily threaded, but since I'm running all these things I'm taking advantage of more cores. It's certainly not enough to push it up to beating a 3990X, but that's beside the point.


            • #7
              Originally posted by Luke_Wolf
              Well, here's the thing: that's not really true, for one simple reason. A desktop workload, unlike a benchmark (outside of gaming), doesn't consist of running a single application at a time. As an example from my own setup, I've got Firefox open with a number of tabs, my IDE, a Konsole, my chat clients and Dolphin, as well as stuff like Dropbox running in the background. Any one of these (ignoring the coding and thus compiling) is not particularly heavily threaded, but since I'm running all these things I'm taking advantage of more cores. It's certainly not enough to push it up to beating a 3990X, but that's beside the point.
              All those apps are sitting there idle not doing a damn thing most of the time. You don't need much hardware for apps that block and wait for user input for 99% of their life. A single-core Atom can do everything you just listed without breaking a sweat.


              • #8
                Glad to see he is now displaying full optimization flags, also glad that he is using -O3.

                Still don't understand why he persists in doing OpenMP benchmark comparisons, as they are totally worthless until Clang/LLVM supports OpenMP.

                He finally acknowledges that testing stuff like ffmpeg yields no real difference, as pretty much all performance-critical code is in assembly, so why not just _disable_ assembly optimizations? It would definitely be interesting to see how these compilers compare when optimizing video-compression-oriented code: just compile and benchmark x264 with './configure --disable-asm'.

                Also had to laugh at the bias when describing the results. In the C-Ray test GCC 4.8 was (by my quick estimate) ~10% faster on the Intel Core i7 3960X, ~40% faster on the AMD FX-8350 and ~10% faster on the Intel Core i3, and this result is described by Michael as a 'slight performance edge'.

                10%-40% is not a slight performance edge in compiler optimization.


                • #9
                  Originally posted by elanthis
                  All those apps are sitting there idle not doing a damn thing most of the time. You don't need much hardware for apps that block and wait for user input for 99% of their life. A single-core Atom can do everything you just listed without breaking a sweat.
                  Stop internet pollution.


                  • #10
                    Originally posted by Luke_Wolf
                    Well, here's the thing: that's not really true, for one simple reason. A desktop workload, unlike a benchmark (outside of gaming), doesn't consist of running a single application at a time. As an example from my own setup, I've got Firefox open with a number of tabs, my IDE, a Konsole, my chat clients and Dolphin, as well as stuff like Dropbox running in the background. Any one of these (ignoring the coding and thus compiling) is not particularly heavily threaded, but since I'm running all these things I'm taking advantage of more cores. It's certainly not enough to push it up to beating a 3990X, but that's beside the point.
                    Except on a desktop workload, all those background apps are usually idle (at least enough that all of them put together can run on a single core without problems), and you're just waiting on whatever you have active at the moment. Which is usually single threaded.

                    A lot of times, even stuff that is heavily threaded, like a game, will still be limited by the speed of a single thread, because the work they do on the background threads is relatively limited compared to the main one.

                    You need to find something that truly breaks up equal amounts of work on 8+ threads to make the AMD chips look good right now.
                    Last edited by smitty3268; 05-25-2013, 03:48 PM.
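
The single-thread limit described in this post is essentially Amdahl's law: if only part of the work can be spread over threads, the serial part caps the overall speedup. A minimal Python sketch, where the function name and the parallel fractions are illustrative assumptions and not measurements from the article:

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup when only parallel_fraction of the work scales with threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical workloads: the fractions are made up for illustration.
    workloads = [
        ("main-thread-bound game", 0.30),
        ("evenly split 8-way job", 0.95),
    ]
    for label, fraction in workloads:
        print(f"{label}: {amdahl_speedup(fraction, 8):.2f}x on 8 threads")
```

With these assumed fractions, the main-thread-bound case gains little from eight threads, while the evenly split case gets most of the way to 8x. That is the gap between a typical desktop workload and the kind of benchmark that makes an 8-thread chip look good.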


                    • #11
                      The OpenMP comparisons do have some merit. For example, GraphicsMagick here: it is less than 2x on the 8-core Bulldozer, so the parallelization there is poor quality.


                      • #12
                        Originally posted by curaga
                        The OpenMP comparisons do have some merit. For example, GraphicsMagick here: it is less than 2x on the 8-core Bulldozer, so the parallelization there is poor quality.
                        Nope, this means that Clang-compiled single-thread performance equals 2-4 GCC threads.


                        • #13
                          That is another possible conclusion, yes.


                          • #14
                            Originally posted by leonmaxx
                            I'm very surprised about FX performance in some tests.

                            Clang: seems like the next release will finally beat GCC and be a good replacement for it (except for OpenMP).
                            Except that it was only faster in two tests (in one of them only on one single CPU out of 3) and slower in everything else… (I don’t count compile times as tests).

                            But you summarized nicely how misleading the text of this article is - again


                            • #15
                              Originally posted by leonmaxx
                              Nope, this means that Clang-compiled single-thread performance equals 2-4 GCC threads.
                              Nope: you cannot distinguish between a weakly parallelizable algorithm and compiler performance. To get good data, you would also have to provide a GCC run without OpenMP; that would show the speedup due to OpenMP.
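
To make the extra GCC run concrete, here is a minimal Python sketch of how the three timings would separate the two effects. All numbers are hypothetical placeholders, not results from the article, and the helper name is invented for illustration:

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate run is than the baseline run."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds; none come from the benchmarks.
    gcc_serial = 12.0    # GCC build without -fopenmp
    gcc_openmp = 5.0     # GCC build with -fopenmp on 8 threads
    clang_serial = 9.0   # Clang build, which runs serially without OpenMP support

    # Effect 1: how well the algorithm scales under OpenMP (same compiler).
    print(f"OpenMP scaling under GCC: {speedup(gcc_serial, gcc_openmp):.2f}x")
    # Effect 2: code generation quality (same thread count, different compiler).
    print(f"Clang vs. GCC, both serial: {speedup(gcc_serial, clang_serial):.2f}x")
    # The mixed figure, which is all a GCC+OpenMP vs. Clang chart can show.
    print(f"GCC+OpenMP vs. Clang serial: {speedup(clang_serial, gcc_openmp):.2f}x")
```

Without the serial GCC timing, only the last ratio is available, and it cannot say whether a small gap comes from weak parallelization or from better serial code out of Clang.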
