CompilerDeathMatch: surprising results


  • #16
    @staalmannen
    Since you'll be running more tests, could you add -Os (optimise for size) to the gcc optimisation options tested? Also for icc, clang and pcc if they have a similar option.

    It is well known that -O3 leads to better performance than -O2 only in very specific cases. The reason is partly because -O3 binaries are larger and that makes me suspect that -Os should perform better than -O2 in some cases.


    • #17
      @staalmannen

      Nice results. Thanks for sharing them. I usually use ICC to compile mplayer, which gives around 10% more speed than GCC. If you don't mind, could you try these flags on ICC:

      -xSSSE3 -fast -fp-model fast=1 -unroll-aggressive

      -xSSSE3: sets your processor type to core 2
      -fast: enables the major speed optimization options: -ip -O3 -static
      -unroll-aggressive: unroll loops
      -fp-model fast=1: enables floating-point optimizations (-fp-model fast=2 enables more floating-point optimizations but gives less accurate results)


      • #18
        Originally posted by Mo6eB View Post
        It is well known that -O3 leads to better performance than -O2 only in very specific cases. The reason is partly because -O3 binaries are larger and that makes me suspect that -Os should perform better than -O2 in some cases.
        It's true, and some people have already measured this. Because -Os produces small executables, your CPU wastes less time moving data in and out of the cache, and in some cases this performs better than the -O2 and -O3 optimizations. This is even more important on CPUs with small caches. Some kernel devs recommend the -Os flag for compiling the kernel.
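
Whether -Os actually beats -O2 or -O3 depends on the program and the CPU, so it is worth measuring on your own code. Below is a minimal sketch, assuming you have already built the same benchmark once per optimisation level. The helper names `median_runtime` and `fastest` are made up for this example and are not part of any test suite.

```python
import statistics
import subprocess
import time


def median_runtime(cmd, runs=5):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the benchmark binary exits with an error.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fastest(results):
    """Given {label: seconds}, return the label with the lowest time."""
    return min(results, key=results.get)
```

With hypothetical binaries named `./bench-O2`, `./bench-O3` and `./bench-Os`, you would call `median_runtime(["./bench-Os"])` for each one and pass the collected dictionary to `fastest`. The median is used rather than the mean so that a single slow run (for example due to thermal throttling) does not skew the comparison.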


        • #19
          Originally posted by Jimbo View Post
          @staalmannen

          Nice results. Thanks for sharing them. I usually use ICC to compile mplayer, which gives around 10% more speed than GCC. If you don't mind, could you try these flags on ICC:

          -xSSSE3 -fast -fp-model fast=1 -unroll-aggressive

          -xSSSE3: sets your processor type to core 2
          -fast: enables the major speed optimization options: -ip -O3 -static
          -unroll-aggressive: unroll loops
          -fp-model fast=1: enables floating-point optimizations (-fp-model fast=2 enables more floating-point optimizations but gives less accurate results)
          Sure, I will try that after I have tried -O2 and -O3 for Clang and Open64, along with the -Os tests for the 4 compilers supporting it (ICC, GCC, Clang, Open64).

          If anyone knows what flags are recommended for tcc and pcc I am all ears.

          In addition, if anyone knows how to "unclutter" a big result file on Phoronix Global, that would be appreciated.

          I still want all data in one graph since that actually gives additional value (comparisons between compilers X different optimization levels).

          One pattern that seems to be emerging, for example, is that compile time is not inversely related to the run time of the final binaries, i.e. a slower compile does not necessarily give a faster binary (which is often assumed in interpretations of compiler comparisons).
          Unfortunately binary size is not part of the current compiler benchmark suite. It would have been nice if the suite stored binary sizes for each compilation...
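
Until the suite records sizes itself, they can be collected by hand after each build. This is a minimal sketch, not something the Phoronix Test Suite provides. The function names are invented for this example, and it assumes you know where each build put its binaries.

```python
import os


def binary_sizes(paths):
    """Return {path: size in bytes} for each existing file in paths."""
    return {p: os.path.getsize(p) for p in paths if os.path.isfile(p)}


def size_report(sizes):
    """Format sizes as lines sorted from smallest to largest binary."""
    rows = sorted(sizes.items(), key=lambda item: item[1])
    return "\n".join(f"{size:>12d}  {path}" for path, size in rows)
```

Note that the on-disk file size includes symbol and debug information unless the binary has been stripped, so compare like with like. The `size` tool from binutils reports the text and data sections separately if you only care about the code itself.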


          • #20
            a good choice of -march might be 'native'. also see http://en.gentoo-wiki.com/wiki/Safe_Cflags

            in my own tests with a fortran simulation code O3 beats Os (though this is probably not generally true)
            http://www.hep.man.ac.uk/u/sam/zgoubi-optimise/oberon/

            with GCC you might want to look at lto. O3 + lto can make smaller binaries than Os
            http://gcc.gnu.org/wiki/summit2010?a...et=hubicka.pdf

            Also i remember reading an article about how big caches and clever precaching on modern CPUs meant that O3 was better than Os now. i think it was a report by intel. but i can't find it.


            • #21
              Originally posted by ssam View Post
              a good choice of -march might be 'native'. also see http://en.gentoo-wiki.com/wiki/Safe_Cflags

              in my own tests with a fortran simulation code O3 beats Os (though this is probably not generally true)
              http://www.hep.man.ac.uk/u/sam/zgoubi-optimise/oberon/

              with GCC you might want to look at lto. O3 + lto can make smaller binaries than Os
              http://gcc.gnu.org/wiki/summit2010?a...et=hubicka.pdf

              Also i remember reading an article about how big caches and clever precaching on modern CPUs meant that O3 was better than Os now. i think it was a report by intel. but i can't find it.
              "native" should be the same as "core 2" for my hardware. I might include LTO last for GCC along with the ICC-specific settings suggested above for ICC. First I am going to run through the rest of the compilers though... it takes a long time.


              • #22
                Clang just got back in the game - and then some!

                http://global.phoronix-test-suite.co...76-14957-29367

                I have not merged this result yet since the computer crashed during the ImageMagick compilation (probably overheating), which meant that I was only able to run 1 test last night.

                Side-by-side comparisons do, however, indicate that Clang performed as well as or better than ICC or GCC when -O2 is used!

                Pretty cool...


                • #23
                  Updated merge: Clang and Open64

                  Here is an update with -O2 and -O3 for Clang and Open64.
                  We had our annual lab cleaning today so I could let my computer chew away on some numbers while I was busy with other stuff

                  http://global.phoronix-test-suite.co...555-17359-8826

                  TODO:
                  I think we are approaching saturation of this dataset, which will lead to a final result being announced (and I can finally start updating my Arch install)

                  * Upon request:
                  - Include Os flags - works for ICC, GCC, Clang and Open64
                  - Include special settings: specific requests for ICC and GCC (LTO) currently.
                  - Any other requests before I conclude the 64-bit tests?

                  * Stuff I am interested in:
                  - Check whether PCC can be tweaked for performance. I had an e-mail conversation with the current maintainer (Anders Magnusson) about possible flags.
                  - anyone an expert on TCC?

                  @Michael: Feel free to use these results for a Phoronix article if you want.
                  I think we are close to reaching saturation with this test set, which means that the only expansions that can be made are 1) different hardware, 2) more compilers.

                  When this series is concluded, I will start playing with 32-bit, where a number of other compilers are available in addition to those tested here (LCC, ACK, KenCC, SolarisStudio, OpenWatcom...)


                  • #24
                    isn't
                    Code:
                    -march=core2 -mtune=generic
                    a bit odd? -march limits what instructions can be used, and -mtune tweaks smaller things like instruction order and optimisation for cache sizes. so:
                    Code:
                    -march=i686 -mtune=core2
                    will optimise for core2, but without doing anything that stops the program working on an older chip. eg fedora 32bit uses:
                    Code:
                    -march=i686 -mtune=atom
                    if you use -march=core2, then it won't run on anything older, so i doubt that the -mtune=generic would do anything useful. http://gcc.gnu.org/onlinedocs/gcc/i3...4-Options.html
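
If you want to see what a given combination actually resolves to, GCC can print its effective target settings with `-Q --help=target`. Below is a rough sketch that pulls the `-march=` and `-mtune=` lines out of that output. The helper names are invented here, and the exact output layout varies between GCC versions, so treat the parser as an assumption rather than a guaranteed format.

```python
import subprocess


def parse_target_help(text):
    """Extract the -march= and -mtune= values from `gcc -Q --help=target` output."""
    found = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape (assumed): "  -march=      core2"
        if len(parts) == 2 and parts[0] in ("-march=", "-mtune="):
            found[parts[0].rstrip("=")] = parts[1]
    return found


def effective_target(flags, gcc="gcc"):
    """Ask an installed GCC which -march/-mtune it would use for `flags`."""
    out = subprocess.run(
        [gcc, *flags, "-Q", "--help=target"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_target_help(out)
```

For example, `effective_target(["-march=core2", "-mtune=generic"])` returns whatever the installed GCC reports for that pair, which makes it easy to compare against `["-march=native"]` on the same machine.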

                    also, doesn't ICC have the equivalent of -ffast-math on by default?


                    • #25
                      Interesting and somewhat weird results here and there. I would say that as much as possible it's important to use the same flags across compilers, else the results will quickly become meaningless UNLESS you hand-tune the best settings for each compiler, which is pretty difficult. Yes, sometimes -O2 generates faster code than -O3, but -O3 is supposed to generate the fastest code, so having all compilers use that (or whatever goes as -O3 for them) would make most sense imo. Also it's a good thing to explicitly specify other things like -ffast-math since, as ssam mentioned, some compilers default to that, which can make a big difference in many benchmarks.

                      Also, I think it would be best to either stick to -march=native or specify the exact system used -march=<system>, do not bother with -mtune.
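
One way to keep the flags identical across compilers is to generate the whole test matrix from a single list instead of typing each combination by hand. This is a small sketch under stated assumptions: the driver names (`opencc` for Open64 in particular) and the shared `-march=native` flag are guesses about the local setup, and some compilers may need their own spelling for the architecture flag (the thread mentions -xSSSE3 for ICC).

```python
import itertools

# Assumed compiler driver names; adjust to whatever is installed locally.
COMPILERS = ["gcc", "clang", "icc", "opencc"]
OPT_LEVELS = ["-O2", "-O3", "-Os"]


def build_matrix(compilers=COMPILERS, levels=OPT_LEVELS, common=("-march=native",)):
    """Return one (label, CC, CFLAGS) tuple per compiler/level combination."""
    matrix = []
    for cc, level in itertools.product(compilers, levels):
        # Every compiler gets exactly the same flags apart from the level.
        cflags = " ".join((level, *common))
        matrix.append((f"{cc} {level}", cc, cflags))
    return matrix
```

Each tuple can then be exported as `CC` and `CFLAGS` before a run, with the label used as the result name so that all runs can be merged into one comparison later.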

                      Interesting to see clang compiling p7zip; last time I checked it failed. Time to build a new version from svn.


                      • #26
                        Originally posted by staalmannen View Post
                        When this series is concluded, I will start playing with 32-bit, where a number of other compilers are available in addition to those tested here (LCC, ACK, KenCC, SolarisStudio, OpenWatcom...)
                        I'd be kind of interested to see how DMC stacks up these days. (Does it build on Linux? I know DMD does, but....)


                        • #27
                          Originally posted by Wyatt View Post
                          I'd be kind of interested to see how DMC stacks up these days. (Does it build on Linux? I know DMD does, but....)
                          I checked the webpage for a linux version of dmc but did not find one. Perhaps it can compile with winelibs, but I did not find anyone trying.

                          @ the rest suggesting new flags: Thanks. I will take those points into account during the -Os rounds and later for 32-bit tests. If they make a big impact I might repeat some other tests with the new flags. I think, however, that really compiler-specific tweaks are a bit out of the scope of a very broad investigation like this and would probably be more interesting when specifically comparing two compilers, like ICC vs GCC for example.


                          • #28
                            I know it's a bit late now, but my suggestion would be to ditch C-Ray as it's just a straight line across all compilers anyway. POV-Ray would have given more interesting results probably.


                            • #29
                              Originally posted by devius View Post
                              I know it's a bit late now, but my suggestion would be to ditch C-Ray as it's just a straight line across all compilers anyway. POV-Ray would have given more interesting results probably.
                              You are probably correct. I am just running the "compiler" suite at the moment for these tests. It has basically grown organically from my first announcement and swollen larger than I ever could have imagined.

                              All suggestions are welcome

                              I am currently looking forward to concluding this round of tests though so that I can start with 32-bit.

                              Under my previous TODO is the stuff that I have planned to check before concluding. If anyone has more suggestions, you had better come up with them before I have run everything on the current TODO and announce the test to be concluded/saturated (because after that I am updating my OS and subsequent analyses will not be comparable).


                              • #30
                                WOW! Clang is already side by side with gcc! Awesome post!
