The Performance Between GCC Optimization Levels


  • The Performance Between GCC Optimization Levels

    Phoronix: The Performance Between GCC Optimization Levels

    For those who have never benchmarked the differences between GCC's optimization levels, here are some recent test results comparing them when using an AMD FX-8150 processor with GCC 4.7.2.

    http://www.phoronix.com/vr.php?view=17986

  • #2
    When it's a bit more stable, I would like to see some benchmarks with the new -Og option in the 4.8 dev series.
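    For reference, a rough sketch of what such a comparison might look like once 4.8 lands; -Og is taken from the 4.8 development documentation, and the test program below is just a made-up stand-in:

    /*
     * Build the same file at -O0, the new -Og, and -O2, all with debug info:
     *
     *   gcc -O0 -g demo.c -o demo_o0
     *   gcc -Og -g demo.c -o demo_og
     *   gcc -O2 -g demo.c -o demo_o2
     *   time ./demo_o0; time ./demo_og; time ./demo_o2
     *
     * -Og is meant to keep the debugging experience close to -O0 while
     * recovering a good chunk of -O2's speed.
     */
    #include <stdio.h>

    static double work(long n)
    {
        double sum = 0.0;
        for (long i = 1; i <= n; i++)
            sum += 1.0 / (double)i;   /* cheap per-iteration work, not optimized away */
        return sum;
    }

    int main(void)
    {
        printf("%f\n", work(50000000L));
        return 0;
    }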


    • #3
      These are a big improvement on previous benchmarks done by Phoronix. They would be better still if runs that add -march=native on top of each optimization level were also included.

      It would be useful if we could see the effect that cache size has on optimization levels. Small caches are generally thought to favor the lower optimization levels, in particular -Os and -O2.
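      Something like the following would cover the first suggestion; the commands are mine (not from the article), and the loop is just a hypothetical example of code where -march=native tends to matter:

      /*
       * Extra test matrix: each optimization level with and without -march=native.
       *
       *   gcc -Os blur.c -c -o blur-Os.o        gcc -Os -march=native blur.c -c -o blur-Os-native.o
       *   gcc -O2 blur.c -c -o blur-O2.o        gcc -O2 -march=native blur.c -c -o blur-O2-native.o
       *   gcc -O3 blur.c -c -o blur-O3.o        gcc -O3 -march=native blur.c -c -o blur-O3-native.o
       *
       * -march=native mostly pays off on loops the auto-vectorizer can target
       * with whatever SSE/AVX the local CPU supports.
       */
      #include <stddef.h>

      void blur(float *dst, const float *src, size_t n)
      {
          /* simple 3-tap average over a buffer of samples */
          for (size_t i = 1; i + 1 < n; i++)
              dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0f;
      }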


      • #4
        Originally posted by ryao View Post
        These are a big improvement on previous benchmarks done by Phoronix. They would be better still if runs that add -march=native on top of each optimization level were also included.

        It would be useful if we could see the effect that cache size has on optimization levels. Small caches are generally thought to favor the lower optimization levels, in particular -Os and -O2.
        I agree, especially with Gentoo, where GCC and binutils themselves need to be built optimized, and known errata ironed out, before the true benefits show up.


        • #5
          Indeed, some optimizations will work better in combination with -march settings. The cache issue will show in benchmarks with highly parallelized workloads (such as web serving or databases with many clients).

          Regarding the article, it is interesting how the selection of benchmarks emphasizes floating-point-heavy code, since that is what benefits most from -O3 (and from -Ofast, though -Ofast may cause calculation results to differ from what you expect).

          For some historical reference, Linux Magazine ran a comparison of -Os, -O2 and -O3 on Gentoo vs. Ubuntu a while back: http://www.linux-mag.com/id/7574/
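          To illustrate that -Ofast caveat with a toy example of my own (not from the article): -Ofast implies -ffast-math, which lets GCC reorder floating-point additions, so the same source can print a different total.

          /*
           *   gcc -O2    sum.c -o sum_o2   && ./sum_o2
           *   gcc -Ofast sum.c -o sum_fast && ./sum_fast
           *
           * In strict float math the running sum gets stuck at 16777216.0
           * (2^24), because adding 1.0f no longer changes it. With -Ofast the
           * compiler may split the loop into several partial sums, so the two
           * binaries can print different totals.
           */
          #include <stdio.h>

          int main(void)
          {
              float sum = 0.0f;
              for (int i = 0; i < 100000000; i++)
                  sum += 1.0f;
              printf("%.1f\n", sum);
              return 0;
          }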


          • #6
            -O2 and -O3 can actually produce massive binary output size increases for not much gain over -Os.

            When you compile Mozilla software with -O3, you get a much larger binary, which can make it take longer to load and make the resulting program take up more space in RAM. I think Mozilla recommends -O2, but I've seen some distributions use -Os, which doesn't make the binaries much smaller but can hurt Firefox's score on things like SunSpider or Google's V8 benchmark. (-O3 doesn't help it enough to be worth the cost in load times and additional RAM usage.)

            Obviously some things benefit so much from -O3 that it becomes worth the tradeoff in longer load times and higher RAM consumption. You can't take that for granted, though.

            Yes, there is such a thing as being too aggressive with optimization level. Unfortunately, it's hard to always know when you've gone too far because it varies from program to program. Just use Fedora and be happy. They usually do OK with things like this.
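            A quick way to sanity-check the size side of this on your own code (generic commands, nothing to do with Mozilla's build system) is to build one loop-heavy file at each level and compare with size(1). The function below is just a hypothetical example of the kind of code that tends to grow at -O3 through inlining and unrolling.

            /*
             *   gcc -Os -c hot.c -o hot-Os.o && size hot-Os.o
             *   gcc -O2 -c hot.c -o hot-O2.o && size hot-O2.o
             *   gcc -O3 -c hot.c -o hot-O3.o && size hot-O3.o
             */
            #include <stddef.h>

            static int clamp(int v)
            {
                return v < 0 ? 0 : (v > 255 ? 255 : v);
            }

            void brighten(unsigned char *px, size_t n, int delta)
            {
                /* per-pixel loop: a prime candidate for unrolling and vectorizing */
                for (size_t i = 0; i < n; i++)
                    px[i] = (unsigned char)clamp(px[i] + delta);
            }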


            • #7
              Originally posted by DaemonFC View Post
              -O2 and -O3 can actually produce massive binary output size increases for not much gain over -Os.

              When you compile Mozilla software with -O3, you get a much larger binary, which can make it take longer to load and make the resulting program take up more space in RAM. I think Mozilla recommends -O2, but I've seen some distributions use -Os, which doesn't make the binaries much smaller but can hurt Firefox's score on things like SunSpider or Google's V8 benchmark. (-O3 doesn't help it enough to be worth the cost in load times and additional RAM usage.)

              Obviously some things benefit so much from -O3 that it becomes worth the tradeoff in longer load times and higher RAM consumption. You can't take that for granted, though.

              Yes, there is such a thing as being too aggressive with optimization level. Unfortunately, it's hard to always know when you've gone too far because it varies from program to program. Just use Fedora and be happy. They usually do OK with things like this.
              -O2 -march=native is generally considered to be optimal outside of special cases.


              • #8
                The GCC 4.7 optimisation guide specifically says that using -O3 is not recommended over -O2, and that -O3 was faster 'in the past' but is now not faster than -O2.

                Is it OK to use -O3 to build the Linux kernel?


                • #9
                  Originally posted by mayankleoboy1 View Post
                  The GCC 4.7 optimisation guide specifically says that using -O3 is not recommended over -O2, and that -O3 was faster 'in the past' but is now not faster than -O2.

                  Is it OK to use -O3 to build the Linux kernel?
                  scriptkernel-x.x.x.sh = BFS + BFQ + CFLAGS -Ofast

                  http://sourceforge.net/projects/scriptkernel/files/

                  scriptgcc-4.7.2_UBUNTU12_64BITS.sh = script that automatically compiles gcc-4.7.2 from source on Ubuntu 12.04+

                  http://sourceforge.net/projects/scri...TS.sh/download


                  ...


                  • #10
                    Originally posted by mayankleoboy1 View Post
                    The GCC 4.7 optimisation guide specifically says that using -O3 is not recommended over -O2, and that -O3 was faster 'in the past' but is now not faster than -O2.

                    Is it OK to use -O3 to build the Linux kernel?
                    scriptkernel-x.x.x.sh = BFS + BFQ + CFLAGS -march=native -Ofast

                    http://sourceforge.net/projects/scriptkernel/files/

                    scriptgcc-4.7.2_UBUNTU12_64BITS.sh = script that automatically compiles gcc-4.7.2 from source on Ubuntu 12.04+

                    http://sourceforge.net/projects/scri...TS.sh/download


                    ...


                    • #11
                      @Michael,

                      Hmmm, fairly interesting benchmarks, but a bit predictable in the outcomes... although I had heard that -Os only took a 10% hit...

                      I was hoping that you would have tested more esoteric stuff like the so-called "Graphite" optimisations (-floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block). I have always been too scared to try these on any applications on my Gentoo install (they are commented out in make.conf).

                      I have used LTO (link-time optimisation) with gcc 4.7.1/2 - while I have a list of stuff that falls back to no-lto, it's not unmanageable. Naturally it doesn't appear to make much difference in day-to-day usage.
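                      For anyone curious what those flags look like outside make.conf, here is a single-file sketch of my own (the loop nest is just a textbook target for the loop-blocking passes; whether they actually help is very workload-dependent):

                      /*
                       *   gcc -O2 -floop-interchange -ftree-loop-distribution \
                       *       -floop-strip-mine -floop-block -flto -c matmul.c
                       *
                       * The Graphite passes need a GCC built with the Graphite
                       * dependencies (e.g. the graphite USE flag on Gentoo);
                       * -flto just defers the final optimisation to link time.
                       */
                      #define N 512

                      /* classic loop nest that blocking/interchange are aimed at */
                      void matmul(const double a[N][N], const double b[N][N], double c[N][N])
                      {
                          for (int i = 0; i < N; i++)
                              for (int j = 0; j < N; j++) {
                                  double s = 0.0;
                                  for (int k = 0; k < N; k++)
                                      s += a[i][k] * b[k][j];
                                  c[i][j] = s;
                              }
                      }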

                      Bob


                      • #12
                        Originally posted by mayankleoboy1 View Post
                        Is it OK to use -O3 to build the linux kernel ?
                        I doubt a combination of carefully written C and handcrafted assembler is going to benefit very much from additional pseudo-smart compiler heuristics...

                        Bob


                        • #13
                          GCC vs. LLVM at -Os, and especially a comparison of compilation times, could be very interesting.
                          Developer of Ultracopier/Supercopier and of the game CatchChallenger


                          • #14
                            Good benchmarks!

                            Could you please include binary sizes too, if you do more like this? Also, I'm guessing -Ofast doesn't work everywhere, hence no result in the PHP benchmark?


                            • #15
                              Originally posted by ryao View Post
                              It would be useful if we could see the effect that cache size has on optimization levels. Small caches are generally thought to favor the lower optimization levels, in particular -Os and -O2.
                              Only when the compiler heuristics fail; there's nothing that says -O3 'has' to use every available optimization and thus bloat the code, causing cache thrashing and a possible net performance loss.

                              Originally posted by ryao View Post
                              -O2 -march=native is generally considered to be optimal outside of special cases.
                              Not 'optimal', rather the 'safe' choice: some of the more aggressive optimizations enabled at -O3 and above can yield great performance increases, but they can also backfire because of how hard it is to gauge their effectiveness in relation to their cost at compile time.

                              However, there is a solution to this problem: profile-guided optimization. Of all the tests I've done over the past two years, I can't recall one situation where -O3 with PGO did not outperform, or at worst match, any of the lower optimization levels.

                              Obviously this is because the profile data gives the compiler runtime information (hot/cold codepaths, cache usage, loop iteration counts, etc.) from which to determine when and where to apply optimizations, which is a huge benefit compared to making 'educated guesses' at compile time.
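                              For reference, the usual GCC PGO recipe being described, as a rough sketch (generic commands and a made-up source file, not tied to any benchmark from the article):

                              /*
                               *   gcc -O3 -march=native -fprofile-generate pgo.c -o pgo   # instrumented build
                               *   ./pgo                                                   # run: writes profile data (*.gcda)
                               *   gcc -O3 -march=native -fprofile-use pgo.c -o pgo        # rebuild using the profile
                               *
                               * With the profile available, branch decisions like the one
                               * below stop being guesses.
                               */
                              #include <stdio.h>
                              #include <stdlib.h>

                              static int process(const int *data, int n)
                              {
                                  int acc = 0;
                                  for (int i = 0; i < n; i++) {
                                      if (data[i] < 0)   /* never taken in the profiled run, so  */
                                          abort();       /* GCC can move it out of the hot path  */
                                      acc += data[i];
                                  }
                                  return acc;
                              }

                              int main(void)
                              {
                                  enum { N = 1000000 };
                                  static int data[N];
                                  for (int i = 0; i < N; i++)
                                      data[i] = i % 7;          /* stand-in for a real workload */
                                  int total = 0;
                                  for (int r = 0; r < 100; r++)
                                      total += process(data, N);
                                  printf("%d\n", total);
                                  return 0;
                              }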
