GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

  • GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

    Phoronix: GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

    With the mention earlier this week of GCC potentially enabling the vectorize options at the -O2 optimization level, I carried out some fresh GCC 9 development benchmarks at various optimization levels for reference.

    http://www.phoronix.com/vr.php?view=27392

  • #2
    The results show that you cannot blindly throw the same GCC compilation options at all your applications. You have to choose optimizations on a per-app basis, which is still a huge nuisance for Gentoo lovers. And then we have -flto and PGO to rub salt into the wound.



    • #3
      Originally posted by tichun
      What about -Os?
      On x86 desktop CPUs with a lot of cache, code generated with -Os is almost always significantly slower than code compiled with -O2. Back in the Pentium 3 days and earlier I used -Os when RAM was limited and expensive. Nowadays this option makes sense only for embedded or memory-constrained devices.

      Here are some recent -O2 vs -Os results: https://rv8.io/bench

      And here's the GCC developers' attitude towards using -Os:
      Code:
      First let me put into some perspective on -Os usage and some history:
      1) -Os is not useful for non-embedded users
      2) the embedded folks really need the smallest code possible and
      usually will be willing to afford the performance hit
      3) -Os was a mistake for Apple to use in the first place; they used it
      and then GCC got better for PowerPC to use the string instructions
      which is why -Oz was added :)
      4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.

      Michael ran -Os tests not so long ago.
      Last edited by birdie; 12 January 2019, 12:07 PM.



      • #4
        This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.



        • #5
          Originally posted by CochainComplex
          This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.
          -O3 makes the resulting code bloated and often makes it run slower. You just cannot use this flag for all applications.

          For some applications the choice of optimization flags doesn't even matter - see the x264 example in the article.

          It looks like there are too many theorists on Phoronix who've never compiled anything.



          • #6
            Originally posted by CochainComplex
            This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.
            Distributions can't use -march=native since it enables the use of instructions not supported by every chip (like AVX2 or AVX-512).

            It tells the compiler to use instruction and optimization presets for the local build machine.
            Last edited by wagaf; 12 January 2019, 12:15 PM.
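To make the point about -march=native concrete, here is a minimal sketch (not from the thread) that reports which x86 SIMD extensions the compiler was allowed to use for a given build. It relies on the `__SSE2__`, `__AVX__`, `__AVX2__` and `__AVX512F__` macros that GCC and Clang predefine when the matching extension is enabled. The function names `compiled_isa` and `append_name` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one extension name to buf without writing past len bytes.
 * Hypothetical helper, named for this example only. */
static void append_name(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);
    if (used < len)
        snprintf(buf + used, len - used, "%s%s", used ? " " : "", name);
}

/* Fill buf with the x86 SIMD extensions this translation unit was
 * compiled to use. Each macro is predefined by the compiler only when
 * the extension is enabled, e.g. through -mavx2 or -march=native. */
const char *compiled_isa(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
#ifdef __SSE2__
    append_name(buf, len, "sse2");
#endif
#ifdef __AVX__
    append_name(buf, len, "avx");
#endif
#ifdef __AVX2__
    append_name(buf, len, "avx2");
#endif
#ifdef __AVX512F__
    append_name(buf, len, "avx512f");
#endif
    if (buf[0] == '\0')
        append_name(buf, len, "none");   /* non-x86 or very old baseline */
    return buf;
}
```

Printing the result from a small `main`, once built with `gcc -O2` and once with `gcc -O2 -march=native`, should give different lists on an AVX2-capable build machine. Anything that only shows up in the second list is an instruction set the binary may rely on, which is why such a build can crash with an illegal instruction on an older CPU.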



            • #7
              Originally posted by wagaf

              Distributions can't use -march=native since it enables the use of instructions not supported by every chip (like AVX2 or AVX-512).

              It tells the compiler to use instruction and optimization presets for the local build machine.
              Indeed, though distros could use FMV (function multi-versioning), as Clear Linux does, to select the optimal code path for the CPU at run time.
              Michael Larabel
              http://www.michaellarabel.com/
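For readers who have not used FMV, a rough sketch of what it looks like in GCC follows. It uses the `target_clones` function attribute, which asks the compiler to emit one copy of the function per listed target plus a resolver that picks a copy when the program is loaded. This is only an illustration under stated assumptions: the `MULTIVERSION` macro and the `add_arrays` function are invented here, the guard limits the attribute to GCC on x86-64 with glibc (where the required ifunc support is known to exist), and nothing here is taken from how Clear Linux actually builds its packages.

```c
#include <assert.h>
#include <stddef.h>

/* Only request clones where GCC's ifunc-based dispatch is available.
 * Elsewhere the macro expands to nothing and a single generic copy
 * of the function is built. MULTIVERSION is a made-up name. */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION
#endif

/* Element-wise sum of two arrays. With the attribute active, GCC emits
 * an AVX2 clone and a baseline clone; the resolver selects one based on
 * what the running CPU supports. */
MULTIVERSION
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The caller just calls `add_arrays` as usual, so one binary can run on old CPUs and still take the AVX2 path on newer ones. Whether the AVX2 clone is actually faster depends on the loop being vectorized, which with GCC 9 generally means building at -O3 or with -ftree-vectorize.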



              • #8
                Originally posted by birdie

                -O3 makes the resulting code bloated and often makes it run slower. You just cannot use this flag for all applications.
                I've not recently encountered a case where O3 was running significantly slower than O2 or was causing any issue that was not a bug in the program (notice no regressions with O3 in the benchmarks, only performance improvements).

                Looking at the generated machine code I noticed GCC improved in the last few years and is now closer to Clang for generating "clean" code.

                I now compile all my code with O3 for release builds.

                Originally posted by birdie
                For some applications the choice of optimization flags doesn't even matter - see the x264 example in the article.
                Mostly because x264 has heavy assembly optimizations. But that's no reason not to use O3, right?
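As a small illustration of where -O2 and -O3 can differ in GCC 9, consider a loop that is an easy target for the auto-vectorizer. In GCC 9 the vectorizer is enabled by -O3 (or by adding -ftree-vectorize to -O2) but not by plain -O2, which is what the proposal mentioned in the article would change. The `saxpy` kernel below is a generic textbook example chosen for this sketch, not one of the benchmarks from the article.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i] for i in [0, n).
 * 'restrict' promises the compiler that x and y do not overlap, which
 * removes the need for a run-time aliasing check before vectorizing. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

Compiling this file with `gcc -O2 -fopt-info-vec -c` and again with `gcc -O3 -fopt-info-vec -c` should show the difference: the vectorizer reports the loop only in the second case. How much that matters for a whole application depends on how much time it spends in loops like this one, which fits the mixed results in the article.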



                • #9
                  Originally posted by CochainComplex
                  This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.
                  ... and to add to what others said, `-march=native` implies `-mtune=native`, which tunes the program to run faster on your CPU (tuning on its own does not imply that newer CPU features are used). As such, compiling with `-march=native` not only makes the binary incompatible with any CPU that lacks part of your CPU's instruction set, it *also* leaves it less well optimized for other machines.
                  Last edited by AsuMagic; 13 January 2019, 06:21 AM. Reason: typo



                  • #10
                    Originally posted by wagaf

                    I've not recently encountered a case where O3 was running significantly slower than O2 or was causing any issue that was not a bug in the program (notice no regressions with O3 in the benchmarks, only performance improvements).

                    Looking at the generated machine code I noticed GCC improved in the last few years and is now closer to Clang for generating "clean" code.

                    I now compile all my code with O3 for release builds.
                    I'm pretty sure you haven't actually tested more than a couple of applications with -O3 vs -O2, which renders your statement kinda superficial and overly optimistic.
