GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

  • GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

    Phoronix: GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

    With the mention earlier this week of GCC potentially enabling the vectorize options at the -O2 optimization level, I carried out some fresh GCC 9 development benchmarks at various optimization levels for reference.

    http://www.phoronix.com/vr.php?view=27392
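
For readers unfamiliar with what "the vectorize options" act on, here is a minimal sketch of the kind of loop the auto-vectorizer targets. The function name `saxpy` and the file name in the comments are illustrative and not taken from the article; the flags in the comments are standard GCC options.

```c
#include <stddef.h>

/*
 * Illustrative loop (not from the article): y[i] = a * x[i] + y[i].
 * `restrict` promises that x and y do not overlap, which makes the
 * loop easier for the compiler to auto-vectorize.
 *
 * Ways to build it with GCC 9 (file name saxpy.c is an assumption):
 *   gcc -O2 -c saxpy.c                                   vectorizer off
 *   gcc -O2 -ftree-vectorize -fopt-info-vec -c saxpy.c   vectorizer on, with a report
 *   gcc -O3 -c saxpy.c                                   vectorizer on by default
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With `-fopt-info-vec`, GCC reports which loops it vectorized, which is a quick way to check whether a given flag combination changed anything for your own code.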

  • #2
    The results show that you cannot blindly throw the same GCC compilation options at all your applications; you have to choose optimizations on a per-app basis, which is still a huge nuisance for Gentoo lovers. And then we have -flto and PGO to rub salt into the wound.

    • #3
      Was there a reason -Ofast wasn't included? Great article!

      • #4
        What about -Os?

        • #5
          Originally posted by tichun
          What about -Os?
          On x86 desktop CPUs with a lot of cache, code generated with -Os is almost always significantly slower than code compiled with -O2. Back in the Pentium 3 days and earlier, I used -Os when RAM was limited and expensive. Nowadays this option makes sense only for embedded or memory-constrained devices.

          Here are some recent -O2 vs -Os results: https://rv8.io/bench

          And here's the GCC developers' attitude towards using -Os:
          Code:
          First let me put into some perspective on -Os usage and some history:
          1) -Os is not useful for non-embedded users
          2) the embedded folks really need the smallest code possible and
          usually will be willing to afford the performance hit
          3) -Os was a mistake for Apple to use in the first place; they used it
          and then GCC got better for PowerPC to use the string instructions
          which is why -Oz was added :)
          4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.

          Michael ran -Os tests not so long ago.
          Last edited by birdie; 01-12-2019, 12:07 PM.

          • #6
            This shows how good -O3 plus -march=native can be... Great potential that is still not used by most distributions.

            • #7
              Originally posted by CochainComplex
              This shows how good -O3 plus -march=native can be... Great potential that is still not used by most distributions.
              -O3 makes the resulting code bloated and often makes it run slower. You just cannot use this flag for all applications.

              For some applications the choice of optimization flags doesn't even matter - see the x264 example in the article.

              It looks like there are too many theorists on Phoronix who've never compiled anything.

              • #8
                Originally posted by CochainComplex
                This shows how good -O3 plus -march=native can be... Great potential that is still not used by most distributions.
                Distributions can't use -march=native, since it enables instructions that are not supported by every chip (like AVX2 or AVX-512).

                It tells the compiler to use instruction and optimization presets for the local build machine.
                Last edited by wagaf; 01-12-2019, 12:15 PM.
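
A small sketch of why such a binary is tied to the build machine: GCC predefines macros such as `__AVX2__` and `__AVX512F__` when the corresponding instruction sets are enabled at compile time, for example through -march=native on a CPU that has them. The function names below are made up for illustration.

```c
/*
 * Compile-time check (function names are hypothetical).
 * Each function reports whether THIS build was allowed to emit the
 * given instruction set; it says nothing about the CPU the binary
 * later runs on, which is exactly the portability problem.
 */
int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;   /* compiler may emit AVX2 instructions anywhere */
#else
    return 0;   /* baseline build, no AVX2 assumed */
#endif
}

int built_with_avx512f(void)
{
#ifdef __AVX512F__
    return 1;
#else
    return 0;
#endif
}
```

Building the same file once with plain `-O2` and once with `-O2 -march=native` on an AVX2-capable machine should flip the first result from 0 to 1, and the second build may crash with an illegal-instruction fault on older hardware.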

                • #9
                  Originally posted by wagaf

                  Distributions can't use -march=native, since it enables instructions that are not supported by every chip (like AVX2 or AVX-512).

                  It tells the compiler to use instruction and optimization presets for the local build machine.
                  Indeed, though distros could use FMV (function multi-versioning), as Clear Linux does, to offer the optimal code path for the CPU at run time.
                  Michael Larabel
                  http://www.michaellarabel.com/
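
As a rough illustration of FMV, GCC's `target_clones` attribute builds several versions of one function and picks one at load time based on the running CPU. This is only a sketch: the `MULTIVERSION` macro and `sum_ints` are hypothetical names, the guard conditions are a conservative assumption about where the attribute works (it relies on ifunc support), and it does not show how Clear Linux actually configures its builds.

```c
#include <stdlib.h>   /* size_t; glibc headers also define __GLIBC__ */

/*
 * Assumption: only enable multi-versioning with GCC on x86-64 glibc
 * systems. Elsewhere the macro expands to nothing and a single
 * baseline version is built.
 */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION
#endif

/* Built once per listed target; the dynamic loader selects the AVX2
   clone on CPUs that support it and the default clone otherwise. */
MULTIVERSION
long sum_ints(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Callers just call `sum_ints()`; the binary stays runnable on baseline x86-64 chips while newer CPUs get the AVX2 code path, at the cost of larger code size for every cloned function.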

                  • #10
                    Originally posted by birdie

                    -O3 makes the resulting code bloated and often makes it run slower. You just cannot use this flag for all applications.
                    I've not recently encountered a case where -O3 ran significantly slower than -O2 or caused any issue that was not a bug in the program (notice there are no regressions with -O3 in the benchmarks, only performance improvements).

                    Looking at the generated machine code, I noticed GCC has improved in the last few years and is now closer to Clang at generating "clean" code.

                    I now compile all my code with -O3 for release builds.

                    Originally posted by birdie
                    For some applications the choice of optimization flags doesn't even matter - see the x264 example in the article.
                    Mostly because x264 has heavy assembly optimizations. But that's no reason not to use -O3, right?
