GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

  • GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

    Phoronix: GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options

    With the mention earlier this week of GCC potentially enabling the vectorize options at the -O2 optimization level, I carried out some fresh GCC 9 development benchmarks at various optimization levels for reference.

    http://www.phoronix.com/vr.php?view=27392
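For reference, here is a minimal sketch of the kind of loop the vectorize options affect. The function name `saxpy` and the data are illustrative and not taken from the article's benchmarks. In GCC 9 the loop vectorizer (`-ftree-vectorize`) is on at -O3 but not at -O2, so it has to be requested explicitly at -O2; `-fopt-info-vec` asks GCC to report which loops it vectorized.

```c
#include <stddef.h>

/*
 * Compile examples (GCC):
 *   gcc -O2 -c saxpy.c                                   no loop vectorization in GCC 9
 *   gcc -O2 -ftree-vectorize -fopt-info-vec -c saxpy.c   vectorizer requested at -O2
 *   gcc -O3 -fopt-info-vec -c saxpy.c                    vectorizer on by default
 *
 * y[i] = a * x[i] + y[i]
 * 'restrict' promises that x and y do not overlap, which makes the
 * loop easier for the compiler to vectorize.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Whether the vectorized build is actually faster depends on the workload, which is what the benchmarks in the article measure.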

  • #2
    The results show that you cannot blindly throw the same GCC compilation options at all your applications; you have to choose optimizations on a per-app basis, which is still a huge nuisance for Gentoo lovers. And then we have -flto and PGO to rub salt into the wound.



    • #3
      Originally posted by birdie
      The results show that you cannot blindly throw the same GCC compilation options at all your applications; you have to choose optimizations on a per-app basis, which is still a huge nuisance for Gentoo lovers. And then we have -flto and PGO to rub salt into the wound.
      If you are tuning for fun, then more options should mean more fun. If you are focused on a particular workload such as machine learning, then optimizing only a few core packages should be enough. Otherwise, for ordinary usage, I doubt the time saved by optimization can outweigh the time spent compiling those packages.



      • #4
        Was there a reason -Ofast wasn't included? Great article!



        • #5
          What about -Os?



          • #6
            Originally posted by tichun
            What about -Os?
            On x86 desktop CPUs with a lot of cache, code generated with -Os is almost always significantly slower than code compiled with -O2. Back in the Pentium 3 days and earlier, I used -Os when RAM was limited and expensive. Nowadays this option makes sense only for embedded or memory-constrained devices.

            Here are some recent -O2 vs -Os results: https://rv8.io/bench

            And here's the GCC developers' attitude towards using -Os:
            Code:
            First let me put into some perspective on -Os usage and some history:
            1) -Os is not useful for non-embedded users
            2) the embedded folks really need the smallest code possible and
            usually will be willing to afford the performance hit
            3) -Os was a mistake for Apple to use in the first place; they used it
            and then GCC got better for PowerPC to use the string instructions
            which is why -Oz was added :)
            4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.

            Michael ran -Os tests not so long ago.
            Last edited by birdie; 01-12-2019, 12:07 PM.



            • #7
              This shows how good -O3 plus -march=native can be... Great potential that most distributions still leave unused.



              • #8
                Originally posted by CochainComplex
                This shows how good -O3 plus -march=native can be... Great potential that most distributions still leave unused.
                -O3 bloats the resulting code and often makes it run slower. You just cannot use this flag for all applications.

                For some applications the choice of optimization flags doesn't even matter - see the x264 example in the article.

                It looks like there are too many theorists on Phoronix who've never compiled anything.



                • #9
                  Originally posted by CochainComplex
                  This shows how good -O3 plus -march=native can be... Great potential that most distributions still leave unused.
                  Distributions can't use -march=native, since it enables instructions that not every chip supports (like AVX2 or AVX-512).

                  It tells the compiler to use instruction and optimization presets for the local build machine.
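A rough way to see what this means in practice, sketched under the assumption of GCC's predefined macros on x86 (`__SSE2__`, `__AVX2__`, `__AVX512F__`). The enum and function names are made up for illustration. The macros reflect what the compiler was allowed to emit at build time, not what the CPU running the binary supports, which is exactly why a binary built with -march=native can crash with an illegal-instruction error on an older chip.

```c
#include <stdio.h>

/* Illustrative bit flags, one per ISA extension checked below. */
enum { ISA_SSE2 = 1u, ISA_AVX2 = 2u, ISA_AVX512F = 4u };

/* Returns the extensions the compiler was permitted to use for this build. */
unsigned compiled_isa_mask(void)
{
    unsigned mask = 0;
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __AVX2__
    mask |= ISA_AVX2;
#endif
#ifdef __AVX512F__
    mask |= ISA_AVX512F;
#endif
    return mask;
}

/* Prints a human-readable summary of the build-time ISA selection. */
void print_compiled_isa(void)
{
    unsigned m = compiled_isa_mask();
    printf("SSE2: %s, AVX2: %s, AVX-512F: %s\n",
           (m & ISA_SSE2) ? "yes" : "no",
           (m & ISA_AVX2) ? "yes" : "no",
           (m & ISA_AVX512F) ? "yes" : "no");
}
```

Building this once with plain `gcc -O2` and once with `gcc -O2 -march=native` on a recent x86 machine should show the difference; `gcc -march=native -Q --help=target` lists what "native" resolves to on the build machine.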
                  Last edited by wagaf; 01-12-2019, 12:15 PM.



                  • #10
                    Originally posted by wagaf

                    Distributions can't use -march=native, since it enables instructions that not every chip supports (like AVX2 or AVX-512).

                    It tells the compiler to use instruction and optimization presets for the local build machine.
                    Indeed, though distros could use FMV (function multi-versioning), as Clear Linux does, to offer the optimal code path for the CPU at run time.
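A minimal sketch of FMV using GCC's `target_clones` attribute, assuming GCC on x86-64 Linux with glibc (the attribute relies on ifunc support). The macro `FMV_CLONES` and the function `add_arrays` are hypothetical names, and this is not a claim about how Clear Linux applies FMV to its packages. GCC builds one copy of the function per listed target and a resolver picks the best one for the running CPU when the program is loaded; on other compilers or platforms the macro expands to nothing and a single generic version is built.

```c
#include <stdio.h>   /* a libc header, so that __GLIBC__ is defined on glibc systems */
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
/* One AVX2 clone plus a baseline clone; selection happens at load time. */
#define FMV_CLONES __attribute__((target_clones("avx2", "default")))
#else
/* Fallback: no multi-versioning, just the plain function. */
#define FMV_CLONES
#endif

/* out[i] = a[i] + b[i]; a simple loop the AVX2 clone can vectorize more widely. */
FMV_CLONES
void add_arrays(size_t n, const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Callers just call `add_arrays` as usual; the binary stays runnable on CPUs without AVX2 because the "default" clone is always present.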
                    Michael Larabel
                    http://www.michaellarabel.com/
