GCC Eyeing -O2 Vectorization For Boosting Intel Core / AMD Zen Performance

  • GCC Eyeing -O2 Vectorization For Boosting Intel Core / AMD Zen Performance

    Phoronix: GCC Eyeing -O2 Vectorization For Boosting Intel Core / AMD Zen Performance

    Longtime GNU Compiler Collection (GCC) developer Jan Hubicka of SUSE is looking at enabling vectorization as part of the -O2 optimization level for Intel Core, AMD Zen, and generic x86_64 CPU targets...

    http://www.phoronix.com/scan.php?pag...-Vectorization

  • #2
    I'm eyeing -Os vectorisation; however, the last time I checked it had no effect, maybe due to no loop unrolling or constraints on increasing code size … :-/


    • #3
      Last time I checked "-ftree-vectorize" almost always caused performance degradation but that was a long time ago (GCC 4.9 days, i.e. almost five years ago).


      • #4
        "-ftree-vectorize -ftree-slp-vectorize" is redundant. The -ftree-vectorize flag enables both the -ftree-loop-vectorize and -ftree-slp-vectorize optimizations.
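To make the two passes named above concrete, here is a minimal sketch of the kind of code each one targets. The function names `saxpy` and `add4` are made up for illustration, and whether a given GCC release actually vectorizes either function depends on the version, the target, and its cost model.

```c
#include <stddef.h>

/* Loop-vectorizer candidate: independent iterations over contiguous
 * arrays. `restrict` promises the compiler that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* SLP-vectorizer candidate: four statements of the same shape on
 * adjacent elements in straight-line code, with no loop involved. */
void add4(float *restrict out, const float *restrict a,
          const float *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

Assuming the file is saved as `kernels.c` (a hypothetical name), `gcc -O2 -ftree-vectorize -c kernels.c` and `gcc -O2 -ftree-loop-vectorize -ftree-slp-vectorize -c kernels.c` should request the same two passes.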


        • #5
          Compiler autovectorization is absolutely essential these days if you want your CPUs to work at least somewhat the way they were intended to be used. Otherwise go back to 2008-era Nehalem and Westmere; they run just fine. But if you want to make use of AVX and don't want to hand-code asm, the compiler can help you. ICC, for example, does vectorization at -O2, and so does Clang. Holding it back on GCC is not helpful.


          • #6
            The loop vectorizer turns itself off on loops considered cold (all loops with -Os are), so indeed it will do nothing. The SLP vectorizer seems to reduce code size in some cases, but it needs more tuning to be a good size optimization.

            I have compared GCC and Clang performance on Firefox with link-time optimizations and profile feedback http://hubicka.blogspot.com/2018/12/...lding-and.html
            and now I am doing the same without profile feedback and with/without LTO. Auto-inlining shows up here as the main difference: one needs to enable -finline-functions to fix some of the tests that are slower with GCC than with Clang at -O2. Then one needs to tune down the inliner limits to get a smaller and faster binary. So I hope to improve the defaults for GCC 10.

            With vectorization it would be really useful to have more real-world data in addition to common benchmarks.
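One way to check the point about -Os on your own code is to ask GCC to report its vectorizer decisions. The sketch below uses a made-up kernel, `add_arrays`, in a file assumed to be called `kernel.c`. The `-fopt-info-vec-*` options are GCC diagnostics, and their exact messages differ between releases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel for inspecting vectorizer decisions.
 *
 * Report loops that were vectorized:
 *   gcc -O2 -ftree-vectorize -fopt-info-vec-optimized -c kernel.c
 *
 * Report loops that were not, with the reason GCC gives:
 *   gcc -Os -ftree-vectorize -fopt-info-vec-missed -c kernel.c
 */
void add_arrays(size_t n, const int32_t *restrict a,
                const int32_t *restrict b, int32_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Comparing the two reports on real project code, rather than on a toy like this, is closer to the real-world data asked for above.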


            • #7
              If this doesn't make it into GCC 9, can openSUSE change its defaults for building x86_64 packages to include -ftree-vectorize? That, combined with the LTO initiative, would be amazing for boosting performance in many packages/benchmarks! Go Tumbleweed!


              • #8
                Originally posted by pegasus
                Compiler autovectorization is absolutely essential these days if you want your CPUs to work at least somewhat the way they were intended to be used. Otherwise go back to 2008-era Nehalem and Westmere; they run just fine. But if you want to make use of AVX and don't want to hand-code asm, the compiler can help you. ICC, for example, does vectorization at -O2, and so does Clang. Holding it back on GCC is not helpful.
                It isn't essential for a lot of software, and even then testing needs to be done to ensure that you are actually getting a benefit from vectorization. I'd much rather see a focus on optimized libraries than on hoping that vectorization can help random code.
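The testing called for here can start with something as crude as the sketch below: build the same file with and without -ftree-vectorize and compare the timings. The names `scale` and `time_scale` are invented for this example, and `clock()` measures CPU time only coarsely, so treat this as a starting point and not as a rigorous benchmark.

```c
#include <stddef.h>
#include <time.h>

/* Kernel under test (hypothetical): scale an array in place. */
void scale(size_t n, float k, float *data)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= k;
}

/* Return the CPU seconds spent on `reps` calls of scale().
 * The factor alternates between 2 and 0.5 so that, for an even
 * `reps`, the data ends up with its original values. */
double time_scale(size_t n, float *data, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        scale(n, (r & 1) ? 0.5f : 2.0f, data);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would allocate a large array, call `time_scale` with an even repetition count, and repeat the run several times per build to see how much the numbers vary.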


                • #9
                  Originally posted by hubicka
                  The loop vectorizer turns itself off on loops considered cold (all loops with -Os are), so indeed it will do nothing. The SLP vectorizer seems to reduce code size in some cases, but it needs more tuning to be a good size optimization.

                  I have compared GCC and Clang performance on Firefox with link-time optimizations and profile feedback http://hubicka.blogspot.com/2018/12/...lding-and.html
                  and now I am doing the same without profile feedback and with/without LTO. Auto-inlining shows up here as the main difference: one needs to enable -finline-functions to fix some of the tests that are slower with GCC than with Clang at -O2. Then one needs to tune down the inliner limits to get a smaller and faster binary. So I hope to improve the defaults for GCC 10.

                  With vectorization it would be really useful to have more real-world data in addition to common benchmarks.
                  Why not -Og? It's less good for debugging purposes, or so I was told, but it does optimize a lot more than -O2.


                  • #10
                    Originally posted by Vistaus

                    Why not -Og ? Less good for debugging purposes, or so I was told, but it does optimize a lot more than -O2.
                    The optimizations enabled by -Og are a subset of those enabled by -O2. Vectorization is not a good candidate for -Og because it changes the order in which the program executes, which is not good for debugging.
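The reordering mentioned above can be pictured with a hand-written scalar model. This is not GCC's actual output, only an illustration that assumes a 4-wide vector. `add_scalar` is the loop as written in the source, and `add_blocked` mimics the order in which a vectorized version would touch memory. Both function names are made up.

```c
#include <stddef.h>

/* Source as written: one element per iteration, strictly in order. */
void add_scalar(size_t n, const int *restrict a, const int *restrict b,
                int *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Hand-written model of a 4-wide vectorized loop: four loads from a,
 * four loads from b, four adds, four stores, then a scalar remainder
 * loop. A debugger stepping through this no longer sees the
 * load/add/store sequence of the source for one element at a time. */
void add_blocked(size_t n, const int *restrict a, const int *restrict b,
                 int *restrict out)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int a0 = a[i], a1 = a[i + 1], a2 = a[i + 2], a3 = a[i + 3];
        int b0 = b[i], b1 = b[i + 1], b2 = b[i + 2], b3 = b[i + 3];
        int s0 = a0 + b0, s1 = a1 + b1, s2 = a2 + b2, s3 = a3 + b3;
        out[i] = s0;
        out[i + 1] = s1;
        out[i + 2] = s2;
        out[i + 3] = s3;
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results. Only the order of the individual loads, adds, and stores differs, and that order is what a debugger exposes when single-stepping.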
