A Closer Look At The GCC 8 Compiler Performance On Intel Skylake

  • A Closer Look At The GCC 8 Compiler Performance On Intel Skylake

    Phoronix: A Closer Look At The GCC 8 Compiler Performance On Intel Skylake

    In continuing with our recent benchmarks of the brand new GCC 8.1 compiler, here are more tests while using an Intel Skylake CPU and testing with -O2, -O3, and -O3 -march=native optimization levels while comparing the resulting binary performance of GCC 8.1 and GCC 7.3.


  • #2
    I would really like to see the same benchmarks on a dual-socket Skylake system!

    • #3
      Is it still true with recent gcc versions that autovectorization is only applied at -O3 but not at -O2? I know this is true for gcc 6, for example. And it produces interesting results on avx512 ... some applications show major improvements, some others major slowdowns (due to thermal throttling). It seems like there's a sweet spot in how far apart (timewise) you can issue avx512 instructions to get the most out of them, but since this timing depends on the cooling system's performance, it is not really deterministic.

      • #4
        Originally posted by pegasus
        Is it still true with recent gcc versions that autovectorization is only applied at -O3 but not at -O2? I know this is true for gcc 6, for example. And it produces interesting results on avx512 ... some applications show major improvements, some others major slowdowns (due to thermal throttling). It seems like there's a sweet spot in how far apart (timewise) you can issue avx512 instructions to get the most out of them, but since this timing depends on the cooling system's performance, it is not really deterministic.
        I don't understand why 1 avx512 instruction should cost more energy than 2 avx2 instructions...

        • #5
          Nice article, Michael. On the previous gcc article, there was an odd regression on the Coffee Lake part; any chance you could look into that like you did for this chip? Thank you!

          • #6
            JavaScript is needed to view these results.
            ... that worked before.

            • #7
              Originally posted by tillschaefer
              JavaScript is needed to view these results.
              ... that worked before.
              Graph viewing will most likely require JavaScript for non-Premium members moving forward, or fall back to just an ASCII/text-based graph for non-JS users.
              Michael Larabel
              https://www.michaellarabel.com/

              • #8
                Originally posted by pegasus
                Is it still true with recent gcc versions that autovectorization is only applied at -O3 but not at -O2? I know this is true for gcc 6, for example. And it produces interesting results on avx512 ... some applications show major improvements, some others major slowdowns (due to thermal throttling). It seems like there's a sweet spot in how far apart (timewise) you can issue avx512 instructions to get the most out of them, but since this timing depends on the cooling system's performance, it is not really deterministic.
                That is still the case. Even basic block vectorization is not done at -O2, even though it produces smaller binaries.
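
For anyone who wants to check this locally, here is a minimal sketch. The `saxpy` kernel and the file name `saxpy.c` are hypothetical and not taken from the article; the compile lines in the trailing comment use GCC's `-ftree-vectorize` and `-fopt-info-vec` options, and what gets reported will depend on your GCC version and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the article): y[i] += a * x[i].
 * `restrict` promises the compiler that x and y do not overlap, which
 * makes the loop an easy candidate for auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/*
 * Ask the vectorizer to report what it did at each level:
 *
 *   gcc -O2 -c -fopt-info-vec saxpy.c
 *       (per this thread, GCC 7/8 should report nothing here)
 *   gcc -O2 -ftree-vectorize -c -fopt-info-vec saxpy.c
 *       (explicitly opts in to vectorization at -O2)
 *   gcc -O3 -c -fopt-info-vec saxpy.c
 *       (vectorization is part of -O3)
 *
 * Adding -march=native lets GCC use whatever vector ISA the build
 * machine supports instead of the baseline for the target.
 */
```

If the second and third commands report the loop as vectorized while the first prints nothing, that matches the behaviour described above.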

                • #9
                  The timed PHP compilation looks wrong. The value for gcc 8.1 with -O3 -march=native is quite a bit off when compared to -O3. The -march switch should not have such an impact on the overall compiler performance with nearly 4% difference.

                  • #10
                    Originally posted by lucasbekker
                    I don't understand why 1 avx512 instruction should cost more energy than 2 avx2 instructions...
                    It's not the computation itself, it's the data movement that costs the most energy. Compared to that, computation is essentially free. With avx512 you have more data flying around in a shorter timespan in a physically smaller space, so you generate more heat in that space and therefore heat up much more.
