LLVM Clang 12 Benchmarks At Varying Optimization Levels, LTO


  • LLVM Clang 12 Benchmarks At Varying Optimization Levels, LTO

    Phoronix: LLVM Clang 12 Benchmarks At Varying Optimization Levels, LTO

    Earlier this month there were benchmarks looking at GCC 11 performance with varying optimization levels and features like link-time optimization (LTO). Following reader requests, here are similar reference benchmarks for LLVM Clang 12.0 on the same system, going from -O0 to -Ofast and toggling -march=native and LTO usage.

    https://www.phoronix.com/vr.php?view=30294

  • #2
    It appears that coding a compiler sensibly leads to it acting far more predictably.


    • #3
      "Intel Tiger Lake-H chipset"?


      • #4
        but -Os and -Oz? ;-)


        • #5
          Yay, meme flags.


          • #6
            The various -O2 results have me wondering where -Os and -Oz with -march=native and -flto would stack up against the rest, and whether the old 90s and 00s anecdote that "smaller targeted binaries are faster overall" still holds true or whether modern CPUs having more cache renders those settings moot.

            Overall these results are what I'd expect to see based on their names and descriptions. I like it when everything works out like that.


            • #7
              Originally posted by skeevy420
              The various -O2 results have me wondering where -Os and -Oz with -march=native and -flto would stack up against the rest, and whether the old 90s and 00s anecdote that "smaller targeted binaries are faster overall" still holds true or whether modern CPUs having more cache renders those settings moot.

              Overall these results are what I'd expect to see based on their names and descriptions. I like it when everything works out like that.
              Alpine Linux and Void had significant performance benefits owing to their smaller binaries, but the performance delta was markedly reduced on high-cache processors.


              • #8
                Originally posted by loganj
                "Intel Tiger Lake-H chipset"?
                Good catch. The H-series laptop chips need an external southbridge, similar to desktop chips. My guess is that, for Tiger Lake H, Intel just reused the same southbridge that some Rocket Lake boards use, and that explains why it got detected as such.


                • #9
                  Originally posted by skeevy420
                  The various -O2 results have me wondering where -Os and -Oz with -march=native and -flto would stack up against the rest, and whether the old 90s and 00s anecdote that "smaller targeted binaries are faster overall" still holds true or whether modern CPUs having more cache renders those settings moot.
                  Not all benchmarks put equal pressure on instruction cache. In cases that are more limited by it, perhaps you could get a net benefit with that combination.

                  However, in cases where the hotspots are dominated by a small number of loops, aggressive inlining, unrolling, and vectorization are going to be the winning strategy.
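
To make the hot-loop case concrete, here is a minimal C++ sketch of the kind of small kernel where inlining, unrolling, and vectorization tend to matter more than code size. The function name `saxpy` and its signature are illustrative assumptions, not something taken from the benchmarks in the article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hot loop: y[i] += a * x[i] for every element.
// A simple, branch-free body like this is a typical candidate for
// the compiler's loop vectorizer and unroller at higher -O levels.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y)
{
    assert(x.size() == y.size());  // precondition: same length
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

One way to try the comparison discussed above would be to build the same file with `clang++ -O3 -march=native -flto` and with `clang++ -Oz -march=native -flto`, then time both. Adding `-Rpass=loop-vectorize` asks Clang to report which loops it vectorized. Which build wins depends on the workload, so this is a starting point for measurement rather than a prediction.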


                  • #10
                    It would be good to combine the Clang 12 and GCC 11 results.
                    Also, some sort of final summary at the end would help, showing the winner by overall mean and the winner by number of first places.
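
A summary like the one suggested above could be computed roughly as follows. This is only a sketch: it assumes higher scores are better, uses a geometric mean (a common choice for averaging benchmark ratios), and the names `geometric_mean` and `first_places` are hypothetical helpers, not part of any existing tool.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of positive scores, computed through logarithms
// so that the running product cannot overflow.
double geometric_mean(const std::vector<double>& scores)
{
    assert(!scores.empty());
    double log_sum = 0.0;
    for (double s : scores) {
        assert(s > 0.0);  // the log of a non-positive score is undefined
        log_sum += std::log(s);
    }
    return std::exp(log_sum / static_cast<double>(scores.size()));
}

// Counts the tests in which configuration `cfg` has the strictly
// highest score. results[t][c] is the score of configuration c in
// test t (assumption: higher is better; ties count for nobody).
std::size_t first_places(const std::vector<std::vector<double>>& results,
                         std::size_t cfg)
{
    std::size_t wins = 0;
    for (const auto& row : results) {
        assert(cfg < row.size());
        bool best = true;
        for (std::size_t c = 0; c < row.size(); ++c)
            if (c != cfg && row[c] >= row[cfg])
                best = false;
        if (best)
            ++wins;
    }
    return wins;
}
```

For tests where lower is better (run times, for example), the scores would need to be inverted or normalized against a baseline before being fed into either function.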
