BOLT Close To Merging Into LLVM For Optimizing Performance Of Binaries

  • BOLT Close To Merging Into LLVM For Optimizing Performance Of Binaries

    Phoronix: BOLT Close To Merging Into LLVM For Optimizing Performance Of Binaries

    In addition to the LLVM SPIR-V back-end appearing ready for merging, Facebook's BOLT project for optimizing the performance of binaries is also working through the final steps toward being mainlined in the LLVM compiler stack...

    https://www.phoronix.com/scan.php?pa...Inches-To-LLVM

  • #2
    I am waiting for the news of Valve providing game binaries using BOLT... that would be an awesome Christmas present.

    • #3
      Note that BOLT's paper does not compare GCC with LTO+FDO against a BOLTed binary. The only feature that BOLT has and GCC lacks is function reordering. GCC does hot/cold partitioning, but the hot partition is kept in a "kind of random" order. We have a prototype implementation of the reordering pass by Martin Liska. The reason it is not in mainline is that we were not able to find benchmarks where it gives reliable speedups over the current code layout. It would be nice to understand how much performance potential is here.

      BOLT is a nice project, but unless there are bugs somewhere, LTO+FDO should give reliably better results in all scenarios where it can be well applied (the main downside of FDO compared to AutoFDO is the training overhead).
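
For readers unfamiliar with function reordering, here is a minimal Python sketch of the general idea: use profile data to place functions that call each other frequently next to each other, then emit the hottest groups first. This is a simplified greedy clustering written for illustration only. It is not BOLT's algorithm and not the GCC prototype mentioned above, and the function names and sample counts are made up.

```python
def order_functions(samples, calls):
    """Return a function layout order from profile data.

    samples: {function: sample count} (hypothetical profile input)
    calls:   {(caller, callee): call count} (hypothetical call-graph weights)
    """
    # Each function starts in its own cluster.
    cluster_of = {f: [f] for f in samples}

    # For every callee, remember its most frequent caller.
    hottest_caller = {}
    for (caller, callee), weight in calls.items():
        if caller == callee:
            continue
        best = hottest_caller.get(callee)
        if best is None or weight > best[1]:
            hottest_caller[callee] = (caller, weight)

    # Visit functions from hottest to coldest and append each one's
    # cluster to the cluster of its most frequent caller.
    for func in sorted(samples, key=samples.get, reverse=True):
        if func not in hottest_caller:
            continue
        caller = hottest_caller[func][0]
        if caller not in cluster_of:
            continue
        src, dst = cluster_of[func], cluster_of[caller]
        if src is dst:
            continue
        dst.extend(src)
        for f in src:
            cluster_of[f] = dst

    # Collect the distinct clusters and lay out the hottest ones first.
    seen, clusters = set(), []
    for cluster in cluster_of.values():
        if id(cluster) not in seen:
            seen.add(id(cluster))
            clusters.append(cluster)
    clusters.sort(key=lambda c: sum(samples[f] for f in c), reverse=True)
    return [f for c in clusters for f in c]


if __name__ == "__main__":
    samples = {"main": 10, "render": 900, "draw": 800, "physics": 400,
               "audio": 300, "mix": 250, "log": 5}
    calls = {("main", "render"): 100, ("render", "draw"): 5000,
             ("main", "physics"): 100, ("audio", "mix"): 700,
             ("physics", "log"): 1, ("render", "log"): 2}
    print(order_functions(samples, calls))
```

Production algorithms add limits on cluster size and merge only when it pays off, which keeps rarely executed functions such as `log` out of the hot region. The sketch leaves that out for brevity.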

      • #4
        Originally posted by ms178
        I am waiting for the news of Valve providing game binaries using BOLT... that would be an awesome Christmas present.
        Primetime would be AOCC instead of LLVM if you have an AMD Zen or later CPU.

        • #5
          Originally posted by ms178
          I am waiting for the news of Valve providing game binaries using BOLT... that would be an awesome Christmas present.
          They won't, that's not how it works.
          You need to run the binary through its most common flow (playing the game) to optimize it.
          1. They don't have access to the source code of the games.
          2. It's impossible to do that with so many games out there.

          • #6
            Originally posted by hubicka
            Note that BOLT's paper does not compare GCC with LTO+FDO against a BOLTed binary. The only feature that BOLT has and GCC lacks is function reordering. GCC does hot/cold partitioning, but the hot partition is kept in a "kind of random" order. We have a prototype implementation of the reordering pass by Martin Liska. The reason it is not in mainline is that we were not able to find benchmarks where it gives reliable speedups over the current code layout. It would be nice to understand how much performance potential is here.

            BOLT is a nice project, but unless there are bugs somewhere, LTO+FDO should give reliably better results in all scenarios where it can be well applied (the main downside of FDO compared to AutoFDO is the training overhead).
            They don't, because comparing Clang vs GCC is not the point.
            They are testing their optimizations against the baseline (Clang).

            • #7
              Originally posted by Alliancemd

              They won't, that's not how it works.
              You need to run the binary through its most common flow (playing the game) to optimize it.
              1. They don't have access to the source code of the games.
              2. It's impossible to do that with so many games out there.
              As far as I understand BOLT, the beauty of it is that it can be used on binaries, too. So no source code is required to get some of the promised gains.
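
To make the binary-only workflow concrete, here is a rough Python sketch that assembles the three commands shown in the BOLT README: profile with `perf`, convert the profile with `perf2bolt`, then rewrite the binary with `llvm-bolt`. The binary name `./game` and its `--benchmark` argument are hypothetical, the flag set is a trimmed version of the README example, and `-j any,u` assumes a CPU with branch-record (LBR) support. Function reordering also generally needs a binary linked with relocations kept (`--emit-relocs`), which a shipped game binary may not have.

```python
import shlex


def bolt_commands(binary, workload_args):
    """Build the profile / convert / optimize command lines for one binary."""
    # 1. Sample branches while the program runs a representative workload.
    perf_record = ["perf", "record", "-e", "cycles:u", "-j", "any,u",
                   "-o", "perf.data", "--", binary, *workload_args]
    # 2. Convert the perf profile into BOLT's own profile format.
    convert = ["perf2bolt", "-p", "perf.data", "-o", "perf.fdata", binary]
    # 3. Rewrite the binary using the profile; no source code is involved.
    optimize = ["llvm-bolt", binary, "-o", binary + ".bolt",
                "-data=perf.fdata",
                "-reorder-blocks=ext-tsp",
                "-reorder-functions=hfsort",
                "-split-functions"]
    return [perf_record, convert, optimize]


if __name__ == "__main__":
    # Only prints the commands; running them requires perf and BOLT installed.
    for cmd in bolt_commands("./game", ["--benchmark"]):
        print(shlex.join(cmd))
```

The script only prints the command lines, so the profiling step still has to be driven by a realistic run of the program, which is the practical obstacle raised earlier in the thread.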

              • #8
                Originally posted by Alliancemd

                They don't, because comparing Clang vs GCC is not the point.
                They are testing their optimizations against the baseline (Clang).
                I only know of the paper https://arxiv.org/abs/1807.06735, which does test against both GCC and Clang. However, GCC has LTO and function splitting disabled there. It would be nice to have similar numbers with LTO+PGO set up well.

                • #9
                  Originally posted by ms178

                  As far as I understand BOLT, the beauty of it is that it can be used on binaries, too. So no source code is required to get some of the promised gains.
                  Would this break client-side anti-cheat software, since it means modifying the original binaries?

                  • #10
                    Originally posted by tildearrow

                    Would this break client-side anti-cheat software, since it means modifying the original binaries?
                    Great question. I don't know how client-side anti-cheat software works or whether the data layout is a factor there. I hope Valve can find a solution. Even though we won't get the full LTO+PGO+BOLT benefits without the source code, the benefits BOLT alone brings might justify some effort to get it to work.
