BOLT Merged Into LLVM To Optimize Binaries For Faster Performance


  • #11
    Originally posted by Jannik2099 View Post
    BOLT works with gcc-built binaries too, and it works on every compiled program - it's a second compilation pass much like PGO
    Then I have another question. Can the 1st pass/profile be used for both PGO and BOLT, or is it better to, say, recompile with PGO optimizations, then profile the result for BOLT?


    I think many devs/distros would think twice before doing 3-pass compiles.


    EDIT2: Also, what's the chance of a regression with BOLT? I've noticed performance regressions in a few programs with what I thought was a good PGO profile.
    Last edited by brucethemoose; 11 January 2022, 07:51 PM.



    • #12
      Originally posted by brucethemoose View Post

      Then I have another question. Can the 1st pass/profile be used for both PGO and BOLT, or is it better to, say, recompile with PGO optimizations, then profile the result for BOLT?


      I think many devs/distros would think twice before doing 3-pass compiles.


      EDIT2: Also, what's the chance of a regression with BOLT? I've noticed performance regressions in a few programs with what I thought was a good PGO profile.
      I haven't worked with it, but the first profile should theoretically be sufficient for both - BOLT cares about what functions are commonly used together, and that stays invariant under PGO.

      Though it's not like any distro does widespread PGO to begin with, since you need a profile after all.

      In general, if you have regressions with PGO/BOLT, then your profile was probably misleading. Deviations would require individual analysis - usually it's a function that should not have been inlined but was.
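
To make the "functions commonly used together" idea concrete, here is a toy Python sketch of profile-driven function layout. It is not BOLT's actual algorithm, which is more involved; the name `layout_by_call_affinity` and the call counts are made up for illustration. It greedily merges the functions connected by the hottest call edges so that they end up next to each other.

```python
def layout_by_call_affinity(call_counts):
    """Toy layout: place functions that call each other often side by side.

    call_counts maps (caller, callee) -> number of calls seen in the profile.
    Returns function names in the proposed layout order.
    """
    # Every function starts in a cluster of its own.
    cluster_of = {}
    for caller, callee in call_counts:
        for fn in (caller, callee):
            cluster_of.setdefault(fn, [fn])

    # Walk the call edges from hottest to coldest and merge their clusters.
    hottest_first = sorted(call_counts.items(), key=lambda kv: -kv[1])
    for (caller, callee), _count in hottest_first:
        a, b = cluster_of[caller], cluster_of[callee]
        if a is b:
            continue  # already laid out together
        a.extend(b)
        for fn in b:
            cluster_of[fn] = a

    # Emit each cluster once, in the order of its hottest edge.
    layout, emitted = [], set()
    for (caller, _callee), _count in hottest_first:
        cluster = cluster_of[caller]
        if id(cluster) not in emitted:
            emitted.add(id(cluster))
            layout.extend(cluster)
    return layout


# Hypothetical profile: a hot parse path and a rarely taken error path.
profile = {
    ("main", "parse"): 900,
    ("parse", "lex"): 850,
    ("main", "log_error"): 2,
    ("log_error", "format_msg"): 1,
}
print(layout_by_call_affinity(profile))
```

With these sample counts the hot chain main, parse, lex is placed first and the rarely called error path lands at the end. A misleading profile would change the counts and therefore the layout, which is one way a regression can come about.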



      • #13
        Could this be used to improve compiled shaders as well?



        • #14
          Originally posted by geearf View Post
          Could this be used to improve compiled shaders as well?
          This in particular? No. Something similar, tailored specifically to GPUs and their layouts, could, if shaders aren't already stringently optimized.



          • #15
            Originally posted by dragorth View Post

            This in particular? No. Something similar, tailored specifically to GPUs and their layouts, could, if shaders aren't already stringently optimized.
            Thank you!



            • #16
              Originally posted by Jannik2099 View Post
              I haven't worked with it, but the first profile should theoretically be sufficient for both - BOLT cares about what functions are commonly used together, and that stays invariant under PGO.

              I have, in 2018, with Ubuntu 18.04. I BOLT'ed a few binaries that I cared about, and got results from 0 to 18% faster depending on whether they were statically linked and whether the performance-critical code sections could be reorganised to fit my CPU's cache. In the end it was more trouble than it was worth to me, but I do have a statically-linked BOLT'ed PHP 7.4 interpreter running on a production server. That is one of the 18% performance improvement cases.

              People need to bear in mind that BOLT optimises the binary for a specific workload. It will almost certainly run the training workload faster, and it will almost certainly regress other workloads. So unless you use those binaries for that specific workload to a high degree, this is probably not worth your time to explore. Not that I want to discourage people from playing with it and exploring the results, I just don't want anyone to feel misled into thinking this is likely to be a productive use of their time, lol.
              Last edited by linuxgeex; 12 January 2022, 01:10 AM.



              • #17
                How long does the BOLT treatment take on 100 MB of binaries?
                Just to get an idea.



                • #18
                  So excited. I am going to build LLVM right now. Hopefully it works with AMD this time



                  • #19
                    > BOLT works with gcc-built binaries too, and it works on every compiled program - it's a second compilation pass much like PGO

                    > it's a feedback driven optimization pass. You build the binary, collect profiling data, then build it again. Thus it doubles.

                    Not sure that is correct: with BOLT you don't need to recompile your code.
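
For reference, the usual perf-based flow, as I understand the BOLT README, is sketched below: profile the existing binary, convert the profile, then rewrite the binary. No compiler is invoked at any step. The helper name `bolt_pipeline` and the file names are hypothetical, the flags are the commonly documented ones and may differ between LLVM versions, and `-j any,u` assumes a CPU with branch-record (LBR) support. The script only builds and prints the commands; it does not run them.

```python
import shlex


def bolt_pipeline(binary, workload_args, profile="perf.fdata"):
    """Return the argv lists for a typical perf-based BOLT run (not executed)."""
    # 1. Sample the already-built binary while it runs the training workload.
    record = ["perf", "record", "-e", "cycles:u", "-j", "any,u",
              "-o", "perf.data", "--", binary, *workload_args]
    # 2. Convert the perf samples into BOLT's profile format.
    convert = ["perf2bolt", "-p", "perf.data", "-o", profile, binary]
    # 3. Rewrite the binary in place of a recompile: new block and function layout.
    optimize = ["llvm-bolt", binary, "-o", binary + ".bolt",
                f"-data={profile}",
                "-reorder-blocks=ext-tsp",
                "-reorder-functions=hfsort",
                "-split-functions",
                "-dyno-stats"]
    return [record, convert, optimize]


if __name__ == "__main__":
    # "./myapp" and its argument are placeholders for a real binary and workload.
    for step in bolt_pipeline("./myapp", ["--bench"]):
        print(shlex.join(step))
```

One caveat: to get the full benefit of function reordering, the binary should have been linked with `--emit-relocs`, so a relink may still be needed even when a recompile is not.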



                    • #20
                      So now we want to see some benchmarks
