LLVM's BOLT Flipped On By Default For Linux x86/AArch64 Test Releases

  • LLVM's BOLT Flipped On By Default For Linux x86/AArch64 Test Releases

    Phoronix: LLVM's BOLT Flipped On By Default For Linux x86/AArch64 Test Releases

    BOLT, the Facebook/Meta-developed technology that boosts performance by optimizing the code layout of binaries, was merged into mainline LLVM at the start of the year. Now, as we approach the end of the year, BOLT is getting a bit of a promotion: it is being flipped on by default for Linux x86_64 and AArch64 test releases...
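To make "optimizing the code layout" a little more concrete, here is a toy Python model of one idea behind it: grouping frequently executed functions together so that hot code spans fewer memory pages. It is only a sketch under stated assumptions (a 4 KiB page and made-up function names, sizes and call counts), not how BOLT itself is implemented; BOLT rewrites actual machine code using a recorded profile.

```python
# Toy model of profile-guided function ordering. All function names, sizes and
# call counts below are hypothetical, chosen only to illustrate the idea.
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(layout, sizes, hot):
    """Count the distinct pages that hold at least one byte of a hot function."""
    pages = set()
    offset = 0
    for name in layout:
        size = sizes[name]
        if name in hot:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        offset += size
    return len(pages)


def hot_first_layout(layout, counts):
    """Order functions by descending call count; sorted() is stable, so ties
    keep their original relative order."""
    return sorted(layout, key=lambda name: -counts.get(name, 0))


# "Source order" layout: hot and cold functions are interleaved.
original = ["main", "parse_args", "hot_loop", "error_path",
            "compute", "dump_stats", "helper", "cleanup"]
sizes = {name: 3000 for name in original}  # every function is 3000 bytes
counts = {"main": 1, "parse_args": 1, "hot_loop": 1_000_000, "error_path": 0,
          "compute": 900_000, "dump_stats": 0, "helper": 800_000, "cleanup": 1}
hot = {name for name, c in counts.items() if c >= 1000}

reordered = hot_first_layout(original, counts)
print(pages_touched(original, sizes, hot))   # 5
print(pages_touched(reordered, sizes, hot))  # 3
```

With hot functions interleaved with cold ones, the three hot functions touch five pages; sorting by call count packs them into three.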

  • #2
    Typo ("Faceobok" instead of "Facebook"):

    Originally posted by phoronix
    Generating an optimized binary does work for large applications with Faceobok/Meta

    • #3
      I wonder how long it will be before people implement some kind of machine-learning-driven compiler. I could see that being a potentially huge win.

      • #4
        Originally posted by quaz0r
        I wonder how long it will be before people implement some kind of machine-learning-driven compiler. I could see that being a potentially huge win.
        It would probably make sense to use ML to produce heuristics.
        Modern optimization passes like inlining, constant folding, etc. can be implemented very well with conventional code, but the heuristics that decide whether to inline a function are still quite hard to get right.
        ML could very well help with that.
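As a rough illustration of "using ML to produce heuristics", the sketch below puts a hand-written inlining rule next to a perceptron that learns a comparable rule from labeled call sites. Everything here is an assumption for illustration: the two features, the training data and its labels are hypothetical, and real ML-guided compiler work uses much richer features and training setups.

```python
# Hypothetical training data: (callee size in IR instructions, call site in a loop?)
# labeled 1 if inlining was (hypothetically) measured to help, else 0.
CALL_SITES = [
    ((20, 0), 1), ((40, 0), 1), ((30, 1), 1), ((150, 1), 1), ((120, 1), 1),
    ((300, 0), 0), ((400, 1), 0), ((250, 0), 0), ((500, 0), 0), ((180, 0), 0),
]


def hand_written_rule(size, in_loop):
    """A hand-tuned heuristic: inline small callees, or medium ones inside loops."""
    return 1 if size <= 50 or (in_loop and size <= 200) else 0


def features(size, in_loop):
    """Scale the size so both features have a similar magnitude."""
    return (size / 100.0, float(in_loop))


def predict(w, b, x):
    """Linear decision rule: inline (1) if the weighted sum is positive."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(data, max_epochs=1000):
    """Classic perceptron; it converges when the data is linearly separable."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (size, in_loop), label in data:
            x = features(size, in_loop)
            error = label - predict(w, b, x)  # -1, 0 or +1
            if error:
                w[0] += error * x[0]
                w[1] += error * x[1]
                b += error
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: done
            break
    return w, b


w, b = train_perceptron(CALL_SITES)
labels = [y for _, y in CALL_SITES]
learned = [predict(w, b, features(s, l)) for (s, l), _ in CALL_SITES]
hand = [hand_written_rule(s, l) for (s, l), _ in CALL_SITES]
print(learned == labels, hand == labels)  # True True
```

On this toy data both rules reproduce the labels; the difference is that the learned rule's thresholds come from data rather than from hand-tuning.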

        • #5
          Does anybody understand why BOLT works on finished binaries? The project looks great, but disassembling what was just assembled seems a little weird. Going by the description, I would have expected it to be an alternate PGO mode inside the compiler, or a compiler plug-in.

          • #6
            (Over)simplified:

            - LTO: "Jump around less!"
            - PGO: "Hint to the branch predictor what is HOT and what is not." (*)
            - BOLT: "Use caches better!" (*)

            *: Needs runtime profile data

            This, I think, enables LLVM itself to be built with BOLT for Linux on x86_64 and AArch64, so that the compiler itself builds other things a little bit faster.
            In other words, developers are happy to see this.
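One way a profile-driven layout can "use caches better" (as post #6 puts it) is by making the hot successor of each block its fall-through and moving rarely executed blocks off the hot path. The greedy chaining sketch below is a simplified stand-in, not BOLT's actual block-reordering algorithm; the block names and edge counts are hypothetical.

```python
# Toy profile-guided basic-block layout. Block names and edge counts are
# hypothetical; real tools, BOLT included, use more elaborate algorithms.
def greedy_layout(entry, blocks, edges):
    """Chain each block to its hottest not-yet-placed successor, then append
    whatever the chain never reached, in its original order."""
    succs = {b: [] for b in blocks}
    for (src, dst), count in edges.items():
        succs[src].append((count, dst))
    layout, placed = [], set()
    current = entry
    while current is not None:
        layout.append(current)
        placed.add(current)
        candidates = [(c, d) for c, d in succs[current] if d not in placed]
        current = max(candidates)[1] if candidates else None
    layout += [b for b in blocks if b not in placed]
    return layout


def taken_jumps(layout, edges):
    """Sum the counts of edges whose target is not the next block in the
    layout, i.e. edges that need a taken jump instead of a fall-through."""
    pos = {b: i for i, b in enumerate(layout)}
    return sum(c for (src, dst), c in edges.items() if pos[dst] != pos[src] + 1)


# An error check that almost never fires, laid out in "source order".
blocks = ["entry", "error", "body", "exit"]
edges = {("entry", "error"): 10, ("entry", "body"): 990,
         ("error", "exit"): 10, ("body", "exit"): 990}

new_layout = greedy_layout("entry", blocks, edges)
print(taken_jumps(blocks, edges))      # 1000
print(new_layout)                      # ['entry', 'body', 'exit', 'error']
print(taken_jumps(new_layout, edges))  # 20
```

Moving the cold `error` block to the end reduces 1000 taken jumps to 20, and the hot path becomes one contiguous run of code.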

            • #7
              LuukD Just curious: can PGO actually deoptimize cold paths and raise the inlining threshold for hot paths?

              • #8
                Originally posted by NobodyXu
                LuukD Just curious: can PGO actually deoptimize cold paths and raise the inlining threshold for hot paths?
                I am by no means an authority on the subject, so if compiler people would like to chime in, please do.
                AFAIK this is what PGO does:
                • Use profile information for register allocation to optimize the location of spill code.
                • Improve branch prediction for indirect function calls by identifying the most likely targets. (Processors with longer pipelines pay more for a mispredicted branch, so better branch prediction translates into larger performance gains.)
                • Detect loops that execute only a small number of iterations and do not vectorize them, reducing the run-time overhead that vectorization might otherwise add.
                [cited from link below]
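The indirect-call bullet above describes what is usually called indirect call promotion. Below is a minimal Python model of the decision and of the resulting call site, assuming a per-call-site profile of observed targets and a hypothetical 80% dominance threshold; the function names are made up for illustration, and compilers perform this transformation on their intermediate representation, not in source code.

```python
from collections import Counter


def pick_promoted_target(target_counts, threshold=0.8):
    """Return the target that dominates the profile of one indirect call site,
    or None if no single target reaches the (hypothetical) threshold share."""
    total = sum(target_counts.values())
    if total == 0:
        return None
    target, count = target_counts.most_common(1)[0]
    return target if count / total >= threshold else None


def make_call_site(hot_target):
    """Model of the transformed call site: compare against the hot target first.
    In compiled code the first branch becomes a direct (inlinable) call and the
    comparison is easy to predict when the profile is skewed; the fallback
    remains an ordinary indirect call."""
    def call(fn, x):
        if hot_target is not None and fn is hot_target:
            return hot_target(x)
        return fn(x)
    return call


def square(x):
    return x * x


def negate(x):
    return -x


profile = Counter({square: 95, negate: 5})  # hypothetical observed targets
hot = pick_promoted_target(profile)
call = make_call_site(hot)
print(hot is square, call(square, 3), call(negate, 3))  # True 9 -3
```

With an evenly split profile (50/50), no target reaches the threshold and the call site would be left as a plain indirect call.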

                I have read elsewhere that LLVM likes to inline everything unless it has deduced, or is directed, otherwise.
                I think inlining is best done early on, because caller and callee become one, which may allow dead code to be eliminated. Then run an instrumented build to create a profile and use it to decide on vectorization, register allocation and branch hints (PGO), then do LTO, and finally BOLT.

                That being said, maybe there are compilers that do indeed use profile data to improve inlining, but I would not know.

                • #9
                  LuukD Thanks!
