
Intel's Newest Software Effort For Achieving Greater Performance: Thin Layout Optimizer


  • #11
    ptr1337: any plans for v3-lto-bolt packages?


    • #12
      Originally posted by joebonrichie
      That workflow looks significantly easier than BOLT, although you basically have to compile your own binutils unless they get those gnu linker patches upstreamed. So it probably evens out.

      A bummer for me personally is that AMD CPUs do not support LBR (perf -j flag) until zen3 AFAIK. I personally have a zen2 CPU so I have to rely on llvm-bolt -instrument to experiment with post-link optimizers. LBR is also unavailable in some environments such as VMs AFAIU.
      the goal absolutely is to get this into binutils

      also one of the reasons we picked our "flow" architecture is so that you can measure on a different machine than the one you compile on -- precisely because your cloud builder may not have a GPU, may not expose LBR, and so on
      Decoupling where you measure from where you build was a key design objective.
      So was being able to combine multiple measurements into what is basically a "super measurement", so that you can measure in very different environments (say 2 different GPU vendors) and still use the result for both mesa drivers
      Last edited by arjan_intel; 12 April 2024, 09:54 AM.
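
A minimal sketch of the "super measurement" idea from the post above, assuming a profile is nothing more than a mapping from function name to sample count. That representation, the function names, and the helpers `normalize`, `merge_profiles` and `hot_order` are all hypothetical; the tool's real profile format is not described in this thread.

```python
from collections import Counter


def normalize(profile):
    """Scale raw sample counts so one measurement sums to 1.0."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {func: count / total for func, count in profile.items()}


def merge_profiles(profiles, weights=None):
    """Combine several per-machine profiles into one weighted profile."""
    if weights is None:
        weights = [1.0] * len(profiles)
    if len(weights) != len(profiles):
        raise ValueError("need exactly one weight per profile")
    merged = Counter()
    for profile, weight in zip(profiles, weights):
        for func, share in normalize(profile).items():
            merged[func] += weight * share
    return dict(merged)


def hot_order(merged):
    """Return function names from hottest to coldest (ties broken by name)."""
    return sorted(merged, key=lambda f: (-merged[f], f))


# Hypothetical sample counts measured on two machines with different GPUs.
vendor_a = {"draw_call": 900, "compile_shader": 100}
vendor_b = {"draw_call": 50, "upload_texture": 50}

merged = merge_profiles([vendor_a, vendor_b])
print(hot_order(merged))
```

Normalizing each measurement before summing is one possible design choice: it keeps a long profiling run on one machine from drowning out a short run on another. Whether the actual tool weights measurements this way is not stated here.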


      • #13
        Originally posted by ms178
        arjan_intel Some blog posts with how-to's for normies like me to use that tool to optimize the Linux Kernel or Mesa would be highly appreciated.
        we'll see what we can do -- including seeing if we should publish profiles we've measured as reference (of course you can always measure your own but for testing this would allow you to get a start)


        • #14
          Originally posted by reba
          ptr1337: any plans for v3-lto-bolt packages?
          We are already using BOLT for python, but sadly the "bump" isn't that big.
          We have also run some tests on chromium, which were successful, but BOLT requires disabling CFI there, which is really bad for security.

          Generally we are planning to work on more BOLT'd packages in the future, but BOLT has serious problems with shared libraries, which are what we (Arch Linux) mainly use.

          Anyways, here you can also find a BOLT'd llvm toolchain; once you export the PATH you can enjoy a fast PGO'd, LTO'd and BOLT'd clang


          • #15
            Originally posted by ptr1337

            Generally we are planning to work on more BOLT'd packages in the future, but BOLT has serious problems with shared libraries, which are what we (Arch Linux) mainly use.
            This is my experience as well. Additionally, the -use-old-text option can rarely fit the reordered binary in the original .text section, leading to massively bloated binaries, which in turn adds startup cost. The patchset to rewrite the binary to avoid these massive binaries seems to have stalled.


            • #16
              Well I'll be damned, finally someone reinvented TKB (Task Builder, the RSX-11M/M-Plus linker) although this one is probably more like checkers to the chess of TKB


              • #17
                Wow, the stuff in the wiki about whole-system profiling, and merging profiles from multiple runs is incredible, especially with the compact storage format. Imagine the potential to optimize an entire distribution's system libraries with opt-in telemetry!