Link-Time Optimizations Near Reality For x86 Linux Kernel


  • Link-Time Optimizations Near Reality For x86 Linux Kernel

    Phoronix: Link-Time Optimizations Near Reality For x86 Linux Kernel

    Another interesting change that's already landed for the Linux 3.15 kernel is infrastructure work for supporting x86 kernels optimized via LTO for yielding better kernel performance...

    http://www.phoronix.com/vr.php?view=MTY0OTc

  • #2
    Is this technology supported on 64-bit systems, or do they not need it?

    Comment


    • #3
      I'd like to see public benchmark results for kernels compiled with LTO, with those speedups and "minor regressions"

      Comment


      • #4
        This is a bit above my head, but LLVM also has LTO. Last time I heard (late last year), LLVM was already pretty close to compiling the kernel without extra patching. That's another area where optimizations could arise. Or at least, given that LLVM/Clang is much faster, perhaps LLVM + LTO will still keep compile times reasonably manageable.

        http://llvm.org/docs/LinkTimeOptimization.html

        Comment


        • #5
          "The work back in 2012 by Intel developers showed the kernel compile time increased by two to four times and needed 4~9GB of memory to complete the task"
          This may not be an issue in GCC 4.9
          http://www.phoronix.com/forums/showt...457#post408457

          Comment


          • #6
            Originally posted by Azrael5 View Post
            Is this technology supported on 64-bit systems, or do they not need it?
            x86 is just an instruction set, "x86-64" or "x64" are just slang to refer to the 64-bit version of x86. Heck, x86 was originally 16-bit!

            Originally posted by newwen View Post
            I'd like to see public benchmark results for kernels compiled with LTO, with those speedups and "minor regressions"
            I'd like to see that as well, this is of particular interest to me with this project of mine: http://forum.xda-developers.com/devdb/project/?id=1098

            Comment


            • #7
              Originally posted by MWisBest View Post
              x86 is just an instruction set, "x86-64" or "x64" are just slang to refer to the 64-bit version of x86. Heck, x86 was originally 16-bit!



              I'd like to see that as well, this is of particular interest to me with this project of mine: http://forum.xda-developers.com/devdb/project/?id=1098
              So are 64-bit Linux systems optimized at link time or not?

              Comment


              • #8
                Originally posted by mendieta View Post
                This is a bit above my head, but LLVM also has LTO. Last time I heard (lat last year), LLVM was already pretty close to compile the kernel without extra patching. That's another area where optimizations could arise. Or at least, given that LLVM/CLang is much faster, perhaps LLVM + LTO will still keep compile times reasonably manageable.

                http://llvm.org/docs/LinkTimeOptimization.html
                LTO support needs to be present throughout the toolchain; the compiler is just one part.
                And LTO has some nasty surprises: it can break code that worked before, both by exposing bugs that only occur when you aggressively inline functions (arguably code that doesn't conform to the C/C++ standard's details) and through bugs within the LTO paths themselves. And then the same starts over at link time (throwing away data/code that proved unused/unreachable, while not knowing every use of it).

                LLVM fares a lot better in the "LTO bugs" category since it was designed for it instead of having it rudely patched in. Many projects, however, are adapted to the way compilers used to work, and won't just play nice with things that break decade-old assumptions.

                Comment


                • #9
                  Originally posted by discordian View Post
                  LLVM fares a lot better in the "LTO bugs" category since it was designed for it instead of having it rudely patched in. Many projects, however, are adapted to the way compilers used to work, and won't just play nice with things that break decade-old assumptions.
                  LTO in GCC is supported just as well as anything else. Or do you consider code such as the automatic vectorization optimizer or hidden symbol support "rudely patched in?"

                  It takes some time for new features to mature and have all of their problems discovered and fixed, that's all.

                  Comment


                  • #10
                    Originally posted by Azrael5 View Post
                    So are 64-bit Linux systems optimized at link time or not?
                    ... x86 doesn't mean only 32-bit, it technically includes 64-bit as well.

                    Comment


                    • #11
                      Originally posted by MWisBest View Post
                      ... x86 doesn't mean only 32-bit, it technically includes 64-bit as well.
                      thanks for answering
                      Last edited by Azrael5; 04-01-2014, 03:20 PM.

                      Comment


                      • #12
                        Originally posted by Zan Lynx View Post
                        LTO in GCC is supported just as well as anything else. Or do you consider code such as the automatic vectorization optimizer or hidden symbol support "rudely patched in?"

                        It takes some time for new features to mature and have all of their problems discovered and fixed, that's all.
                        It takes even more time if it wasn't supported from the ground up but had to use linker plugins. Which is what I was getting at: GCC 4.8 is actually quite decent, but earlier versions had quite a few problems.

                        Still, the amount of work, changes, and improvements in 4.9 implies it's still not that mature in 4.8.

                        Comment


                        • #13
                          Originally posted by Azrael5 View Post
                          So are 64-bit Linux systems optimized at link time or not?
                          Compilers traditionally optimize only within each function, not across functions. There are two reasons for this, which are kinda linked. To optimize across functions means you need to know all the functions, which means you need a lot more memory (or very smart data structures) and you need to have all the functions at hand.

                          LTO means applying optimizations at the point of link time --- where you DO now have all the functions at hand. There are a variety of optimizations that can be performed, and I personally would be interested in knowing quite what GCC and LLVM implement.

                          Dead code stripping is obvious: remove functions that are never called. Linkers have done this before, but with more compilation knowledge available they can do a better job. E.g., in the past, if f() called g(), and g() called f(), but no one else ever called f() or g(), the linker might not detect that f() and g() are dead if it was doing a very simple "is used or not" check.

                          More interesting are things like call optimization: if a certain function is only ever called by one other function, then rather than generic marshaling of parameters on the stack or in registers, the callee can just use the registers already in use by the caller, or it may even be inlined, even from another file.

                          Even more interesting, IMHO, are code and data reorganization. Code is laid out in an attempt to ensure that functions that are called together lie on the same page. A more aggressive version of this attempts to move code that is rarely called (e.g. error-handling code) to different pages, so that each active page is packed with as much commonly used code as possible, making your TLB entries and cache lines that much denser.

                          These can do an OK job just with heuristics, but can do a rather better job with profile-directed feedback, which, as I understand it, is one of the areas LLVM is trying hard to make work better. (PDF was one of the original goals of LLVM, but it got dropped by the wayside when there were so many other things to do. I'm guessing one reason it's coming back to prominence is that Apple finally has enough of the essentials in Xcode to feel it's at parity with Dev Studio, and can move on to adding this. PDF and code rearrangement were available on PPC MacOS before OSX, so Apple actually already had code in-house for the basic algorithms; it's just a matter of integration.)

                          You can also attempt to apply the usual sort of optimizations between functions. For example, if a condition is known to hold at every call from f() into g(), then a test for that condition in g() is redundant and can be removed.

                          Comment


                          • #14
                            Originally posted by name99 View Post
                            These can do an OK job just with heuristics, but can do a rather better job with profile-directed-feedback, which, as I understand it, is one of the areas LLVM is trying hard to make work better.
                            Oh, I remember the joys of developing on Alpha and having to run the profiler every time after you built the code so you could rebuild it with decent performance after it analyzed the profile. Fortunately I managed to get back to developing on SPARC pretty quickly.

                            Hopefully they can make it work better than that, but any time you require profiling for performance, you're adding a step that developers won't want to do.

                            Comment
