Google Posts Patches So The Linux Kernel Can Be LTO-Optimized By Clang

  • #11
    Originally posted by CochainComplex
    Yes, LTO or even PGO with the kernel is seriously not so easy. Did you use march=generic in the config?
    No, I have a Haswell-based system, so I optimise my kernels for that.

    • #12
      Originally posted by oleid
      Execution speed usually doesn't improve much with LTO, but the size reduction is often measurable.
      Wouldn't that size reduction possibly imply fewer cache misses and thus greater performance?

      • #13
        Originally posted by FPScholten

        No, I have a Haswell-based system, so I optimise my kernels for that.
        I have seen LTO builds break more easily with march other than generic.

        • #14
          Yes, but building for the architecture that matches your processor gives an actual, measurable gain in performance and efficiency. LTO might help even more, but unfortunately the kernel does not build with it.

          • #15
            Originally posted by freedonuts

            Wouldn't that size reduction possibly imply fewer cache misses and thus greater performance?
            Hard to give a general answer here. I think the size reduction often comes from removing unused stuff or from sharing the same code better.

            • #16
              Originally posted by oleid

              Hard to give a general answer here. I think the reduction of size often comes from removing unused stuff or sharing the same code better.
              You kind of proved my point here. Caches rely on (among other things) the idea of spatial locality: if you accessed data at x, you'll probably soon access data at x+1, so a whole chunk of memory gets prefetched. If there's (as you said) "unused stuff" at address x+1, then that unused stuff gets prefetched too. You see where I'm going with this...

              And of course it's hard to generalize, but optimizing cache access is something that is done in performance-critical scenarios. Reducing code size to exploit spatial locality is one technique for doing that.
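
To make the cache-line argument concrete, here is a minimal sketch in C that counts how many cache lines a byte range covers. The function name `cache_lines_touched` and the 64-byte line size used in the note below are assumptions for illustration; they do not come from the thread or from any particular CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of cache lines covered by the byte range
 * [start, start + len), for a cache whose lines are `line` bytes long.
 */
size_t cache_lines_touched(uintptr_t start, size_t len, size_t line)
{
    assert(line > 0);
    if (len == 0)
        return 0;

    uintptr_t first = start / line;             /* index of the first line */
    uintptr_t last  = (start + len - 1) / line; /* index of the last line  */
    return (size_t)(last - first + 1);
}
```

With an assumed 64-byte line, 128 bytes of hot code laid out contiguously from a line-aligned address cover 2 lines. The same 128 bytes split into four 32-byte pieces that start 128 bytes apart cover 4 lines, and half of every fetched line is code that never runs.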

              • #17
                Originally posted by Aryma
                I have a dream to see Linux without GNU
                Careful what you wish for. There are a lot of good GNU projects, and many of them use the GPL.
                Many of the replacements or alternatives have various corporate-friendly licenses and are heavily influenced by companies like Google.
                My dream is to see less reinventing of the wheel: KISS. I miss the old GNU/Linux community ... a lot has changed over the past 20 years.

                • #18
                  Originally posted by Soul_keeper

                  Careful what you wish for. There are a lot of good GNU projects, and many of them use the GPL.
                  Many of the replacements or alternatives have various corporate-friendly licenses and are heavily influenced by companies like Google.
                  My dream is to see less reinventing of the wheel: KISS. I miss the old GNU/Linux community ... a lot has changed over the past 20 years.
                  Sorry, but I really hate glibc more than anything.

                  • #19
                    Originally posted by freedonuts

                    You kind of proved my point here.
                    Kind of. If the stuff that got removed was close by, then yes. If not, then no.
                    Also, it is hard to tell whether such an improvement is measurable. If it is in an inner loop, then surely. If not, then maybe.

                    But in any case I agree that LTO is desirable for improving the compiled code.

                    • #20
                      Originally posted by CochainComplex

                      I have seen LTO builds break more easily with march other than generic.
                      Vectorization and register optimization will often break code that doesn't follow the standard. It can also be a compiler bug, but when I see this it's usually a problem in the code, typically a pointer cast that violates the aliasing rules. After LTO the function call is optimized away, the value lives only in registers, and the access through the aliased pointer reads or writes a stale memory location that hasn't been updated since program start.
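
A minimal sketch in C of the kind of aliasing violation described above. The function names are hypothetical, and the example assumes `float` is a 32-bit IEEE-754 type. The first function reads a `float` object through a `uint32_t` pointer, which the C standard leaves undefined, so an optimizer (with or without LTO) is free to assume the two types never overlap and to reorder or drop the surrounding loads and stores. The second obtains the same bits through `memcpy`, which is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/*
 * Undefined behaviour: accesses a float object through an lvalue of an
 * incompatible type. Shown for contrast only; do not use this pattern.
 */
uint32_t float_bits_aliasing(const float *f)
{
    return *(const uint32_t *)f;
}

/*
 * Well defined: copy the object representation into a uint32_t.
 * The compiler always knows that `bits` was written by the memcpy.
 */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that cannot be cleaned up this way can be built with `-fno-strict-aliasing`, which both GCC and Clang accept and which the Linux kernel build itself uses; that turns off type-based alias analysis at some cost in optimization.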
