Patches Posted For GCC LTO Optimizing The Linux Kernel


  • #11
    It would be so cool to be able to build an LTO-optimized kernel!
    And with these high electricity bills, it would probably be a little more power-efficient too.



    • #12
      Originally posted by cb88 View Post
      Something is wrong with that if the kernel ended up larger due to inlining... that's exactly the opposite of how that is supposed to work.
      It really isn't, but usually the increase in code would be worth it in terms of performance. Optimizations usually sacrifice one aspect for another; there is no free lunch. It's one of the reasons -O3 is not activated by default for the Kernel.



      • #13
        Let's see, will they finally be accepted? I hope so. I also hope for some major performance gains, at least when complemented with KCFLAGS=-O3.



        • #14
          Originally posted by cb88 View Post
          Something is wrong with that if the kernel ended up larger due to inlining... that's exactly the opposite of how that is supposed to work.
          No, not really. You would only be right if you had selected optimisation for size (-Os) and then ended up with a larger kernel; with -O2 the code is allowed to grow.

          The main problem is benchmarking the kernels themselves. Ideally, a kernel leaves as much CPU time as possible to the applications, so kernels are notoriously difficult to benchmark with application benchmarks alone. One has to create artificial benchmarks that exercise individual parts of the kernel to magnify any gains. There are a few exceptions, such as loopback networking, memory compression, file encryption, and file operations on very fast drives, where one can see differences.
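
          A minimal sketch of such an artificial benchmark in C, assuming Linux and assuming that getppid() enters the kernel on every call (it is not served from user space as far as I know). The function name ns_per_getppid is made up for illustration; it only measures syscall entry/exit overhead, nothing more.

          ```c
          #ifndef _POSIX_C_SOURCE
          #define _POSIX_C_SOURCE 200809L   /* request clock_gettime() from the libc headers */
          #endif
          #include <time.h>
          #include <unistd.h>

          /* Hypothetical helper: average wall-clock nanoseconds per getppid()
           * call, measured over `iters` back-to-back calls. */
          double ns_per_getppid(long iters)
          {
              struct timespec start, end;

              clock_gettime(CLOCK_MONOTONIC, &start);
              for (long i = 0; i < iters; i++)
                  (void)getppid();          /* tiny syscall: mostly kernel entry/exit cost */
              clock_gettime(CLOCK_MONOTONIC, &end);

              double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                              + (double)(end.tv_nsec - start.tv_nsec);
              return total_ns / (double)iters;
          }
          ```

          Run it with a large iteration count on both kernels, several times, on an otherwise idle machine; a single run says very little.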



          • #15
            Originally posted by xorbe View Post
            Inlining duplicates the called function locally ... duplicating code doesn't usually shrink anything. But it avoids a call/ret pair and hence avoids 2 possible branch mispredicts/stalls.
            A direct jump (since it gets inlined so the compiler knows its target) will never be mispredicted. What's there to predict when it doesn't depend on runtime factors?

            However, inlining can potentially remove redundant parameters/checks. Imagine you pass a compile-time constant to a function, and it has an early if for it, and returns a value. Inlining allows the compiler to literally replace the whole function call with the returned value. It knows everything at compile time so it can do it.
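
            A minimal sketch of that case in C. The names buffer_limit and default_limit are hypothetical, made up for this example, not taken from the kernel.

            ```c
            /* Hypothetical helper with an early check that depends only on `mode`. */
            static inline int buffer_limit(int mode)
            {
                if (mode == 0)          /* early exit for the "disabled" mode */
                    return 0;
                return 4096 * mode;
            }

            /* The argument is a compile-time constant, so once buffer_limit() is
             * inlined here the compiler is free to fold the whole call into a
             * plain `return 8192;` with no branch and no call left. */
            int default_limit(void)
            {
                return buffer_limit(2);
            }
            ```

            Compiling with gcc -O2 -S and reading the assembly for default_limit is the easy way to check what your compiler actually did.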

            So yeah, inlining should make code smaller, not larger. Aggressive inlining, which is at -O3, will increase code size though. But this is -O2 so increase in code size is a bit weird indeed.

            Let's not forget functions that are called only from one place in the source code, but from different translation units, so without LTO they can't be inlined. But clearly there's more code when they're not inlined, since you need the call/ret, then the function prologue and epilogue. All of these are removed when inlined, and in fact compilers will always inline such functions because it's always a perk to inline functions that are called only from one place.
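
            A rough single-file sketch of the single-caller case. Assumptions: all names are invented, and the GCC/Clang noinline attribute only stands in for the translation-unit boundary that hides the helper's body from the caller when building without -flto.

            ```c
            /* Pretend this lives in helper.c. Without LTO the caller's translation
             * unit only sees a prototype, so every use pays for a real call;
             * noinline simulates that here. */
            __attribute__((noinline))
            int hash_step_hidden(int acc, int byte)
            {
                return acc * 31 + byte;
            }

            /* Same body, but visible to the caller, which is roughly the situation
             * LTO creates. The single call site below can absorb it entirely. */
            static inline int hash_step_visible(int acc, int byte)
            {
                return acc * 31 + byte;
            }

            int hash_without_lto(const unsigned char *p, int n)
            {
                int acc = 0;
                for (int i = 0; i < n; i++)
                    acc = hash_step_hidden(acc, p[i]);    /* call/ret on every iteration */
                return acc;
            }

            int hash_with_lto(const unsigned char *p, int n)
            {
                int acc = 0;
                for (int i = 0; i < n; i++)
                    acc = hash_step_visible(acc, p[i]);   /* body can be merged into the loop */
                return acc;
            }
            ```

            Both versions compute the same value; only the generated code differs.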
            Last edited by Weasel; 15 November 2022, 10:19 AM.



            • #16
              Originally posted by Weasel View Post
              However, inlining can potentially remove redundant parameters/checks. Imagine you pass a compile-time constant to a function, and it has an early if for it, and returns a value. Inlining allows the compiler to literally replace the whole function call with the returned value. It knows everything at compile time so it can do it.

              So yeah, inlining should make code smaller, not larger. Aggressive inlining, which is at -O3, will increase code size though. But this is -O2 so increase in code size is a bit weird indeed.

              That's only true if at least one of the parameters is known at compile time (either constant or folded into a constant via other optimizations).

              There are also situations where the function is relatively large and is called at multiple sites.
              Each site might actually pass some constant, but with all the copies the code still ends up larger.
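
              A small sketch of that situation in C, with made-up names. The GCC/Clang always_inline attribute is used here only to force the copies; whether the compiler would inline on its own at -O2 depends on its heuristics.

              ```c
              /* Stand-in for a relatively large function; imagine many more lines. */
              static inline __attribute__((always_inline))
              unsigned int transform(unsigned int x, unsigned int shift)
              {
                  unsigned int r = (x << shift) ^ (x >> 3);
                  r += r * 7;
                  r ^= r >> 5;
                  return r;
              }

              /* Three call sites, each with its own constant. Every site gets its own
               * copy of the body. The constant shift simplifies each copy a little,
               * but there are now three bodies in the image instead of one. */
              unsigned int site_a(unsigned int x) { return transform(x, 1); }
              unsigned int site_b(unsigned int x) { return transform(x, 2); }
              unsigned int site_c(unsigned int x) { return transform(x, 4); }
              ```

              The bigger the body and the more call sites there are, the more the duplication outweighs what constant folding saves per copy.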



              • #17
                Originally posted by ms178 View Post
                Ouch, the GCC-LTO patchset did get some heavy flak from other devs for the lack of improvements and too many hacks/too much complexity for no gain. At least some performance improvements would be welcome, but none were observable with it.

                dimko Try out both, a kernel build does not take too long to finish. I've used both GCC and Clang/LTO with some advanced flags but wasn't able to spot many differences. I keep using the Clang/FullLTO kernel though, as it is a bit smoother for me, and I hit some strange compile issues with GCC lately.
                Yeah, I haven't fucked with my machine in a while, I may as well stretch my hands.



                • #18
                  Originally posted by ms178 View Post
                  Ouch, the GCC-LTO patchset did get some heavy flak from other devs for the lack of improvements and too many hacks/too much complexity for no gain. At least some performance improvements would be welcome, but none were observable with it.

                  dimko Try out both, a kernel build does not take too long to finish. I've used both GCC and Clang/LTO with some advanced flags but wasn't able to spot many differences. I keep using the Clang/FullLTO kernel though, as it is a bit smoother for me, and I hit some strange compile issues with GCC lately.
                  Compiled it a couple of days ago and I'm running it now. I don't see much of a difference, for better or worse.
                  Maybe a marginal improvement, but I suspect a placebo effect. (Also, I upgraded the kernel rather than just recompiling the existing one, so there is that too.)
