Experimental -O3 Optimizing The Linux Kernel For Better Performance Brought Up Again

  • Experimental -O3 Optimizing The Linux Kernel For Better Performance Brought Up Again

    Phoronix: Experimental -O3 Optimizing The Linux Kernel For Better Performance Brought Up Again

    A set of patches has been posted to make the "-O3" compiler optimization level more easily accessible when building the Linux kernel, but it is still not recommended, and some kernel developers do not even want to see it as a Kconfig option...

    https://www.phoronix.com/scan.php?pa...l-2022-Patches

  • #2
    It's beyond time. O3 consists solely of safe, standards-compliant optimizations on gcc and clang. The myth that O3 "has dangerous experimental optimizations" is just a legend from the gcc 4 era.

    Yes, compiler bugs exist, but the kernel runs into them at O2 just as much. Enabling O3 would potentially bring an initial wave of new discoveries, but nothing more.

    I can't wait for another unqualified Torvalds rant about how "the compiler inserts UB here" - if your code breaks under O3, that's a bug in YOUR code and you should fix it. And there's no guarantee that your bug only ever shows up at O3. More often than not it will manifest in a later compiler release at O2 too. The kernel is full of this kind of code smell, and years of blaming the compiler have not helped in fixing any of it.


    • #3
      I've been doing this for years, even with LTO and some additional flags to enable graphite and a few other optimizations. All on gcc and with a monolithic kernel (all drivers baked in, module support turned off). It has worked fine for me so far on my desktops/laptops, an ARM64 SBC, and two MIPS APs/routers/modems. I never did any tests to see how much it helps, though. I'd guess the change is anywhere between -5% and +5% if phoronix compiler benchmarks are to be believed.
      BTW, LTO'd kernels aren't smaller. At least not with gcc and the way I do it (just passing flags to make).


      • #4
        On a normal system (with a decent amount of RAM and powerful CPU cores) the kernel itself shouldn't take too much CPU time, so the difference between -O2 and -O3 even if the latter is twice as fast should still be minimal. -O3 might be beneficial for hosting/traffic/VPN providers and people with very weak PCs and that's it.

        And some -O3 options, if used without moderation, are outright harmful, since they bloat up the code and that leads to L1/L2 caches being eviscerated.
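
The first point above is essentially Amdahl's law: the overall gain depends on how much of the total run time is spent in the kernel. A minimal sketch of that arithmetic follows. The kernel-time fractions are made-up illustrative values, not measurements, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-workload speedup when only the kernel part gets faster.

    kernel_fraction: share of total run time spent in the kernel (0.0 to 1.0).
    kernel_speedup:  how many times faster the kernel part becomes (e.g. 2.0).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


# Hypothetical workloads: mostly userspace, mixed, and kernel-heavy (e.g. a router).
for fraction in (0.02, 0.10, 0.50):
    gain = overall_speedup(fraction, 2.0)
    print(f"{fraction:.0%} kernel time, kernel 2x faster: {gain:.3f}x overall")
```

With a small kernel share the result stays close to 1.0x even for a doubled kernel speed, while a kernel-heavy workload sees a much larger effect, which is the distinction between desktops and hosting/VPN boxes drawn above.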


        • #5
          Do any of you know whether Clear Linux* compiles the kernel with -O3? A quick on-line search on this question didn't turn up anything.


          • #6
            Originally posted by birdie
            On a normal system (with a decent amount of RAM and powerful CPU cores) the kernel itself shouldn't take too much CPU time, so the difference between -O2 and -O3 even if the latter is twice as fast should still be minimal. -O3 might be beneficial for hosting/traffic/VPN providers and people with very weak PCs and that's it.

            And some -O3 options, if used without moderation, are outright harmful, since they bloat up the code and that leads to L1/L2 caches being eviscerated.

            Compilers have gotten a lot better at evaluating the speedup from e.g. unrolling vs the cost of the increased cache pressure. This really isn't a general issue anymore that would lead to LESS performance.

            There are a lot of very cpu intensive operations in the kernel. Literally anything involving filesystems or networking, for example. Making the kernel faster is ALWAYS a good idea, there's no such thing as "oh, this machine is fast enough for the kernel anyways" - any cycle eaten away by the kernel will be one less for the actual application you're running


            • #7
              Originally posted by Jannik2099

              Compilers have gotten a lot better at evaluating the speedup from e.g. unrolling vs the cost of the increased cache pressure. This really isn't a general issue anymore that would lead to LESS performance.

              Benchmarks, benchmarks, benchmarks!

              Originally posted by Jannik2099
              There are a lot of very cpu intensive operations in the kernel. Literally anything involving filesystems or networking, for example. Making the kernel faster is ALWAYS a good idea, there's no such thing as "oh, this machine is fast enough for the kernel anyways" - any cycle eaten away by the kernel will be one less for the actual application you're running

              In my 25+ years of using PCs, laptops, etc., I've had 0 situations where ntoskrnl.exe or vmlinuz took a discernible amount of CPU time.


              • #8
                Originally posted by SteamPunker
                Do any of you know whether Clear Linux* compiles the kernel with -O3? A quick on-line search on this question didn't turn up anything.

                The build script does not explicitly mention it:
                https://github.com/clearlinux-pkgs/l...ain/linux.spec

                but CFLAG is specified as shown here and contains -O3:
                https://community.clearlinux.org/t/w...he-kernel/3422

                ...besides, I have only seen a few packages that remove -O3 via sed during the build.
                Last edited by CochainComplex; 23 June 2022, 07:51 AM.


                • #9
                  Originally posted by CochainComplex
                  Unless they patched the kernel elsewhere, no they do NOT. The kernel Makefile would override any previous -O flag.

                  -Ofast is also more or less meaningless in a kernel context as it mainly deals with floating point optimizations.
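
To illustrate what "floating point optimizations" means here: options in the -Ofast set such as -fassociative-math allow the compiler to regroup floating-point arithmetic, and floating-point addition is not associative, so results can change. The sketch below only imitates such a regrouping by hand in Python (Python itself does not reorder anything); the values are arbitrary examples.

```python
# Floating-point addition is not associative: regrouping the same three
# operands can change the rounded result. A compiler that is allowed to
# reassociate may turn one grouping into the other.
a, b, c = 0.1, 0.2, 0.3

as_written = (a + b) + c     # evaluation order as the programmer wrote it
regrouped = a + (b + c)      # an order a reassociating compiler might pick

print("as written:", repr(as_written))
print("regrouped: ", repr(regrouped))
print("identical: ", as_written == regrouped)
```

Code that uses little or no floating point, which is the case for most of the kernel, has nothing for these flags to act on.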


                  • #10
                    Originally posted by Jannik2099

                    Unless they patched the kernel elsewhere, no they do NOT. The kernel Makefile would override any previous -O flag.

                    -Ofast is also more or less meaningless in a kernel context as it mainly deals with floating point optimizations.

                    GCC 12.1.

                    O2 vs Ofast:
                    Code:
                    +  -fallow-store-data-races            [enabled]
                    +  -fassociative-math                  [enabled]
                    +  -fcx-limited-range                  [enabled]
                    +  -ffinite-math-only                  [enabled]
                    +  -fgcse-after-reload                 [enabled]
                    +  -fipa-cp-clone                      [enabled]
                    +  -floop-interchange                  [enabled]
                    +  -floop-unroll-and-jam               [enabled]
                    +  -fmath-errno                        [disabled]
                    +  -fpeel-loops                        [enabled]
                    +  -fpredictive-commoning              [enabled]
                    +  -freciprocal-math                   [enabled]
                    +  -fsemantic-interposition            [disabled]
                    +  -fsigned-zeros                      [disabled]
                    +  -fsplit-loops                       [enabled]
                    +  -fsplit-paths                       [enabled]
                    +  -ftrapping-math                     [disabled]
                    +  -ftree-loop-distribution            [enabled]
                    +  -ftree-partial-pre                  [enabled]
                    +  -funroll-completely-grow-size       [enabled]
                    +  -funsafe-math-optimizations         [enabled]
                    +  -funswitch-loops                    [enabled]
                    +  -fversion-loops-for-strides         [enabled]
                    O2 vs O3:
                    Code:
                    +  -fgcse-after-reload                 [enabled]
                    +  -fipa-cp-clone                      [enabled]
                    +  -floop-interchange                  [enabled]
                    +  -floop-unroll-and-jam               [enabled]
                    +  -fpeel-loops                        [enabled]
                    +  -fpredictive-commoning              [enabled]
                    +  -fsplit-loops                       [enabled]
                    +  -fsplit-paths                       [enabled]
                    +  -ftree-loop-distribution            [enabled]
                    +  -ftree-partial-pre                  [enabled]
                    +  -funroll-completely-grow-size       [enabled]
                    +  -funswitch-loops                    [enabled]
                    +  -fversion-loops-for-strides         [enabled]
                    O3 vs Ofast:
                    Code:
                    +  -fallow-store-data-races            [enabled]
                    +  -fassociative-math                  [enabled]
                    +  -fcx-limited-range                  [enabled]
                    +  -ffinite-math-only                  [enabled]
                    +  -fmath-errno                        [disabled]
                    +  -freciprocal-math                   [enabled]
                    +  -fsemantic-interposition            [disabled]
                    +  -fsigned-zeros                      [disabled]
                    +  -ftrapping-math                     [disabled]
                    +  -funsafe-math-optimizations         [enabled]
                    Last edited by birdie; 23 June 2022, 07:55 AM.
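
Listings like the ones above can be approximated by asking GCC for its optimizer settings at two levels with `gcc -Q --help=optimizers` and comparing the results. The following is a rough sketch of that comparison, not necessarily how the listings in this post were produced. The helper names are hypothetical, and the parser assumes each flag line starts with the flag name followed by its state or value.

```python
import subprocess


def parse_optimizers(listing: str) -> dict[str, str]:
    """Map each flag in a `gcc -Q --help=optimizers` listing to its state."""
    flags = {}
    for line in listing.splitlines():
        parts = line.split()
        # Flag lines look like "-fpeel-loops   [enabled]"; skip headers and blanks.
        if parts and parts[0].startswith("-"):
            flags[parts[0]] = parts[1] if len(parts) > 1 else ""
    return flags


def changed_flags(base: dict[str, str], other: dict[str, str]) -> dict[str, str]:
    """Return flags whose state in `other` differs from (or is missing in) `base`."""
    return {
        name: state
        for name, state in sorted(other.items())
        if base.get(name) != state
    }


def query_gcc(level: str, gcc: str = "gcc") -> str:
    """Run GCC and return its optimizer listing for one level, e.g. "-O2"."""
    result = subprocess.run(
        [gcc, "-Q", "--help=optimizers", level],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def compare_levels(base_level: str, other_level: str) -> None:
    """Print every flag that changes when going from base_level to other_level."""
    base = parse_optimizers(query_gcc(base_level))
    other = parse_optimizers(query_gcc(other_level))
    for name, state in changed_flags(base, other).items():
        print(f"+  {name:<36}{state}")
```

Calling `compare_levels("-O2", "-O3")` on a machine with GCC installed prints the flags that differ between the two levels; the exact set depends on the GCC version.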
