Benchmarking The Linux Kernel With An "-O3" Optimized Build


  • #31
    Originally posted by birdie View Post

    Maybe we see different results, but Michael has written: "When it came to the -O3 kernel build for other workloads like gaming/graphics, web browsing performance, and various creator workloads there was no measurable benefit from the -O3 kernel". I.e. absolutely not worth it considering all the possible bugs and regressions it may unearth.
    Indeed, it seems that we read the results differently. I also already use it, with even more daring custom modifications of my x86/Makefile (deleting the -mno-avx and -mno-sse2 options to let the autovectorizer do more in certain parts where it is allowed - and the kernel still works on my setup).

    I think we have seen that in the case of context switching, which is kernel-sensitive, the benefits are huge. I didn't expect the desktop/gaming benchmarks to look any different, as you are GPU-bound most of the time anyway, but it is nevertheless great to see that there are no serious regressions overall, which means you are basically better off with -O3 as the default option nowadays. And as you know, there are enough companies out there that want to squeeze every last percent out of their workloads; these results demonstrate that it is worth their while to invest in its stability, since the benefits can be had with a simple switch of a compiler option.

    Comment


    • #32
      Originally posted by ms178 View Post

      Indeed, it seems that we read the results differently. I also already use it, with even more daring custom modifications of my x86/Makefile (deleting the -mno-avx and -mno-sse2 options to let the autovectorizer do more in certain parts where it is allowed - and the kernel still works on my setup).

      I think we have seen that in the case of context switching, which is kernel-sensitive, the benefits are huge. I didn't expect the desktop/gaming benchmarks to look any different, as you are GPU-bound most of the time anyway, but it is nevertheless great to see that there are no serious regressions overall, which means you are basically better off with -O3 as the default option nowadays. And as you know, there are enough companies out there that want to squeeze every last percent out of their workloads; these results demonstrate that it is worth their while to invest in its stability, since the benefits can be had with a simple switch of a compiler option.
      People who wanted to use -O3 have been doing so for over a decade, if not two. People here are talking as if -O3 was made available yesterday and is super cool, new and important. It's none of that. It's for special use cases and risky people. It's not for everyone and all situations, no matter what you say.

      Speaking of "I think we have seen that in the case of context switching, which is kernel-sensitive, the benefits are huge" - OMG, exactly one test out of 250 benefits from it. So much performance lost. Not. And that test is nearly 100% artificial.
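      (For reference, the benchmark being argued about here is essentially a pipe ping-pong between two processes; below is a minimal illustrative sketch - not the exact test from the article - which shows why nearly all of its time is spent in the kernel's context-switch path.)

      Code:
      /* Illustrative context-switch ping-pong (not the benchmark from the article):
       * two processes bounce a single byte over a pair of pipes, so the run time is
       * dominated by syscall entry and the scheduler switching between them. */
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>

      #define ROUNDS 100000

      int main(void)
      {
          int p2c[2], c2p[2];               /* parent->child and child->parent pipes */
          char b = 0;

          if (pipe(p2c) < 0 || pipe(c2p) < 0) {
              perror("pipe");
              return 1;
          }

          if (fork() == 0) {                /* child: echo every byte straight back */
              for (int i = 0; i < ROUNDS; i++) {
                  if (read(p2c[0], &b, 1) != 1 || write(c2p[1], &b, 1) != 1)
                      break;
              }
              _exit(0);
          }

          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (int i = 0; i < ROUNDS; i++) {    /* parent: send a byte, wait for echo */
              if (write(p2c[1], &b, 1) != 1 || read(c2p[0], &b, 1) != 1)
                  break;
          }
          clock_gettime(CLOCK_MONOTONIC, &t1);

          double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
          printf("%.0f ns per round trip (at least two context switches each)\n",
                 ns / ROUNDS);
          return 0;
      }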

      Again, where are your spectacular test results, or are you one of those who just love to blow hot air?

      Comment


      • #33
        Originally posted by NobodyXu View Post

        I speculate that the context-switching result might be wrong.

        It gives such a big improvement that I suspect a bug is involved.
        For example, maybe it somehow removes the flushing/software mitigation for Spectre/Meltdown?
        Might be, might not be. I highly doubt that though.

        All of this is information, and so are other compilation flags/optimizations.
        Saying that O3 has no value seems very counterintuitive to me.
        O2 could be hiding stuff that more aggressive optimization would uncover too.

        Of all the stupid things the kernel has accepted/done throughout the years,
        exposing -O3 through an experimental-tagged Kconfig/lxdialog option seems like the least of them.

        I think this whole thing is making an issue out of something that really isn't one.

        Comment


        • #34
          Originally posted by stormcrow View Post

          It's dangerous to accept benchmarks at face value when there's a drastic change in performance. As another commenter mentioned, -O3 could be optimizing away certain Spectre mitigations in the kernel in undesirable ways. Unless you know for sure what -O3 actually does on the processor at execution time (and from your post I'd guess you don't, and probably don't know how to analyze machine code - I don't either, but I do know that what you write in code isn't always what the compiler and linker tell the system to do), it should be assumed that something broke, and there should be an investigation of why gcc -O3 is doing what it's doing.

          This is one reason we could use a comprehensive test suite for the Linux kernel that runs through known exploit and bug conditions.
          The various Spectre and Meltdown mitigations are neither magical nor particularly difficult to check.
          -O3 is used on millions and millions of lines of code without any issue whatsoever.
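          As a quick sanity check (just a sketch, not a substitute for a proper test suite), the kernel itself reports the state of its mitigations under /sys/devices/system/cpu/vulnerabilities/, so an -O2 and an -O3 build can be compared directly:

          Code:
          /* Print the kernel's own report of CPU vulnerability mitigations, e.g. to
           * compare an -O2 kernel against an -O3 one. Reads the standard sysfs files
           * under /sys/devices/system/cpu/vulnerabilities/. */
          #include <dirent.h>
          #include <stdio.h>

          int main(void)
          {
              const char *dir = "/sys/devices/system/cpu/vulnerabilities";
              char path[512], line[256];
              struct dirent *e;
              DIR *d = opendir(dir);

              if (!d) {
                  perror(dir);
                  return 1;
              }
              while ((e = readdir(d)) != NULL) {
                  if (e->d_name[0] == '.')
                      continue;                           /* skip "." and ".." */
                  snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
                  FILE *f = fopen(path, "r");
                  if (!f)
                      continue;
                  if (fgets(line, sizeof(line), f))
                      printf("%-28s %s", e->d_name, line); /* e.g. "spectre_v2  Mitigation: ..." */
                  fclose(f);
              }
              closedir(d);
              return 0;
          }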

          Doesn't mean that O3 gets it right. Doesn't mean that O2 does either.
          O2 can hide stupid things that more aggressive optimization uncovers.

          I don't think it's an issue to accept O3 as an experimental, simple flag to use.
          Otoh, KCFLAGS could easily be used as well.

          Either way, this discussion is way out of proportion for something as trivial as a generic optimization flag marked as EXPERIMENTAL.

          Comment


          • #35
            Personally, I'll keep breathing O2 and let O3 take care of the UV.

            Comment


            • #36
              Originally posted by birdie View Post

              People who wanted to use -O3 have been doing so for over a decade, if not two. People here are talking as if -O3 was made available yesterday and is super cool, new and important. It's none of that. It's for special use cases and risky people.
              While that quoted part is partly true, I would challenge the "special use cases and risky people" part and choose to ignore the rest, as we clearly read the results too differently to argue about it. Obviously you won't use it in embedded systems or where your life depends on 100% stability (just as I don't use the security and other performance-costing debugging features Fedora likes to enable in their kernel). I haven't seen real downsides in the data, and that is good enough for me - people just claim that it could cause problems, but the data doesn't support that claim across a wide variety of workloads.

              Having upstream buy-in is also not equivalent to letting people do their own thing if they dare. If anything, the data has shown that this is a use case upstream developers should care about and put more effort into (buildbots, systematic bug fixes on the compiler and kernel side as they appear). Your position shifts that burden onto the companies that want to use it, while I argue that such work would be better placed in upstream development, as the benefits make it worthwhile for a wider audience.

              Comment


              • #37
                Originally posted by MadCatX View Post
                I'm honestly surprised to see that there were any benefits at all. Some of them even looked compelling for large web service providers.
                The results could also be down to different CPU turbo behaviour (caused by different code layout and instruction mix); unless Michael disabled CPU frequency scaling and locked all cores to the same (low, to avoid thermal throttling) clock rate - which isn't mentioned in the article - the results are meaningless, especially when, as others have already mentioned, you test things that spend most of their time in userspace.
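                For what it's worth, whether the clocks were actually pinned is easy to verify from userspace; here is a minimal illustrative sketch that dumps each core's cpufreq governor and current frequency from the standard sysfs paths:

                Code:
                /* Dump each CPU's cpufreq governor and current frequency so you can
                 * confirm that frequency scaling/turbo was actually pinned before a
                 * benchmark run. Uses the standard sysfs cpufreq files. */
                #include <stdio.h>
                #include <string.h>

                static int read_line(const char *path, char *buf, int len)
                {
                    FILE *f = fopen(path, "r");
                    if (!f)
                        return -1;
                    if (!fgets(buf, len, f))
                        buf[0] = '\0';
                    fclose(f);
                    buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
                    return 0;
                }

                int main(void)
                {
                    char path[256], gov[64], freq[64];

                    for (int cpu = 0; ; cpu++) {
                        snprintf(path, sizeof(path),
                                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
                        if (read_line(path, gov, sizeof(gov)) < 0)
                            break;                    /* no more CPUs (or cpufreq unavailable) */
                        snprintf(path, sizeof(path),
                                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
                        read_line(path, freq, sizeof(freq));
                        printf("cpu%-3d governor=%-12s cur_freq=%s kHz\n", cpu, gov, freq);
                    }
                    return 0;
                }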

                Comment


                • #38
                  Originally posted by birdie View Post

                  Yeah, no one here remembers that -O3 makes the resulting code bloated as hell, which could very well mean performance regressions on CPUs with smaller caches, and there are tons of people running them. E.g. a Celeron G1850, which has just 2 MB of L3 cache, vs. the 12600K (tested in this article), which has 20 MB of it.

                  But no, "You're depriving us of performance!!!!"
                  You got my point precisely (BTW: I am one of them - on a Core 2 Duo which has 4MB of L2 cache).
                  Also worth mentioning is that performance and latency are surprisingly NOT the same thing, which people tend to forget again and again.

                  And while I am not sure how well GCC analyzes things, it is obvious that GCC can't take the coder's intent into consideration. I would not be at all surprised if, in some situations, you could (in the worst case) end up with the total runtime of a function (a far more important metric) increasing as a consequence of a loop elsewhere in that function being optimized - especially with aggressive optimizations like -O3.
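                  To make that concrete, here is a tiny illustrative example (nothing from the article): a trivial loop that GCC typically compiles to a short scalar loop at -O2 but expands into vectorized and unrolled variants, plus alignment and tail-handling code, at -O3 - faster in isolation, yet several times larger in machine code, which is exactly what hurts on CPUs with small caches:

                  Code:
                  /* Illustrative only: compare the generated code size at -O2 and -O3.
                   *   gcc -O2 -c scale_add.c && size scale_add.o
                   *   gcc -O3 -c scale_add.c && size scale_add.o
                   * At -O3 the loop is typically auto-vectorized and unrolled, adding
                   * prologue/epilogue paths that make the function noticeably bigger. */
                  void scale_add(float *dst, const float *a, const float *b, float k, int n)
                  {
                      for (int i = 0; i < n; i++)
                          dst[i] = a[i] * k + b[i];
                  }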

                  http://www.dirtcellar.net

                  Comment


                  • #39
                    Originally posted by rene View Post

                    Not really, because the kernel does not use any SIMD, so -march=native does next to nothing - mostly instruction scheduling, ... ;-)
                    Do you know of any areas in the Linux kernel (apart from, say, encryption and checksumming) where SIMD could make an actual difference in terms of performance (for no downside)?

                    Another poster mentioned letting the autovectorizer attempt to optimise by removing -mno-sse2 and -mno-avx. What are the potential downsides to that?
                    Last edited by ermo; 01 July 2022, 06:12 PM.

                    Comment


                    • #40
                      Originally posted by ermo View Post

                      Do you know of any areas in the Linux kernel (apart from, say, encryption and checksumming) where SIMD could make an actual difference in terms of performance (for no downside)?

                      Another poster mentioned letting the autovectorizer attempt to optimise by removing -mno-sse2 and -mno-avx. What are the potential downsides to that?
                      Well, there is always some frame buffer (...) or network code where wide SIMD loads and stores can potentially speed things up.
                      Deleting the -mno-* SIMD flags will, however, lead to random SIMD (SSE, AVX, AVX-512) register file clobbering and corruption, as the kernel does not save and restore those registers on user/kernel context switches - which is obviously why it is disabled. For this reason, SIMD is only enabled and used in explicit SIMD algorithm sections, like the encryption and checksumming you mentioned.
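                      For illustration, this is roughly the shape of such an explicit SIMD section on x86. The checksum helpers below are hypothetical, but kernel_fpu_begin()/kernel_fpu_end() are the actual x86 entry points that save and restore the FPU/SIMD state so that vectorized kernel code cannot clobber userspace's registers:

                      Code:
                      /* Sketch of an explicit SIMD section in x86 kernel code. The checksum
                       * helpers are hypothetical; kernel_fpu_begin()/kernel_fpu_end() bracket
                       * the only region where SSE/AVX registers may legitimately be touched. */
                      #include <linux/types.h>
                      #include <asm/fpu/api.h>

                      void my_csum_scalar(void *dst, const void *src, size_t len);  /* hypothetical */
                      void my_csum_simd(void *dst, const void *src, size_t len);    /* hypothetical */

                      void my_csum_update(void *dst, const void *src, size_t len)
                      {
                          if (!irq_fpu_usable()) {          /* SIMD not allowed in this context */
                              my_csum_scalar(dst, src, len);
                              return;
                          }

                          kernel_fpu_begin();               /* save FPU/SIMD state */
                          my_csum_simd(dst, src, len);      /* vector registers are safe to use here */
                          kernel_fpu_end();                 /* restore state before returning */
                      }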

                      Comment
