Benchmarking The Linux Kernel With An "-O3" Optimized Build

  • #51
    I am currently vividly picturing Torvalds making Principal Skinner's "PATHETIC" face while looking down at these benchmarks.

    • #52
      What was the difference in file size for the resulting kernel and/or initramfs between the two optimization levels?

      • #53
        Originally posted by milkylainen View Post
        I think the results should be interpreted otherwise.
        It's not strange that you don't see any benefits from something that spends < 1% cpu time in kernelspace.
        If you're measuring an entire system then yeah, perhaps not that much difference.

        But as for the synthetic tests measuring syscalls or actual kernel operations (like context switching)...
        Then wow, did that -O3 flag mean an improvement!

        In summary: It's unfair to say that it doesn't help the kernel when measuring an entire system.
        Exactly!

        And if the applications that benefit are databases and web servers, then even a simple optimization could lead to a significant reduction in servers and power dissipation. It's low-hanging fruit.

        Also: if there are bugs in gcc, then let's resolve them. And if it's not a bug in gcc, then there is surely kernel code that needs improvement. Since when is the OSS community afraid of improvement?
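
        To make the "synthetic tests measuring syscalls" point concrete, here is a rough sketch of the kind of microbenchmark where a differently compiled kernel can show up at all (my own illustration, not one of the tests Phoronix ran): it times nothing but a cheap raw syscall in a tight loop, so nearly all of the measured time is kernel entry/exit and kernel code.

        /* syscall_bench.c - minimal sketch of a syscall-rate microbenchmark.
         * Build: gcc -O2 -o syscall_bench syscall_bench.c
         * Nearly all of the time here is spent inside the kernel, so this is
         * the kind of test where -O2 vs -O3 kernel builds could differ. */
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>
        #include <sys/syscall.h>

        int main(void)
        {
            const long iters = 10 * 1000 * 1000;
            struct timespec start, end;

            clock_gettime(CLOCK_MONOTONIC, &start);
            for (long i = 0; i < iters; i++)
                syscall(SYS_getpid);            /* raw syscall, no libc shortcuts */
            clock_gettime(CLOCK_MONOTONIC, &end);

            double secs = (end.tv_sec - start.tv_sec) +
                          (end.tv_nsec - start.tv_nsec) / 1e9;
            printf("%.1f ns per syscall\n", secs / iters * 1e9);
            return 0;
        }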

        • #54
          Originally posted by birdie View Post

          Just what I expected. A well-written kernel must have next to zero impact on performance other than when using the CPU-intensive features the kernel itself provides, i.e. encryption, connections, context-switching, etc., which is not something the absolute majority of users ever deal with.

          Case closed.
          That's BS. A well-written kernel will not be affected by a compiler optimization flag. It would be affected by compiler bugs, but that is true in general. If gcc had more bugs than clang, would we prohibit gcc, or would we fix the bugs?

          • #55
            As I suspected. Essentially no improvement for a lot of benchmarks and very thin improvements for the rest.

            • #56
              Originally posted by ms178 View Post
              I hope this set of benchmarks convinces Linus that the effort is worth it. There were clearly benefits for some workloads and no showstoppers. Weeding out the compiler/kernel bugs for O3 is a worthy effort in my eyes, even more so now than before.
              If I were Linus, it would show me the opposite.

              • #57
                Originally posted by stormcrow View Post

                It's dangerous to accept benchmarks at face value when there's a drastic change in performance. As another commenter mentioned, -O3 could be optimizing away certain Spectre mitigations in undesirable ways in the kernel. Unless you know for sure what -O3 actually does on the processor at execution time (and from your post I'd guess you don't, and probably don't know how to analyze machine code - I don't either, but I do know that what you write in code isn't always what the compiler and linker tell the system to do), it should be assumed something broke, and there should be an investigation of why gcc -O3 is doing what it's doing.

                This is one reason we could use a comprehensive test suite for the Linux kernel that runs through known exploit and bug conditions.
                If the mitigations are optimized away, they need improvement.

                Your argumentation is flawed: -O2 does lots of optimizations too, and you don't know what those are either. And no doubt you _could_ analyze them, but the problem is that CPUs don't all handle machine code in the same way. Notably, there is a large difference between Atom processors and Core iX parts, and likely also between brands. Optimizations for one may degrade performance on others (but I won't bore you with examples).

                If the compiler and linker don't behave as defined, that's a bug. If they don't do what you expect, then you have a chance to learn something new (look up Duff's device). Otherwise, fix the code.
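
                For what it's worth, you don't need to read machine code fluently to see what -O3 actually changes; diffing the compiler's own output for a small, hot function is enough to start. A rough sketch (the function and file names are made up for illustration):

                /* codegen_diff.c - toy function for comparing gcc code generation.
                 * (Illustrative only; substitute any hot loop you actually care about.)
                 *
                 *   gcc -O2 -S -o o2.s codegen_diff.c
                 *   gcc -O3 -S -o o3.s codegen_diff.c
                 *   diff -u o2.s o3.s      # see what -O3 did (vectorization, unrolling, ...)
                 *
                 * gcc can also report its own decisions, e.g.:
                 *   gcc -O3 -fopt-info-vec-optimized -c codegen_diff.c
                 */
                #include <stddef.h>

                long sum_bytes(const unsigned char *p, size_t n)
                {
                    long s = 0;
                    for (size_t i = 0; i < n; i++)
                        s += p[i];
                    return s;
                }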

                • #58
                  Originally posted by hamishmb View Post

                  I'm going to point out that TLS and SSH are something many people are going to be using regularly. Admittedly, CPU load isn't usually what limits the speed for either SSH or downloading files/web pages, but still.
                  Neither TLS nor SSH uses the kernel's encryption routines; they both use their own userspace implementations, so they will see "zero impact" from how the kernel is compiled. The reason why e.g. PostgreSQL benefits here is that it makes a ton of syscalls for each request.
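
                  A quick way to see this for yourself is the user/system CPU-time split: only the system part is kernel code that an -O3 kernel build could even touch. A rough, made-up sketch (the workloads are stand-ins, not real TLS or PostgreSQL code):

                  /* user_vs_sys.c - rough sketch: where does a workload's CPU time go?
                   * User time = userspace work (e.g. TLS/SSH crypto libraries): kernel
                   * build flags are irrelevant there. System time = kernel work: the
                   * only part a -O2 vs -O3 kernel could influence.
                   * Build: gcc -O2 -o user_vs_sys user_vs_sys.c */
                  #include <stdio.h>
                  #include <string.h>
                  #include <unistd.h>
                  #include <sys/resource.h>

                  int main(void)
                  {
                      /* Stand-in for userspace crypto: churn a buffer, never enter the kernel. */
                      unsigned char buf[4096];
                      volatile unsigned char sum = 0;
                      memset(buf, 0xAA, sizeof(buf));
                      for (long i = 0; i < 200000; i++)
                          for (size_t j = 0; j < sizeof(buf); j++)
                              sum ^= buf[j];

                      /* Stand-in for a syscall-heavy server: hammer a cheap syscall. */
                      for (long i = 0; i < 2000000; i++)
                          getppid();

                      struct rusage ru;
                      getrusage(RUSAGE_SELF, &ru);
                      printf("user time: %ld.%06ld s (userspace work)\n",
                             (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
                      printf("sys  time: %ld.%06ld s (kernel work)\n",
                             (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
                      return 0;
                  }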

                  • #59
                    Originally posted by birdie View Post

                    Just what I expected. A well-written kernel must have next to zero impact on performance other than when using the CPU-intensive features the kernel itself provides, i.e. encryption, connections, context-switching, etc., which is not something the absolute majority of users ever deal with.

                    Case closed.
                    This has very little to do with "a well written kernel" (unless we are going to argue that a microkernel [which suffers far more from kernel overhead] is not a well-written kernel) and much more to do with how much of the kernel's infrastructure your application needs/uses. This is why something like Redis, which is single-threaded and only serves data from RAM, sees close to zero benefit, since it involves the kernel in very few places, whereas PostgreSQL has to manage databases many times the size of RAM and tens of thousands more parallel users than Redis, and therefore uses the kernel a lot in its request path.
                    Last edited by F.Ultra; 29 June 2022, 07:20 PM.

                    • #60
                      phoronix Thanks for benchmarking this!

                      I wonder if some people here even looked at the test results: the only thing that showed massive improvements was this one context-switching benchmark, which might be buggy for all we know and is certainly worth investigating. Maybe the benchmark is buggy, maybe the code is buggy and breaks under optimization, or everything is correct and hand-optimizing that code so the compiler hopefully generates similarly fast code with -O2 might be worthwhile (even though in real-world code this obviously doesn't make as much of a difference, as all the other benchmarks showed). As nice as scenario 3 (a chance to optimize the context-switching code) would be, the ctx_clock benchmark (which is also a very synthetic context-switching benchmark) did not show any improvement (if anything, there was a 0.5% decline), so scenario 1 (a buggy test) seems more likely.
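
                      For reference, a minimal sketch of what such a context-switch microbenchmark usually does (the classic pipe ping-pong, not the actual test Phoronix ran): each round trip forces at least two context switches plus four syscalls, which is exactly why these numbers are so sensitive to the kernel build, and so easy to get wrong.

                      /* ctx_switch_sketch.c - minimal pipe ping-pong context-switch benchmark.
                       * Two processes bounce one byte back and forth; each round trip costs
                       * (at least) two context switches plus four syscalls.
                       * Build: gcc -O2 -o ctx_switch_sketch ctx_switch_sketch.c */
                      #include <stdio.h>
                      #include <time.h>
                      #include <unistd.h>
                      #include <sys/types.h>
                      #include <sys/wait.h>

                      int main(void)
                      {
                          const long rounds = 200000;
                          int ab[2], ba[2];
                          char byte = 'x';

                          if (pipe(ab) || pipe(ba)) { perror("pipe"); return 1; }

                          pid_t pid = fork();
                          if (pid < 0) { perror("fork"); return 1; }

                          if (pid == 0) {                     /* child: echo every byte back */
                              for (long i = 0; i < rounds; i++) {
                                  if (read(ab[0], &byte, 1) != 1) _exit(1);
                                  if (write(ba[1], &byte, 1) != 1) _exit(1);
                              }
                              _exit(0);
                          }

                          struct timespec start, end;
                          clock_gettime(CLOCK_MONOTONIC, &start);
                          for (long i = 0; i < rounds; i++) { /* parent: ping, wait for pong */
                              if (write(ab[1], &byte, 1) != 1) return 1;
                              if (read(ba[0], &byte, 1) != 1) return 1;
                          }
                          clock_gettime(CLOCK_MONOTONIC, &end);
                          waitpid(pid, NULL, 0);

                          double secs = (end.tv_sec - start.tv_sec) +
                                        (end.tv_nsec - start.tv_nsec) / 1e9;
                          printf("%.0f ns per round trip\n", secs / rounds * 1e9);
                          return 0;
                      }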

                      Wireguard and Cryptsetup encryption/decryption *were* tested and showed very little improvement.
                      Wireguard: https://www.phoronix.com/scan.php?pa...ernel-o3&num=4 (< 1.5%)
                      Cryptsetup: https://www.phoronix.com/scan.php?pa...ernel-o3&num=5 (< 0.8%)
                      (Note that the Cryptsetup benchmarks include ones with Serpent which, unlike AES, is not explicitly hardware-accelerated)
                      Last edited by DanielG; 29 June 2022, 08:26 PM.
