Benchmarking The Linux Kernel With An "-O3" Optimized Build

  • #41
    Originally posted by NobodyXu View Post

    I speculate the result of context switching might be wrong.

    It gives such a big improvement that I suspect that it involves some bugs.
    For example, maybe it somehow removes the flushing/software mitigations for Spectre/Meltdown?
    This is a really good point; I imagine it might well do that, at least on some platforms.

    • #42
      Originally posted by birdie View Post

      Just what I expected. A well-written kernel must have next to zero impact on performance, other than in the CPU-intensive features the kernel itself provides, i.e. encryption, connections, context switching, etc., which the vast majority of users never deal with.

      Case closed.
      I'm going to point out that TLS and SSH are something many people are going to be using regularly. Admittedly, CPU load isn't usually what limits the speed for either SSH or downloading files/web pages, but still.

      EDIT: It was pointed out that these use their own encryption implementations, separate from the kernel's. Please disregard this comment.
      Last edited by hamishmb; 30 June 2022, 05:04 AM.

      • #43
        Originally posted by birdie View Post

        Good, compile your own kernel with -O3. Case closed. Fedora and I will continue to use -O2.

        Now on to your arguments:
        • If your kernel spends too much time swapping out/in using ZSWAP or you're a fan of ZRAM, you're lacking RAM anyway and your performance is hugely compromised regardless (of kernel compilation flags).
        • LUKS is irrelevant for the vast majority of users out there, since it uses AES, which has been HW accelerated in most CPUs released over the past seven years. If your CPU doesn't HW accelerate AES instructions, you're fucked regardless; I've worked with LUKS on such PCs and it's torture. OK, with -O3 you'll get something like 20MB/sec of throughput and with -O2 you'll get 15MB/sec, but both are terribly slow.
        • WireGuard: again, AES. It shouldn't even register in top unless you send more than tens of megabytes of traffic per second, and who exactly does that? And if AES is HW accelerated, -O3 vs -O2 will mean nil.
        It's so bloody tiresome to see people keep arguing for -O3 without providing any results whatsoever. We've had over 150 comments across the past two discussions on -O3, and Michael has been the only person to actually show results.

        If you feel so confident, please, spend half an hour and show your results, OK? PLEASE.
        I have used LUKS and eCryptfs (both with AES) on some systems, one of which was an AMD A10-based laptop. I forget which CPU it was, but I know it was a weak dual core and it easily pulled 40MB/s.

        LUKS was even usable on a Raspberry Pi 1 model B+ I use as a NAS, albeit only for audio streaming to smart speakers and other basic file sharing. I wouldn't recommend doing this, though. It was annoying, but a Pi 1 is apparently comparable to a Pentium II speed-wise, so it's annoying regardless of what you're doing with it.

        Your points are valid, of course, but not having the CPU instructions isn't necessarily quite as dire as you make it sound.

        • #44
          The benchmark I want to see is "number of hours to crash an -O3 kernel with fuzzing" versus -O2 etc.

          • #45
            Oh, the other system without AES instructions was a Core 2 Duo (T7300 possibly). It was decent as well, good enough to daily drive.

            I'm kind of tempted to try some benchmarks with this on a spare Pi 1 model B I have to test the impact on the lowest of the low end (albeit not x86). Would anyone be interested in this?
            Last edited by hamishmb; 29 June 2022, 03:30 PM. Reason: Fixed another typo

            • #46
              It's cool. Not groundbreaking, but cool.

              • #47
                Originally posted by hamishmb View Post
                Oh, the other system without AES instructions was a Core 2 Duo (T7300 possibly). It was decent as well, good enough to daily drive.

                I'm kind of tempted to try some benchmarks with this on a spare Pi 1 model B I have to test the impact on the lowest of the low end (albeit not x86). Would anyone be interested in this?
                Of course; the slower the CPU, the bigger the advantage. Anyway, tests on ARMv7 or ARMv8 CPUs would be interesting as well.

                • #48
                  I also have a semi-spare Pi 3, so I could do that too. I'm in need of some content for my blog and YouTube channel so I might just do this.

                  • #49
                    Originally posted by blackshard View Post

                    Of course; the slower the CPU the bigger the advantage. Anyway, tests against ARMv7 or ARMv8 CPUs would also be interesting as well.
                    Actually, the opposite: you are bloating up the code, fighting for more cache, and then hoping to get the improvement from big, fat, parallel OoO execution.
                    Smaller CPUs lack that, and are quite often slower with -O3.

                    What happened to the LTO effort, by the way? That certainly does slim down the kernel and should rarely cause issues.

                    • #50
                      Note: I may need some help with sensible test selection. I may test the Pi 3 with 64-bit Raspberry Pi OS for more diversity.

                      I'm fully aware it might take days, weeks, or even a month or two at worst in total to run a reasonably well-rounded set of benchmarks. However, if anyone with experience of the Phoronix Test Suite is willing to suggest sensible benchmarks to try, it would be much appreciated; I don't really have much of an idea what to try. Bear in mind my Pi 1 has 256 MB of RAM, and the Pi 3 has 512 MB.

                      EDIT: Regardless of whether things are improved or worsened, I'd be very interested to see the results. The Raspberry Pi kernel likely has optimisations for these particular CPUs, so I wonder whether it would be slower or quicker, bearing all the variables in mind.

                      EDIT 2: For anyone doubting how masochistic I'm willing to be for the sake of experimentation and "fun", see this blog post and video I made where I ran an x86 VM on a Pi 1: https://www.hamishmb.com/blog/damn-s...-raspberry-pi/ XD
                      Last edited by hamishmb; 29 June 2022, 04:06 PM.
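
Since the thread keeps asking for actual -O2 versus -O3 numbers, here is a minimal Python sketch of one common way to summarize such a comparison: a per-test percent change plus a geometric mean of the -O3/-O2 ratios across tests. It assumes every metric is higher-is-better (e.g. MB/s); the function names are hypothetical and this is not how the Phoronix Test Suite itself computes its summaries. The example input reuses the rough LUKS figures mentioned earlier in the thread (about 15 MB/s with -O2 versus 20 MB/s with -O3), which were illustrative guesses, not measurements.

```python
import math


def percent_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate vs baseline (positive = candidate is higher)."""
    return (candidate - baseline) / baseline * 100.0


def geometric_mean_speedup(results: dict[str, tuple[float, float]]) -> float:
    """Geometric mean of candidate/baseline ratios across all tests.

    Assumes every metric is higher-is-better; lower-is-better metrics
    (e.g. latency) would need their ratio inverted before being added.
    """
    ratios = [cand / base for base, cand in results.values()]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Each entry maps a test name to (-O2 result, -O3 result).
# These are the rough, illustrative LUKS figures from the thread, not measurements.
results = {"luks-throughput": (15.0, 20.0)}
print(f"{percent_change(15.0, 20.0):.1f}%")       # 33.3%
print(f"{geometric_mean_speedup(results):.3f}x")  # 1.333x
```

The geometric mean is used rather than an arithmetic mean of ratios so that a 2x gain on one test and a 2x loss on another cancel out to 1.0x instead of skewing the summary.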
