Benchmarking The Linux 5.19 Kernel Built With "-O3 -march=native"


  • #31
    Originally posted by Barley9432
    And the entire benefit of Gentoo gone out the window with a single article... nice
    LOL. Reminds me of 15+ years ago, seeing Gentoo users building with literally a page full of OMG-optimized flags and spending 2 days building on their 486s with zero noticeable difference.


    • #32
      Originally posted by piotrj3
      This is truly surprising!
      Except it is not at all.


      • #33
        First, why GCC 11 on a 12th-gen processor? Literally quoting Phoronix itself from a few months ago: "GCC 11 as the stable compiler introduced earlier this year there was the initial Intel "alderlake" target. However, that initial implementation was carrying the existing Ice Lake cost table that was not tuned for Alder Lake processors that launched last month. Merged for GCC 12 is that tuned Alder Lake support in place for those compiling binaries specifically using the "-march=alderlake" option."

        Secondly, AFAIK the kernel has built-in flags (-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx, etc.) to prevent GCC from making unexpected optimizations, so I don't know whether a simple KCFLAGS setting would override those. And if not, then there won't be much difference between native and x86-64.
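One way to check this rather than guess is to ask GCC which target features end up enabled for a given flag combination. Below is a minimal Python sketch, assuming GCC's `-Q --help=target` report prints one option per line followed by `[enabled]` or `[disabled]`. The helper names `parse_target_report` and `query_gcc` are made up for illustration, and the `-mno-*` list is just the one from this post, not a copy of the kernel's Makefile.

```python
import re
import subprocess

# Matches report lines such as "  -mavx    [enabled]" (assumed report format).
_FLAG_LINE = re.compile(r"^\s*(-m[\w.-]+)\s+\[(enabled|disabled)\]\s*$")


def parse_target_report(report):
    """Map each on/off target option in the report text to True/False."""
    states = {}
    for line in report.splitlines():
        match = _FLAG_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2) == "enabled"
    return states


def query_gcc(flags, gcc="gcc"):
    """Run GCC with the given flags and return the parsed feature report."""
    result = subprocess.run(
        [gcc, *flags, "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_target_report(result.stdout)


def main():
    # Flags as listed in the post, with -march=native appended last,
    # the way a user-supplied flag would be.
    kernel_like = ["-mno-sse", "-mno-mmx", "-mno-sse2", "-mno-3dnow", "-mno-avx"]
    states = query_gcc(kernel_like + ["-march=native"])
    for flag in ("-msse", "-msse2", "-mavx", "-mbmi", "-mbmi2", "-mmovbe"):
        print(flag, states.get(flag))
```

Calling `main()` on a machine with GCC installed prints the state of a few options. If the SIMD ones still show as disabled while things like `-mbmi2` or `-mmovbe` are enabled, that would suggest native mostly adds the scalar extensions.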


        • #34
          Originally posted by mercuriete
          As a Gentoo user I wanted to say ...
          The time spent in the kernel is very little compared with userspace.
          This article didn't benchmark boot time, where the kernel is running 100% of the time (after firmware/BIOS and before init).
          In my personal experience I gained one or two seconds of boot time after enabling -march=native on the kernel with -O2.
          I need to redo my test, but at that time the gain was very clear to me using systemd-analyze.

          TL;DR:
          boot time is better with -march=native in my tests.
          Zero difference for me, and I've been running my custom-built kernel ever since I started using Linux. The fact that Fedora insists on using XZ to compress modules doesn't help either. XZ is dog slow to decompress vs. e.g. ZSTD or even GZIP.

          The kernel itself completely loads in less than a second.
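To put numbers on this instead of going by feel, the kernel phase can be pulled out of the `systemd-analyze` summary line and compared between two builds. Below is a minimal Python sketch, assuming the summary line looks like `Startup finished in 2.584s (kernel) + ... = ...`. The helper names and the sample timings are invented for illustration, not measurements.

```python
import re

# Seconds per unit for the time spans systemd-analyze prints.
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(min|ms|h|s)")
# One or more "<number><unit>" parts followed by "(phase-name)".
_PHASE = re.compile(r"((?:\d+(?:\.\d+)?(?:min|ms|h|s)\s*)+)\((\w+)\)")


def to_seconds(span):
    """Convert a span such as "1min 2.5s" or "934ms" to seconds."""
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _PART.findall(span))


def phase_times(summary_line):
    """Return {phase name: seconds} for every phase in the summary line."""
    return {name: to_seconds(span)
            for span, name in _PHASE.findall(summary_line)}


def kernel_delta(baseline_line, candidate_line):
    """Kernel-phase seconds saved by the candidate build (negative = slower)."""
    return (phase_times(baseline_line)["kernel"]
            - phase_times(candidate_line)["kernel"])


# Made-up sample lines for a generic and a -march=native kernel.
generic = "Startup finished in 2.584s (kernel) + 19.176s (initrd) + 47.847s (userspace) = 1min 9.608s"
native = "Startup finished in 934ms (kernel) + 19.001s (initrd) + 47.500s (userspace) = 1min 7.435s"
print(f"kernel phase saved: {kernel_delta(generic, native):.3f}s")
```

A single boot is noisy, so a difference of a second or two would need several runs per kernel before it says much.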


          • #35
            Originally posted by brad0

            Except it is not at all.
            To me it is... I'd have expected that using all the additional instruction set extensions which came out since the original K8 was designed (bmi1/2, movbe in particular) would have had a more positive impact. Or the GCC tuning model for Alder Lake is simply garbage.
            As someone else already commented, maybe rerun this test on Ice Lake or one of the Skylake derivatives, and maybe a Zen 3, so the picture becomes a bit clearer.


            • #36
              Originally posted by mlau

              To me it is... I'd have expected that using all the additional instruction set extensions which came out since the original K8 was designed (bmi1/2, movbe in particular) would have had a more positive impact. Or the GCC tuning model for Alder Lake is simply garbage.
              As someone else already commented, maybe rerun this test on Ice Lake or one of the Skylake derivatives, and maybe a Zen 3, so the picture becomes a bit clearer.
              A lot of delusions. People seem to believe a lot of foolish nonsense.


              • #37
                I would suggest an -O2 -march=native benchmark as well. If that tops plain -O2, fine; if it tops -O3 as well, even better. You could also go the other way and do -Os -march=native.

                And I can probably also explain some of the benchmarks: the kernel hogs the MMX/SSE/AVX registers, the handcrafted code in some userspace programs wants those registers free, and userspace loses in those cases, while kernel time is better for I/O, MMU, and scheduler work.
                Last edited by erniv2; 13 July 2022, 01:42 PM.


                • #38
                  Originally posted by Michael

                  Yep it's all open source. People are lazy?
                  Yeah, can't lie about that, but in this case I don't have access to an x86 CPU with low- and high-power cores to test the theory that it could be less-than-stellar compiler optimizations with that kind of architecture.


                  • #39
                    Maybe the problem is that the kernel isn't the sole purpose of the machine. If you optimize the kernel too much (say, it can now keep an additional variable in a register instead of on the stack), that register can't be used by the next instruction, and more push/pop is needed to free it. But I ain't an expert, so what do I know.


                    • #40
                      Originally posted by Anux
                      Do all Gentoo users have the same i5-12600K CPU, or how did you come to that conclusion?

                      You know that GCC with "native" can only optimize for one CPU core and one cache size? This CPU has different cores with different amounts of cache, so it's gonna run badly on one core or the other.


                      Not sure what a bug report would do; I know of no way to put 2 different binaries in one and then run them on the corresponding core. The easiest fix is: don't use native on such a CPU.
                      Great comment.
                      I guess Phoronix needs to retest with the E cores disabled.

                      It is also possible that GCC's native check ran on an E core, so the kernel was compiled with optimizations for the E cores instead of the P cores.
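A cheap way to rule that out without disabling anything in firmware would be to pin the build to the P cores, so any CPU detection GCC does happens there. Below is a minimal Python sketch, assuming a Linux system that exposes the hybrid core types as `/sys/devices/cpu_core/cpus` and `/sys/devices/cpu_atom/cpus`. Those paths and the helper names are assumptions for illustration, and whether pinning changes what `-march=native` detects is untested here.

```python
import os
from pathlib import Path

# Assumed sysfs locations of the per-core-type CPU lists on hybrid Intel CPUs.
P_CORE_CPUS = Path("/sys/devices/cpu_core/cpus")
E_CORE_CPUS = Path("/sys/devices/cpu_atom/cpus")


def parse_cpu_list(text):
    """Turn a kernel CPU list such as "0-11,14" into a set of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def read_cpus(path):
    """Return the CPUs listed in a sysfs file, or an empty set if unreadable."""
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return set()


def pin_to_p_cores():
    """Restrict this process, and children started afterwards, to P cores."""
    p_cores = read_cpus(P_CORE_CPUS)
    if p_cores and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, p_cores)
    return p_cores
```

Calling `pin_to_p_cores()` before launching the kernel build from the same script keeps the compiler processes on the P cores, since child processes inherit the affinity mask. On a machine without the assumed sysfs files it returns an empty set and changes nothing.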
                      Last edited by zamroni111; 13 July 2022, 07:01 PM.
