Don't Look For Gentoo's CPU Optimization Options To Land In The Mainline Linux Kernel

  • #21
    Originally posted by milkylainen View Post

    It's as it's always been with the kernel. No real performance testing or performance regression handling. It's always been more of the "oops, this was bad" and "oh, we fixed this and now it's faster." Some sporadic testing does occur. But then it's always contended because of this and that.
    This is also how endless crap (yes, asinine spectre and meltdown handling) makes it into the kernel without anyone so much as raising an eyebrow.

    A kernel like the Linux kernel should have a qualified performance test suite. A binary that can be built from the kernel tree which can be loaded instead of init to measure the kernel in an undisturbed fashion. Or whatever really.
    I fully concur! I don't know the current state of the various Linux Kernel performance testing efforts, but at least Intel did some work in that area: https://01.org/blogs/jdu1/2017/lkp-t...-analysis-tool

    And I am sure that more could be done there to ensure that no major regressions are introduced.

    Comment


    • #22
      Running Gentoo. I've been adding -O3 since gcc 7, now on 8.2. Have had no issues at all. The risk of bugs was a lot bigger back in the gcc 2.x days. Things have progressed quite a bit since then.

      In the kernel source tree, run this:
      Code:
      find . -name Makefile -type f -print0 | xargs -0 sed -i 's/-O2/-O3/g'
      Last edited by S.Pam; 24 February 2019, 07:15 AM.
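Since the sed command above rewrites files in place, it can help to see which Makefiles would be touched first. The following is only a sketch; the function name `list_o2_makefiles` is made up for illustration and is not part of the original post:

```shell
# Hypothetical helper: list every Makefile under a directory that
# contains the string -O2, without modifying anything.
list_o2_makefiles() {
    # ${1:-.} defaults to the current directory, e.g. the kernel source tree.
    # grep -l prints only the names of files that contain a match;
    # -e keeps grep from treating -O2 as one of its own options.
    find "${1:-.}" -name Makefile -type f -exec grep -l -e '-O2' {} +
}
```

Running `list_o2_makefiles .` in the kernel source tree prints the files the sed one-liner would change, one per line.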

      Comment


      • #23
        Why not choose the 3 or 4 most promising configurations and run the Phoronix Test Suite with each of them? One should clearly see an impact there.

        Comment


        • #24
          Originally posted by pmorph View Post
          This would probably have a negative effect on kernel quality and/or development time, already by the added layer of things to consider/test when bug reports are being analyzed. And there would be angry people.. who can't understand why a bug they reported using an exotic combination of optimizations gets so little attention.
          It adds nothing in terms of things to consider. The kernel already supports optimizing specifically for AMD Opteron/K8, Intel P4 and Atom. CPU specific optimization is already a factor.

          The "development time" it takes to add support for -march=native would be something like one minute. That's a non-issue. Bug reports are also a non-issue: "Can you reproduce with a kernel compiled with Generic-x86-64?" - and that's probably the case. I've run into quite a lot of kernel bugs and it's never mattered. Keep in mind that nobody has proposed adding a metric ton of funroll-loops special ricer flags. Nobody has proposed using O3 either. It's just about using -march=znver1 if you have a Ryzen or -march=native if you'd like autodetection of your -march=. That's not exotic. It's harmless.
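To see what -march=native actually resolves to on a given machine, GCC can print its target settings with -Q --help=target. A minimal sketch, assuming a reasonably recent GCC; the helper name `resolved_march` is hypothetical, and the exact layout of the report may differ between GCC versions:

```shell
# Hypothetical helper: read GCC's target report on stdin and print the
# value on the "-march=" line (for example znver1 on a first-gen Ryzen).
resolved_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Usage, on a machine with GCC installed:
#   gcc -march=native -Q --help=target | resolved_march
```

If the printed name matches the CPU in the box, -march=native and the explicit -march= value should select the same target.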

          Comment


          • #25
            So, what would be a good set of benchmarks to perform? I have an Edison running a 4.20.0 kernel (32 and 64 bit), and being a Silvermont Atom it is quite sensitive to the generated code. I would say benchmarks targeting disk, USB, network, compression and/or encryption?

            Comment


            • #26
              I like how the mods (or Michael?) remove posts that call kernel maintainers idiots, while users can call each other that without issue. Truth must hurt to have to overprotect them eh?

              Comment


              • #27
                Originally posted by atomsymbol

                Isn't most of the kernel code pointer chasing, where AVX/AVX2/AVX512/BMI/POPCNT/SSE3+/F16C/MMXEXT have basically zero chance to match the code pattern? The Linux kernel isn't itself performing matrix multiplication, image filtering, mp3/ogg decoding, translucent 2D rendering, etc. The only fields where advanced instruction sets matter seem to be in-kernel cryptography and filesystem (de)compression, although for the latter I believe special compression instructions are planned by Intel in their future CPUs and aren't available in today's CPUs.
                The -march option isn't just about SIMD operations or the newer instructions. It specifies specific features of that μarch so that issues can be avoided. If an instruction has a false dependency for the source register on some older μarchs, then the compiler will reorder the instructions or insert a NOP to improve performance on those μarchs. Or if an architecture has stall issues with partial register updates, then some workaround will be used. OTOH, if information about the cache, μop buffer size, decoder properties, etc. is known, then it'll have proper tactics for those, like aligning instructions or reordering the branches.

                Some famous examples on x86 are shr reg, imm8; adc reg, 0; and sbb reg, reg, which have a fast special case, a dependency on the flags, a partial register update problem with 8/16-bit operations, or some combination of them on some μarchs. Or popcnt, which has a false dependency on Sandy Bridge and Ivy Bridge. Or lea, which has different performance depending on the format and μarch. Therefore a multiplication by some constant can be faster with lea or mul depending on which target you're compiling for.
                Last edited by phuclv; 24 February 2019, 12:10 PM.
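One way to check the lea-versus-mul point on your own compiler is to dump the assembly for a multiply-by-constant under two -march values and diff the results. This is a rough sketch, assuming GCC (or a compatible driver) on x86-64; the function names `asm_for_march` and `diff_march`, the C function `mul9` and the constant 9 are all picked for illustration, and for a given constant the two outputs may well be identical:

```shell
# Hypothetical helper: print the assembly the compiler emits for a
# multiply-by-constant when targeting the given -march value.
asm_for_march() {
    printf 'unsigned long mul9(unsigned long x) { return x * 9; }\n' |
        gcc -O2 -march="$1" -S -x c -o - -
}

# Hypothetical helper: diff the assembly for two -march values.
# Returns diff's exit status (0 = identical, 1 = different).
diff_march() {
    a=$(mktemp) && b=$(mktemp) || return 1
    asm_for_march "$1" > "$a"
    asm_for_march "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Usage, on an x86-64 machine with GCC installed:
#   diff_march x86-64 znver1
```

Trying a few different constants and targets gives a feel for how often the instruction selection really changes.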

                Comment


                • #28
                  "GCC changes too" and then "exotic options" and "sane defaults". ALL options are part of GCC, not just the "sane defaults". Just as exotic settings can change, so can the "sane defaults", and cause regressions, or whatever. Get over it.

                  Go on, remove this post as well, can't show what a superficially emotional, subjective "don't argue with me" argument a kernel maintainer has, eh? Oh noes.

                  Comment


                  • #29
                    6 years since that reference by the kernel devs, and the argument is exactly the same. It is a piss easy patch-set to "maintain", cos it does not actually require any maintenance at all. Yet, having the option to fuck things up is "denied", but there are plenty of options to totally mess things up that have absolutely no viable "documentation" other than the likes of "This option should be used on pentium or newer".

                    Probably best to leave it alone, in case someone with gcc 1.0 is going to build the 5.0 kernel on a 486 dx2! There is an option such as - safe - and then there is "safe", where everything is just stupid. Not implementing something like this belongs in the "safe" category.

                    Comment


                    • #30
                      Running some benchmarks with this patch...
                      Michael Larabel
                      https://www.michaellarabel.com/

                      Comment
