
"CC_OPTIMIZE_FOR_PERFORMANCE_O3" Performance Tunable Dropped In Linux 6.0


  • #21
    The -O2 religion vs the -O3 cult! Or is it vice versa?

    Disclaimer: I have only ever used KCFLAGS=" ...-O3"
    Code:
    $  time KCFLAGS="-march=native -msse2avx -pipe -O3" KCPPFLAGS="-march=native -msse2avx -pipe -O3" make -j$(( $(nproc) + 2 )) deb-pkg LOCALVERSION=-danglingpointer-zen3-optimised
    Last edited by DanglingPointer; 10 August 2022, 11:29 PM.

    Comment


    • #22
      So ridiculous for people to cry about this. The linux kernel ricers are at it again. Go ahead and add your own compiler flags if ya want, build with -O99999 for all anyone cares. Only noobs want this stuff.

      Comment


      • #23
        Originally posted by coder View Post
        I think this needs to be judged on the amount of effort. Show me anything else which can deliver a couple % gains by just flipping a switch.

        And I maintain that it's just a starting point. If some careful analysis is done to see where -O3 helps vs. hurts or which options & parameters implied by -O3 are actually delivering most of the gains, then there's even further potential for performance improvements.

        Or, we could just continue to compile like it's 1999 ...for the rest of time.
        1. There's no need to add this ugly fucking picture.
        2. You're free to compile the kernel with -O999 if you want.
        3. You're free to create your own distro where you compile everything with -O999.

        GCC developers themselves have said on multiple occasions that -O3 enables experimental optimization options which may or may not improve performance but surely will add bloat.

        This topic is not worth the electrons wasted on it.

        /Thread.
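        (For anyone who does want coder's "careful analysis", GCC will report the -O2/-O3 delta itself - a quick sketch, and the exact list of passes depends on the GCC version:)
        Code:
        $ diff <(gcc -O2 -Q --help=optimizers) <(gcc -O3 -Q --help=optimizers) | grep enabled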

        Comment


        • #24
          Originally posted by mercster View Post
          So ridiculous for people to cry about this. The linux kernel ricers are at it again. Go ahead and add your own compiler flags if ya want, build with -O99999 for all anyone cares. Only noobs want this stuff.
          It's not about ricing. Squeezing performance out of software is not only about raising CPU frequency or core count. The complexity of modern CPUs really does need a compiler-guided optimisation path. As Michael has often shown, simply using -O2 does not make use of the full potential of the architecture. In some cases software does not benefit; in most cases at least a few percent is noticeable.
          You might argue: so what? Well, "so what" is exactly the price difference between two processors in a product lineup, now that we are well past the era of single-core, high-power "nuclear reactor" chips.
          Any compiler that arranges the compiled code in a more efficient way is welcome. Just look at the vast difference between generic code and the instruction sets of the latest AMD/Intel generations; it's the noobs who don't recognise that CPU feature flags are often ignored, yet keep telling everybody that -O2 for all is sufficient.
          Especially with all the upcoming E/P-core CPUs with mixed AVX support, where too much code landing on the wrong core may look faster at first glance, but because of the CPU's down-clocking behaviour it may not be the optimum and can sometimes even be harmful.
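          (A quick way to see what the compiler is actually allowed to use on a given box - a minimal sketch, assuming GCC - is to ask what -march=native expands to:)
          Code:
          $ gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
          $ gcc -march=native -Q --help=target | grep -E 'march|mtune'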
          Last edited by CochainComplex; 11 August 2022, 04:40 AM.

          Comment


          • #25
            Originally posted by birdie View Post

            1. There's no need to add this ugly fucking picture.
            2. You're free to compile the kernel with -O999 if you want.
            3. You're free to create your own distro where you compile everything with -O999.

            GCC developers themselves have said on multiple occasions that -O3 enables experimental optimization options which may or may not improve performance but surely will add bloat.

            This topic is not worth the electrons wasted on it.

            /Thread.
            It's wasted electrons not to make use of optimization flags. Clear Linux is a prime example of a production-ready, highly optimized, stable distribution. Sure, they put a lot of effort into tinkering with different compiler flags, but the advantages are obvious: Clear Linux is around 5-15% faster in most use cases, even on AMD CPUs. Why this overwhelming conservatism?

            Comment


            • #26
              CochainComplex

              -O3 creates such bloated code that it may thrash low-end CPUs with small L1/L2 caches and result in much lower performance than -O2. Distros cannot fucking afford to use compile options which benefit only their richest customers. Michael has only shown tests with top-end AMD and Intel CPUs flush with cache.

              Sometimes people become Re Tarded in their pursuit of maximum performance and are sure that everyone around them is running a Ryzen 9 5950X or Core i9 12900KS. That's not the case and never has been. People still run PCs with Sandy Bridge Celerons and Atoms with just 1-2MB of L2 cache.

              And no, distros will not create "special" "optimized" spins just for the richest. Too much effort, too dubious an outcome. You're obsessed with performance? Use Clear Linux, Gentoo or even FreeBSD.
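              (The code-size half of this argument is easy to check per translation unit - a rough sketch, with hot_loop.c standing in for whichever file you care about; compare the text column:)
              Code:
              $ gcc -O2 -c hot_loop.c -o hot_loop_O2.o && size hot_loop_O2.o   # hot_loop.c is a hypothetical stand-in
              $ gcc -O3 -c hot_loop.c -o hot_loop_O3.o && size hot_loop_O3.o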

              Comment


              • #27
                Thankfully, with distributions like t2 you can always optimize (the kernel) with -O3 or even smarter: https://www.youtube.com/watch?v=xzBWo4FLfK8

                Comment


                • #28
                  Originally posted by birdie View Post
                  CochainComplex

                  -O3 creates such bloated code that it may thrash low-end CPUs with small L1/L2 caches and result in much lower performance than -O2. Distros cannot fucking afford to use compile options which benefit only their richest customers. Michael has only shown tests with top-end AMD and Intel CPUs flush with cache.

                  Sometimes people become Re Tarded in their pursuit of maximum performance and are sure that everyone around them is running a Ryzen 9 5950X or Core i9 12900KS. That's not the case and never has been. People still run PCs with Sandy Bridge Celerons and Atoms with just 1-2MB of L2 cache.

                  And no, distros will not create "special" "optimized" spins just for the richest. Too much effort, too dubious an outcome.
                  I totally agree with these points. It should not end up in a race where the distro only supports the latest top-notch gear.
                  That was, btw, never implied by my post.

                  Again, Clear Linux is a good example: it requires a Haswell CPU or newer. Haswell was introduced roughly 10 years ago and roughly corresponds to the x86-64-v2 compiler target. Clear Linux also has code paths for AVX2 and AVX-512, which is indeed quite an effort, but a logical granularity.

                  I also know that even a Haswell CPU can be costly for people not living in the "first world". So yes, I'm with you - Linux should not be an exclusive playground for wealthy people. Open source is also about keeping "old" stuff running.

                  But we also need to consider how much energy should be wasted just to stay backwards compatible. As said before, the x86 landscape is already so diversified that a single lowest-common-denominator code path is very inefficient. I would say that comparing a Haswell to a Tiger Lake CPU is like comparing a plain 386 to the first x86-64 Pentium. And we did a lot of code separation in the past too, even abandoning entire platforms.

                  We have three options:
                  compile everything against generic, which has to cover everything from the latest HPC parts down to the first x86-64 CPUs from roughly 20 years ago;
                  build separate optimized spins;
                  or use CPU-dispatched code paths.

                  The days of hand-tuning code are more or less over. Generic software is getting too complex and CPUs, with all their feature sets, are too diversified.

                  The last option will blow up the packages but keeps the usage of the distro rather generic - it needs far more autonomy for the compiler to make the right decisions, which implies that more aggressive flags like -O3/LTO/PGO etc. need to be adopted and tested so they are stable for such endeavours.
                  Btw, this is what Intel is doing with considerable effort. To be precise, Gentoo too, but I guess the QA is better on the Clear Linux side.
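                  (The CPU-dispatched option already exists in plain GCC as function multi-versioning - a minimal sketch, with dispatch_demo.c and the dot() function made up for illustration; GCC emits one clone per listed ISA plus a resolver that picks the best one on the running CPU:)
                  Code:
                  $ cat dispatch_demo.c
                  /* hypothetical demo: one clone per ISA, resolved at load time */
                  __attribute__((target_clones("default", "avx", "avx2")))
                  double dot(const double *a, const double *b, int n)
                  {
                      double s = 0.0;
                      for (int i = 0; i < n; i++)
                          s += a[i] * b[i];
                      return s;
                  }
                  $ gcc -O2 -c dispatch_demo.c && nm dispatch_demo.o | grep dot   # per-ISA clones appear as separate symbols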


                  Comment


                  • #29
                    Originally posted by birdie View Post
                    CochainComplex

                    -O3 creates such bloated code that it may thrash low-end CPUs with small L1/L2 caches and result in much lower performance than -O2. Distros cannot fucking afford to use compile options which benefit only their richest customers. Michael has only shown tests with top-end AMD and Intel CPUs flush with cache.
                    This is not true at all - even -O2 enables a lot of optimizations, including vectorization. -O3 is more aggressive of course, but the difference between -O2 and -O3 is far smaller today than it was 5-10 years ago. There is no doubt that -O2/-O3 generate larger code than -Os, but the difference is not that large.

                    In general the additional performance from optimization more than makes up for any increased I-cache misses. And it's not like -O3 is a recent invention, it existed decades ago when caches were absolutely tiny. So any claims about old CPUs with small caches not being able to run -O3 code well are plain wrong.
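                    (You can also simply ask the compiler what it vectorizes at each level - a quick sketch, with loop_demo.c as a hypothetical stand-in; note that GCC only enabled auto-vectorization at -O2 from GCC 12, with a cheaper cost model than -O3:)
                    Code:
                    $ gcc -O2 -fopt-info-vec -c loop_demo.c   # loop_demo.c is a stand-in source file
                    $ gcc -O3 -fopt-info-vec -c loop_demo.c   # more aggressive cost model, usually reports more loops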

                    Comment


                    • #30
                      Originally posted by PerformanceExpert View Post

                      This is not true at all - even -O2 enables a lot of optimizations, including vectorization. -O3 is more aggressive of course, but the difference between -O2 and -O3 is far smaller today than it was 5-10 years ago. There is no doubt that -O2/-O3 generate larger code than -Os, but the difference is not that large.

                      In general the additional performance from optimization more than makes up for any increased I-cache misses. And it's not like -O3 is a recent invention, it existed decades ago when caches were absolutely tiny. So any claims about old CPUs with small caches not being able to run -O3 code well are plain wrong.
                      I see a lot of speculation and zero test results. Sorry, I'm a simple person and that doesn't work with me.
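                      (Getting real numbers isn't hard - a minimal sketch, with app_O2 and app_O3 standing in for the same program built at -O2 and -O3; perf reports the mean and spread over the repeated runs:)
                      Code:
                      $ perf stat -r 10 ./app_O2   # app_O2/app_O3 are hypothetical builds of the same program
                      $ perf stat -r 10 ./app_O3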

                      Comment
