"CC_OPTIMIZE_FOR_PERFORMANCE_O3" Performance Tunable Dropped In Linux 6.0

  • dekernel
    replied
    Originally posted by CochainComplex

    Reminds me of the conservative racecar engineer saying that speeds above 150 km/h in corner xyz are not safe because... techyadayada. But after 10 rounds the petrolhead "no brainer" test driver jumps out of the car shouting: did you see? I beat the 160 km/h mark in the corner?! wuhhuu that was fing awesome.
    I guess we have the same here too.
    As someone who has actually been on the race track, I find your comment humorous, because I saw countless people trying to win the race by being the fastest in every turn, and can you guess who was typically in the weeds when the checkered flag was waving? When it comes to your OS kernel, I am a big fan of safety first, but if you want to try to be the fastest all the time, by all means go for it. Just don't try to drag all of us into your world.
    Last edited by dekernel; 11 August 2022, 08:35 AM.


  • binarybanana
    replied
    Originally posted by coder
    Regarding your quoted results: is the first set reporting the time to complete a fixed workload and the second set reporting the speed achieved? In other words, "lower is better", in the first test, but "higher is better" in the second?
    Yes. One test measures throughput, the other measures time to finish.
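    For anyone comparing the two kinds of results side by side, here is a minimal sketch of putting a "higher is better" and a "lower is better" metric on the same footing. The function name and the numbers are hypothetical and are not the results quoted in this thread.

    ```python
    def relative_gain(baseline: float, candidate: float, higher_is_better: bool) -> float:
        """Return the candidate's gain over the baseline as a fraction (0.10 means 10% better)."""
        if baseline <= 0 or candidate <= 0:
            raise ValueError("measurements must be positive")
        if higher_is_better:
            # Throughput-style metric: more work per unit of time is better.
            return candidate / baseline - 1.0
        # Time-to-finish metric: invert the ratio so a shorter time gives a positive gain.
        return baseline / candidate - 1.0


    # Hypothetical measurements, for illustration only.
    throughput_gain = relative_gain(200.0, 210.0, higher_is_better=True)
    time_gain = relative_gain(50.0, 40.0, higher_is_better=False)
    print(f"throughput: {throughput_gain:+.1%}, time to finish: {time_gain:+.1%}")
    ```

    A positive value means the candidate build did better in both cases, so the two tests can be read the same way.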
    BTW, I'm currently still running the Ofast kernel I tested last because I was too lazy to recompile and reboot again. So far no issues. Not that I would recommend it or anything, but no nasal demons have come out of my nose yet.
    Last edited by binarybanana; 11 August 2022, 07:01 AM.


  • CochainComplex
    replied
    Originally posted by birdie
    I see there's a cult of -O3 here: you guys are welcome to use Clear Linux, Gentoo, FreeBSD or create an Ubuntu SUPER DUPER FAST SPIN compiled with -O3 -march=zen3 -flto -pgo since I presume most people here are rabid AMD fans.

    Oh and make sure you've not missed other experimental GCC flags. Every optimization matters, why stop at -O3?
    Have a look at the Clear Linux repo: -Ofast is the next "big" thing.

    e.g. your beloved Wayland
    https://github.com/clearlinux-pkgs/w...n/wayland.spec

    P.S.: nitpicking, but PGO does not work by just adding a "-pgo" flag.
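    To spell out the nitpick: with GCC, profile-guided optimization is a multi-step process (instrumented build, training run, rebuild using the collected profile), not a single switch. A minimal sketch for an ordinary userspace program follows. `-fprofile-generate` and `-fprofile-use` are real GCC options, but the helper name, file names, and workload arguments are hypothetical, and this says nothing about how a kernel would be profiled.

    ```python
    import shlex


    def pgo_steps(source: str, binary: str, workload_args: list[str]) -> list[list[str]]:
        """Return the three command lines of a basic GCC PGO build, in order."""
        # Step 1: build an instrumented binary that records profile data when run.
        instrumented = ["gcc", "-O2", "-fprofile-generate", source, "-o", binary]
        # Step 2: run a representative workload so the profile files get written.
        training = [f"./{binary}", *workload_args]
        # Step 3: rebuild, letting the compiler use the recorded profile.
        optimized = ["gcc", "-O2", "-fprofile-use", source, "-o", binary]
        return [instrumented, training, optimized]


    # Hypothetical program and workload; this only prints the commands.
    for step in pgo_steps("app.c", "app", ["--benchmark"]):
        print(shlex.join(step))
    ```

    The quality of the result depends on how representative the training run in step 2 is, which is why a lone flag cannot stand in for it.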
    Last edited by CochainComplex; 11 August 2022, 07:01 AM.


  • marios
    replied
    Originally posted by DanglingPointer
    The -O2 religion vs the -O3 cult! Or is it vice versa?

    Disclaimer: I have only ever used KCFLAGS=" ...-O3"
    Code:
    $ time KCFLAGS="-march=native -msse2avx -pipe -O3" KCPPFLAGS="-march=native -msse2avx -pipe -O3" make -j$(( $(nproc) + 2 )) deb-pkg LOCALVERSION=-danglingpointer-zen3-optimised
    Neither, it is the -O2 metaphysics vs the -O3 dialectic...


  • birdie
    replied
    I see there's a cult of -O3 here: you guys are welcome to use Clear Linux, Gentoo, FreeBSD or create an Ubuntu SUPER DUPER FAST SPIN compiled with -O3 -march=zen3 -flto -pgo since I presume most people here are rabid AMD fans.

    Oh and make sure you've not missed other experimental GCC flags. Every optimization matters, why stop at -O3?


  • coder
    replied
    Originally posted by CochainComplex
    Haswell was introduced roughly 10 years ago, and besides, it roughly corresponds to the compiler flag x86_v2.
    Haswell is the minimum needed for v3. The v2 CPUs were Westmere-era, introduced around 2009.
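    For anyone who wants to check which level their own machine reaches, here is a rough sketch that classifies a set of CPU feature flags into x86-64 microarchitecture levels. The flag spellings are the ones I understand Linux to use in /proc/cpuinfo (for example `pni` for SSE3 and `abm` for LZCNT); treat the exact sets as an assumption and check them against the psABI document before relying on this. It also assumes a 64-bit x86 CPU, so the baseline is reported as v1.

    ```python
    # Feature sets per level, using Linux /proc/cpuinfo flag names (assumed spellings).
    LEVELS = [
        ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
        ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
        ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
    ]


    def x86_64_level(flags: set[str]) -> str:
        """Return the highest level whose features, and those of all lower levels, are present."""
        level = "x86-64-v1"  # baseline assumed for any 64-bit x86 CPU
        for name, required in LEVELS:
            if not required <= flags:
                break  # levels are cumulative, so stop at the first gap
            level = name
        return level


    def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
        """Read the first 'flags' line from a Linux cpuinfo file."""
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
        return set()
    ```

    On a Linux box, `print(x86_64_level(read_cpu_flags()))` should report the level; a Haswell or newer part would be expected to come out as v3 or higher.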

    BTW, the whole tangent about Clear Linux is a waste of time. Its only relevance to the discussion is in showing that -O3 works and doesn't compromise stability.
    Last edited by coder; 11 August 2022, 06:11 AM.


  • coder
    replied
    Originally posted by birdie
    CochainComplex

    -O3 creates such bloated code it may thrash low-end CPUs with small L1/L2 caches and result in a much lower performance than -O2.
    Citation needed.


  • coder
    replied
    Originally posted by birdie
    1. There's no need to add this ugly fucking picture.
    Did it hit too close to home? Maybe a relative of yours, or it shows you in an unflattering light?

    Originally posted by birdie
    2. You're free to compile the kernel with -O999 if you want.
    3. You're free to create your own distro where you compile everything with -O999.
    And you're free to completely miss my point, which is that the lore around -O3 being risky is dated and misinformed. And it's really this misinformation I'm trying to combat, rather than advocating for this config option to stick around.

    Originally posted by birdie
    GCC developers themselves have said on multiple occasions that -O3 enables experimental optimization options which may or may not improve performance but surely will add bloat.
    They don't put things in it which never improve performance. There are some performance options that aren't added to -O3.

    Originally posted by birdie
    This topic is not worth the electrons wasted on it.
    You're free not to contribute yours.

    Originally posted by birdie
    /Thread.
    Alright, then I expect this will be your last post in it.


  • PerformanceExpert
    replied
    Originally posted by birdie

    I see a lot of speculation and zero test results. Sorry, I'm a simple person and that doesn't work with me
    Indeed, it's total speculation that -O3 is worse than -O2. Extraordinary claims that go against decades of experience require extraordinary evidence...


  • birdie
    replied
    Originally posted by PerformanceExpert

    This is not true at all - even -O2 enables a lot of optimizations, including vectorization. -O3 is more aggressive of course, but the difference between -O2 and -O3 is far smaller today than it was 5-10 years ago. There is no doubt that -O2/-O3 generate larger code than -Os, but the difference is not that large.

    In general, the additional performance from optimization more than makes up for any increased I-cache misses. And it's not like -O3 is a recent invention; it existed decades ago when caches were absolutely tiny. So any claims about old CPUs with small caches not being able to run -O3 code well are plain wrong.
    I see a lot of speculation and zero test results. Sorry, I'm a simple person and that doesn't work with me
