Benchmarking The Linux Kernel With An "-O3" Optimized Build

Having no experience customizing a kernel build, this might be a dumb question, but: is there a way to use -O3 for specific files rather than the whole thing? It seems to me that in most cases -O3 is not worth the risk, but there are some situations where there is a significant improvement.
Originally posted by birdie View Post
Just what I expected. A well-written kernel must have next to zero impact on performance other than when you use the CPU-intensive features the kernel itself provides, i.e. encryption, connections, context switching, etc., which the vast majority of users never deal with.
Originally posted by milkylainen View Post
I think the results should be interpreted differently.
It's not strange that you don't see any benefit from something that spends < 1% of its CPU time in kernel space.
If you're measuring an entire system, then yeah, perhaps there isn't much of a difference.
But look at the synthetic tests measuring syscalls or actual kernel operations (like context switching)...
There, that -O3 flag really did mean an improvement!
In summary: it's unfair to say -O3 doesn't help the kernel when what you're measuring is an entire system.
This is one reason we could use a comprehensive test suite for the Linux kernel that runs through known exploit and bug conditions.
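As a rough illustration of the kind of syscall microbenchmark being discussed, a minimal timing loop could look like the C sketch below. This is my own example, not anything from the article; the iteration count, the choice of getpid, and the output format are arbitrary.

/* Rough sketch of a syscall micro-benchmark: time a tight loop of raw
 * getpid() syscalls, which spends nearly all of its time on kernel
 * entry/exit.  Iteration count and output format are arbitrary choices. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iterations = 10 * 1000 * 1000;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);                /* raw syscall, no libc shortcuts */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f ns per syscall\n", elapsed / iterations * 1e9);
    return 0;
}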
Originally posted by schmidtbag View Post
Having no experience customizing a kernel build, this might be a dumb question, but: is there a way to use -O3 for specific files rather than the whole thing? It seems to me that in most cases -O3 is not worth the risk, but there are some situations where there is a significant improvement.
I also agree with you that -O3 might not be worth the risk. Apparently nobody arguing here has looked up what the difference between -O2 and -O3 actually is.
Well, it is all described here: https://gcc.gnu.org/onlinedocs/gcc/O...timize-Options
And -O3 enables all the switches enabled by -O2 (and therefore -O1 as well) plus the following:
-fgcse-after-reload
-fipa-cp-clone
-floop-interchange
-floop-unroll-and-jam
-fpeel-loops
-fpredictive-commoning
-fsplit-loops
-fsplit-paths
-ftree-loop-distribution
-ftree-partial-pre
-funswitch-loops
-fvect-cost-model=dynamic
-fversion-loops-for-strides
I am no expert, but I find -ftree-loop-distribution worrying, as it may be an excellent catalyst for triggering race conditions and that kind of thing. Sure, if it exposes a bug it is good to find it, but race conditions are usually not easy to track down.
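To make that worry concrete, here is a contrived C sketch (not taken from any kernel code) of the pattern that loop distribution can disturb when the code has no proper barriers or atomics:

/* Contrived example, not kernel code: a producer that relies on the
 * source-order "write payload, then set flag" pattern with no barrier or
 * atomic.  That is already a data race under the C memory model, but it
 * often appears to work at -O2.  Because the compiler sees no dependency
 * between the two arrays, -ftree-loop-distribution may legally split the
 * loop into two passes (e.g. all flag stores, then all payload stores),
 * and a concurrent reader polling ready[i] can then see the flag set
 * while data[i] is still stale. */

#define N 1024

int data[N];
int ready[N];    /* should be atomic, or paired with memory barriers */

void produce(void)
{
    for (int i = 0; i < N; i++) {
        data[i]  = i * i;    /* stand-in for the real payload computation */
        ready[i] = 1;        /* flag that a consumer thread busy-polls */
    }
}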
Also, -fipa-cp-clone can (according to the manual) significantly increase code size, which, depending on the code being run, can cause things to no longer fit in the lower-level CPU caches. But what worries me the most is the -fpredictive-commoning flag, which sounds like a red flag, especially if some tired coder forgot to declare a variable volatile while working on a driver.
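And a generic sketch of the missing-volatile scenario (again, not real driver code; the register and its ready bit are made up):

/* Generic sketch of the missing-volatile hazard; 'status' stands in for a
 * memory-mapped device register and the 0x1 "ready" bit is made up.
 * Because the pointer is not volatile-qualified, the compiler may load the
 * value once and reuse it forever: hoisting the load happens even at -O2,
 * and -O3 passes such as -fpredictive-commoning extend this kind of load
 * reuse across loop iterations.  The wait loop can then spin forever even
 * after the device updates the register. */

unsigned int *status;    /* should be: volatile unsigned int *status; */

void wait_for_device(void)
{
    while ((*status & 0x1) == 0)
        ;    /* the load of *status may never be repeated */
}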
Sure, all of these things are bugs that should be caught and squashed at some point, but is it really worth the risk for potentially slightly more performance? Most of what the CPU does should happen in userspace anyway, and the kernel should try its best to get the hell out of the way as much as possible.
For me it all sounds like trying to optimize your bathroom routine by focusing all your energy on speeding up the time needed to unscrew the cap of your toothpaste.
http://www.dirtcellar.net
Originally posted by waxhead View Post
Sounds like trying to optimize your bathroom routine by focusing all your energy on speeding up the time needed to unscrew the cap of your toothpaste.
Originally posted by binarybanana View Post
Disagree. A lot of users use zram/zswap and/or LUKS, maybe something like WireGuard, so any gains in that area are a big win. Sadly, none of those were tested here, but my guess is -O3 might help those workloads. In that case it would make sense to optimize those parts more aggressively than the rest. The only reason it can't easily be done is that the kernel build system doesn't support it. It's all or nothing.
Now on to your arguments:
- If your kernel spends too much time swapping in/out with zswap, or you're a fan of zram, you're lacking RAM anyway and your performance is hugely compromised regardless of kernel compilation flags.
- LUKS is irrelevant for the absolute majority of users out there, since it uses AES, which has been HW accelerated in most CPUs released over the past seven years. If your CPU doesn't HW accelerate AES instructions, you're fucked regardless - I've worked with LUKS on such PCs and it's torture. OK, with -O3 you'll get something like 20MB/sec throughput, with -O2 you'll get 15MB/sec - both are terribly slow.
- WireGuard - its crypto hot path (ChaCha20-Poly1305 rather than AES, but likewise backed by hand-optimized SIMD code in the kernel) shouldn't even register in top unless you push more than tens of megabytes of traffic per second, which is relevant for whom exactly? Either way, -O3 vs. -O2 will mean nil there.
If you feel so confident, please, spend half an hour and show your results, OK? PLEASE.
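If you want to check the hardware-acceleration premise on a given box, the AES-NI feature bit is easy to query. A minimal sketch (x86-specific, using GCC/Clang's cpuid.h; grepping /proc/cpuinfo for the aes flag gives the same answer):

/* Sketch: ask the CPU whether it advertises AES-NI (x86 only).
 * Uses GCC/Clang's <cpuid.h>; the AES feature bit lives in ECX of
 * CPUID leaf 1 and is exposed there as bit_AES. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not available");
        return 1;
    }
    puts((ecx & bit_AES) ? "AES-NI: yes" : "AES-NI: no");
    return 0;
}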
Originally posted by ms178 View Post
I hope this set of benchmarks convinces Linus that the effort is worth it. There were clearly benefits for some workloads and no showstoppers. Weeding out the compiler/kernel bugs for -O3 is a worthy effort in my eyes, even more so now than before.
Originally posted by waxhead View Post
Not sure how the kernel build system works, but in theory it should be possible to pre-compile bits and pieces.
But no, "You're depriving us of performance!!!!"
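For what it's worth, kbuild does let a subdirectory Makefile set per-file compiler flags, so individual objects could in principle be built with -O3 while the rest of the tree stays at -O2. A hypothetical sketch (the directory, config symbol, and file names are made up for illustration):

# Hypothetical drivers/foo/Makefile snippet: build one object with -O3
# while the rest of the directory keeps the tree-wide default (-O2).
# CONFIG_FOO and the file names are made up for illustration.
obj-$(CONFIG_FOO) += foo_core.o foo_fastpath.o

# kbuild per-file flags: the extra option is applied only to foo_fastpath.o
CFLAGS_foo_fastpath.o += -O3

Whether any given file actually benefits (or stays correct) with -O3 is, of course, the whole debate above.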