Benchmarking The Linux Kernel With An "-O3" Optimized Build

Having no experience customizing a kernel build, this might be a dumb question, but: is there a way to use -O3 for specific files rather than the whole thing? It seems to me that in most cases -O3 is not worth the risk, but there are some situations where there is a significant improvement.
Originally posted by birdie View Post
Just what I expected. A well-written kernel must have next to zero impact on performance other than when you use the CPU-intensive features the kernel itself provides, i.e. encryption, connections, context switching, etc., which the vast majority of users never deal with.
Originally posted by milkylainen View Post
I think the results should be interpreted differently.
It's not strange that you don't see any benefit from something that spends < 1% of its CPU time in kernel space.
If you're measuring an entire system, then yeah, perhaps there isn't much of a difference.
But look at the synthetic tests measuring syscalls or actual kernel operations (like context switching)...
There, that -O3 flag really did mean an improvement!
In summary: it's unfair to say -O3 doesn't help the kernel when what you're measuring is an entire system.
This is one reason we could use a comprehensive test suite for the Linux kernel that runs through known exploit and bug conditions.
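As a rough illustration of the kind of syscall microbenchmark being discussed, a minimal timing loop could look like the C sketch below. This is my own example, not anything from the article; the iteration count, the choice of getpid, and the output format are arbitrary.

/* Rough sketch of a syscall micro-benchmark: time a tight loop of raw
 * getpid() syscalls, which spends nearly all of its time on kernel
 * entry/exit.  Iteration count and output format are arbitrary choices. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iterations = 10 * 1000 * 1000;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);                /* raw syscall, no libc shortcuts */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f ns per syscall\n", elapsed / iterations * 1e9);
    return 0;
}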
Originally posted by schmidtbag View Post
Having no experience customizing a kernel build, this might be a dumb question, but: is there a way to use -O3 for specific files rather than the whole thing? It seems to me that in most cases -O3 is not worth the risk, but there are some situations where there is a significant improvement.
I also agree with you that -O3 might not be worth the risk. Apparently nobody arguing here has looked up what the difference between -O2 and -O3 actually is.
Well, it is all described here: https://gcc.gnu.org/onlinedocs/gcc/O...timize-Options
And -O3 enables all the switches enabled by -O2 (and therefore -O1 as well) plus the following:
-fgcse-after-reload
-fipa-cp-clone
-floop-interchange
-floop-unroll-and-jam
-fpeel-loops
-fpredictive-commoning
-fsplit-loops
-fsplit-paths
-ftree-loop-distribution
-ftree-partial-pre
-funswitch-loops
-fvect-cost-model=dynamic
-fversion-loops-for-strides
I am no expert, but I find -ftree-loop-distribution worrying, as it may be an excellent catalyst for triggering race conditions and that kind of thing. Sure, if it exposes a bug it is good to find it, but race conditions are usually not easy to track down.
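To make that worry concrete, here is a contrived C sketch (not taken from any kernel code) of the pattern that loop distribution can disturb when the code has no proper barriers or atomics:

/* Contrived example, not kernel code: a producer that relies on the
 * source-order "write payload, then set flag" pattern with no barrier or
 * atomic.  That is already a data race under the C memory model, but it
 * often appears to work at -O2.  Because the compiler sees no dependency
 * between the two arrays, -ftree-loop-distribution may legally split the
 * loop into two passes (e.g. all flag stores, then all payload stores),
 * and a concurrent reader polling ready[i] can then see the flag set
 * while data[i] is still stale. */

#define N 1024

int data[N];
int ready[N];    /* should be atomic, or paired with memory barriers */

void produce(void)
{
    for (int i = 0; i < N; i++) {
        data[i]  = i * i;    /* stand-in for the real payload computation */
        ready[i] = 1;        /* flag that a consumer thread busy-polls */
    }
}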
Also, -fipa-cp-clone can (according to the manual) significantly increase code size, which, depending on the code being run, can cause things to no longer fit in the lower-level CPU caches. But what worries me the most is the -fpredictive-commoning flag, which sounds like a red flag, especially if some tired coder forgot to declare a variable volatile while working on a driver.
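And a generic sketch of the missing-volatile scenario (again, not real driver code; the register and its ready bit are made up):

/* Generic sketch of the missing-volatile hazard; 'status' stands in for a
 * memory-mapped device register and the 0x1 "ready" bit is made up.
 * Because the pointer is not volatile-qualified, the compiler may load the
 * value once and reuse it forever: hoisting the load happens even at -O2,
 * and -O3 passes such as -fpredictive-commoning extend this kind of load
 * reuse across loop iterations.  The wait loop can then spin forever even
 * after the device updates the register. */

unsigned int *status;    /* should be: volatile unsigned int *status; */

void wait_for_device(void)
{
    while ((*status & 0x1) == 0)
        ;    /* the load of *status may never be repeated */
}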
Sure, all of these things are bugs that should be caught and squashed at some point, but is it really worth the risk for potentially slightly more performance? Most of what the CPU does should happen in userspace anyway, and the kernel should try its best to get the hell out of the way as much as possible.
For me it all sounds like trying to optimize your bathroom routine by focusing all your energy on speeding up the time needed to unscrew the cap of your toothpaste.
http://www.dirtcellar.net
Originally posted by waxhead View Post
Sounds like trying to optimize your bathroom routine by focusing all your energy on speeding up the time needed to unscrew the cap of your toothpaste.
Originally posted by binarybanana View Post
Disagree. A lot of users use zram/zswap and/or LUKS, maybe something like WireGuard, so any gains in that area are a big win. Sadly, none of those were tested here, but my guess is -O3 might help those workloads. In that case it would make sense to optimize those parts more aggressively than the rest. The only reason it can't easily be done is that the kernel build system doesn't support it. It's all or nothing.
Now on to your arguments:
- If your kernel spends too much time swapping in/out with zswap, or you're a fan of zram, you're lacking RAM anyway and your performance is hugely compromised regardless of kernel compilation flags.
- LUKS is irrelevant for the absolute majority of users out there, since it uses AES, which has been HW accelerated in most CPUs released over the past seven years. If your CPU doesn't HW accelerate AES instructions, you're fucked regardless - I've worked with LUKS on such PCs and it's torture. OK, with -O3 you'll get something like 20MB/sec throughput, with -O2 you'll get 15MB/sec - both are terribly slow.
- WireGuard - its crypto hot path (ChaCha20-Poly1305 rather than AES, but likewise backed by hand-optimized SIMD code in the kernel) shouldn't even register in top unless you push more than tens of megabytes of traffic per second, which is relevant for whom exactly? Either way, -O3 vs. -O2 will mean nil there.
If you feel so confident, please, spend half an hour and show your results, OK? PLEASE.
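If you want to check the hardware-acceleration premise on a given box, the AES-NI feature bit is easy to query. A minimal sketch (x86-specific, using GCC/Clang's cpuid.h; grepping /proc/cpuinfo for the aes flag gives the same answer):

/* Sketch: ask the CPU whether it advertises AES-NI (x86 only).
 * Uses GCC/Clang's <cpuid.h>; the AES feature bit lives in ECX of
 * CPUID leaf 1 and is exposed there as bit_AES. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not available");
        return 1;
    }
    puts((ecx & bit_AES) ? "AES-NI: yes" : "AES-NI: no");
    return 0;
}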
Originally posted by ms178 View Post
I hope this set of benchmarks convinces Linus that the effort is worth it. There were clearly benefits for some workloads and no showstoppers. Weeding out the compiler/kernel bugs for -O3 is a worthy effort in my eyes, even more so now than before.
Originally posted by waxhead View Post
Not sure how the kernel build system works, but in theory it should be possible to pre-compile bits and pieces.
But no, "You're depriving us of performance!!!!"
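For what it's worth, kbuild does let a subdirectory Makefile set per-file compiler flags, so individual objects could in principle be built with -O3 while the rest of the tree stays at -O2. A hypothetical sketch (the directory, config symbol, and file names are made up for illustration):

# Hypothetical drivers/foo/Makefile snippet: build one object with -O3
# while the rest of the directory keeps the tree-wide default (-O2).
# CONFIG_FOO and the file names are made up for illustration.
obj-$(CONFIG_FOO) += foo_core.o foo_fastpath.o

# kbuild per-file flags: the extra option is applied only to foo_fastpath.o
CFLAGS_foo_fastpath.o += -O3

Whether any given file actually benefits (or stays correct) with -O3 is, of course, the whole debate above.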