Experimental -O3 Optimizing The Linux Kernel For Better Performance Brought Up Again
-
Originally posted by birdie View Post
On a normal system (with a decent amount of RAM and powerful CPU cores) the kernel itself shouldn't take much CPU time, so the difference between -O2 and -O3, even if the latter were twice as fast, should still be minimal. -O3 might be beneficial for hosting/traffic/VPN providers and people with very weak PCs, and that's it.
And some -O3 options, if used without moderation, are outright harmful, since they bloat the code, which leads to the L1/L2 caches being thrashed.
-
Originally posted by birdie View Post
GCC 12.1.
O2 vs Ofast:
Code:
+ -fallow-store-data-races [enabled]
+ -fassociative-math [enabled]
+ -fcx-limited-range [enabled]
+ -ffinite-math-only [enabled]
+ -fgcse-after-reload [enabled]
+ -fipa-cp-clone [enabled]
+ -floop-interchange [enabled]
+ -floop-unroll-and-jam [enabled]
+ -fmath-errno [disabled]
+ -fpeel-loops [enabled]
+ -fpredictive-commoning [enabled]
+ -freciprocal-math [enabled]
+ -fsemantic-interposition [disabled]
+ -fsigned-zeros [disabled]
+ -fsplit-loops [enabled]
+ -fsplit-paths [enabled]
+ -ftrapping-math [disabled]
+ -ftree-loop-distribution [enabled]
+ -ftree-partial-pre [enabled]
+ -funroll-completely-grow-size [enabled]
+ -funsafe-math-optimizations [enabled]
+ -funswitch-loops [enabled]
+ -fversion-loops-for-strides [enabled]
O2 vs O3:
Code:
+ -fgcse-after-reload [enabled]
+ -fipa-cp-clone [enabled]
+ -floop-interchange [enabled]
+ -floop-unroll-and-jam [enabled]
+ -fpeel-loops [enabled]
+ -fpredictive-commoning [enabled]
+ -fsplit-loops [enabled]
+ -fsplit-paths [enabled]
+ -ftree-loop-distribution [enabled]
+ -ftree-partial-pre [enabled]
+ -funroll-completely-grow-size [enabled]
+ -funswitch-loops [enabled]
+ -fversion-loops-for-strides [enabled]
O3 vs Ofast:
Code:
+ -fallow-store-data-races [enabled]
+ -fassociative-math [enabled]
+ -fcx-limited-range [enabled]
+ -ffinite-math-only [enabled]
+ -fmath-errno [disabled]
+ -freciprocal-math [enabled]
+ -fsemantic-interposition [disabled]
+ -fsigned-zeros [disabled]
+ -ftrapping-math [disabled]
+ -funsafe-math-optimizations [enabled]
-
Originally posted by Jannik2099 View Post
Unless they patched the kernel elsewhere, no they do NOT. The kernel Makefile would override any previous -O flag.
-Ofast is also more or less meaningless in a kernel context as it mainly deals with floating point optimizations.
-
Originally posted by Jannik2099 View Post
Right, and as you can see, the only differences between O3 and Ofast that are not floating-point related are -fallow-store-data-races and -fno-semantic-interposition. The former is bordering on suicide.
-
A few things in -Ofast are a problem. There is some floating-point code inside the Linux kernel.
https://elixir.bootlin.com/linux/lat...rnel_fpu_begin.
It is not in many files, but turning off the math safeties is exactly what -Ofast does. "-funsafe-math-optimizations" is a very big catch-all.
The big problem with -Ofast is
"-fallow-store-data-races"
Allow the compiler to perform optimizations that may introduce new data races on stores, without proving that the variable cannot be concurrently accessed by other threads. Does not affect optimization of local data. It is safe to use this option if it is known that global data will not be accessed by multiple threads.
Examples of optimizations enabled by -fallow-store-data-races include hoisting or if-conversions that may cause a value that was already in memory to be re-written with that same value. Such re-writing is safe in a single threaded context but may be unsafe in a multi-threaded context. Note that on some processors, if-conversions may be required in order to enable vectorization.
There are a few problem children inside Linux kernel space with -O3 too, for example "-fpredictive-commoning"
Perform predictive commoning optimization, i.e., reusing computations (especially memory loads and stores) performed in previous iterations of loops.
What is safe to do to user space code that only has a single set of page tables accessible is not always safe when you do it from a kernel.
-
Originally posted by CochainComplex View Post
Indeed, if O3 causes issues, 99% of the time it's related to bad code.
In a lot of ways there should be a separate -O3 for kernel code. Switching between user-space and kernel-space page tables, and interfacing with memory across multiple groups of page tables, is behaviour of protected-mode OS kernels and pretty much nothing else.
Yes, the 1% of good code that has issues with -O3 turns out to be 99% concentrated in OS kernel builds. The horrible part is that these are race conditions as well, so you can get a lot of "works for me" reports when the build is in fact completely broken and you just don't know it yet.
-O2 for the Linux kernel is the safe choice. -O3 for the Linux kernel is mildly unsafe: the build might be totally fine, but it could also be broken. -Ofast with the Linux kernel is danger diving, because something will be wrong somewhere.
Most of the options currently enabled by -O3 should be harmless to build the Linux kernel with. The problem is that not all options in -O3 are safe.
-
Originally posted by birdie View Post
In my 25+ years of using PCs, laptops, etc. I've had 0 situations where ntoskrnl.exe or vmlinuz took a discernible amount of CPU time.
If we take e.g. a simple ls in a large directory:
Code:
time ls /opt/largedir/ > /dev/null
real 0m0.055s
user 0m0.040s
sys 0m0.016s
-
Originally posted by F.Ultra View Post
That is not how things work. The kernel is not some random exe that executes in parallel to userland (yes, there are kernel threads, e.g. for filesystems, that do, but not vmlinuz as such), and somehow I think that you already know this?!
If we take e.g. a simple ls in a large directory:
Code:
time ls /opt/largedir/ > /dev/null
real 0m0.055s
user 0m0.040s
sys 0m0.016s
Secondly, I bet in your example the kernel spent ~95% of time waiting for IO and ~5% getting you the data. Again, let's make the kernel twice as fast and as a result you'll shave off 0.0002ms? Woah.
You seemingly don't understand how the kernel works either. It's a proxy, it must be a proxy, if it does any serious work, it's badly coded. I can only imagine e.g. some software encryption/decryption algos taking a lot of CPU time which I mentioned earlier for VPN providers.