Experimental -O3 Optimizing The Linux Kernel For Better Performance Brought Up Again
-
Originally posted by oiaohm View Post
The big problem with -Ofast is
"-fallow-store-data-races"
The Linux kernel is multi-threaded in lots of places, so this one feature of -Ofast can create race conditions in many locations in the Linux kernel.
There are a few problem children inside Linux kernel space with -O3. "-fpredictive-commoning"
OK, what if you have just performed a command that altered memory mapping from userspace to kernel space or the reverse? In some cases you need to replay the loads and stores exactly, so that the contents of the page tables exposed to user-space and the contents of page tables exposed to applications are the same.
What is safe to do to user-space code that only has a single set of page tables accessible is not always safe when you do it from a kernel.
Any and all such "commands" in kernel code would be wrapped in accessors and barriers that prohibit any and all reordering, anyway. Same goes for any accesses to memory shared between multiple threads and/or CPUs.
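To make the "accessors and barriers" point concrete, here is a minimal sketch in C. The macro and function names (SKETCH_READ_ONCE, SKETCH_WRITE_ONCE, sketch_barrier, sketch_publish, sketch_consume) are hypothetical stand-ins, loosely modelled on the idea behind the kernel's READ_ONCE()/WRITE_ONCE(); they are not the kernel's actual definitions. It assumes GCC or Clang (it relies on __typeof__ and GNU inline asm) and only constrains the compiler, not the CPU.

```c
#include <assert.h>

/* Hypothetical accessors: the volatile cast asks the compiler to emit
 * exactly one load or store here, so the access is not merged with,
 * duplicated by, or invented from neighbouring code. */
#define SKETCH_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Compiler-only barrier (GCC/Clang syntax): the compiler may not move
 * memory accesses across this point. It emits no CPU fence instruction. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

static int shared_flag; /* set to 1 once shared_data is valid */
static int shared_data;

/* Writer side: store the payload first, then raise the flag. */
void sketch_publish(int value)
{
    shared_data = value;
    sketch_barrier(); /* keep the data store ahead of the flag store */
    SKETCH_WRITE_ONCE(shared_flag, 1);
}

/* Reader side: returns 1 and fills *out if data was published, else 0. */
int sketch_consume(int *out)
{
    if (!SKETCH_READ_ONCE(shared_flag))
        return 0;
    sketch_barrier(); /* keep the data load after the flag load */
    *out = shared_data;
    return 1;
}
```

On real multi-CPU hardware a compiler barrier alone is not enough; ordering between CPUs needs proper memory barriers or acquire/release operations on top of this.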
- Likes 1
Comment
-
Originally posted by birdie View Post
Poor Volta continues to insult me in every thread. As unhappy as I am I don't have the desire to do that to people on the net. Would be amazing if he at least added counter arguments, but, no, "You know shit" that's all he's capable of saying. And he's all alone in that - not a single other person in this discussion used the same verbiage. Amazing "intellect", what can I say.
- Likes 6
Comment
-
Originally posted by intelfx View Post
You don't present any arguments, so there can be no counterarguments. You just stand there frothing at the mouth calling everyone names, so don't be surprised when people respond in the same way.
Comment
-
Originally posted by birdie View Post
GCC 12.1.
O2 vs Ofast:
Code:
+ -fallow-store-data-races [enabled]
+ -fassociative-math [enabled]
+ -fcx-limited-range [enabled]
+ -ffinite-math-only [enabled]
+ -fgcse-after-reload [enabled]
+ -fipa-cp-clone [enabled]
+ -floop-interchange [enabled]
+ -floop-unroll-and-jam [enabled]
+ -fmath-errno [disabled]
+ -fpeel-loops [enabled]
+ -fpredictive-commoning [enabled]
+ -freciprocal-math [enabled]
+ -fsemantic-interposition [disabled]
+ -fsigned-zeros [disabled]
+ -fsplit-loops [enabled]
+ -fsplit-paths [enabled]
+ -ftrapping-math [disabled]
+ -ftree-loop-distribution [enabled]
+ -ftree-partial-pre [enabled]
+ -funroll-completely-grow-size [enabled]
+ -funsafe-math-optimizations [enabled]
+ -funswitch-loops [enabled]
+ -fversion-loops-for-strides [enabled]
Code:
+ -fgcse-after-reload [enabled]
+ -fipa-cp-clone [enabled]
+ -floop-interchange [enabled]
+ -floop-unroll-and-jam [enabled]
+ -fpeel-loops [enabled]
+ -fpredictive-commoning [enabled]
+ -fsplit-loops [enabled]
+ -fsplit-paths [enabled]
+ -ftree-loop-distribution [enabled]
+ -ftree-partial-pre [enabled]
+ -funroll-completely-grow-size [enabled]
+ -funswitch-loops [enabled]
+ -fversion-loops-for-strides [enabled]
Code:
+ -fallow-store-data-races [enabled]
+ -fassociative-math [enabled]
+ -fcx-limited-range [enabled]
+ -ffinite-math-only [enabled]
+ -fmath-errno [disabled]
+ -freciprocal-math [enabled]
+ -fsemantic-interposition [disabled]
+ -fsigned-zeros [disabled]
+ -ftrapping-math [disabled]
+ -funsafe-math-optimizations [enabled]
- Likes 2
Comment
-
Originally posted by Jannik2099 View Post
No, the 1% is absolutely not OS code, and certainly not something with as much code rot as the Linux kernel.
Paging has absolutely nothing to do with this. It's neither required nor specially treated in the C standard, nor is there a magic -funsafe-paging flag in -O3 - it has absolutely nothing to do with the kind of UB that gets exposed by O3.
Again, if O3 breaks something that means YOU have a bug. And that bug WILL most likely manifest at O2 too at some point in time. Ignoring the issue because you don't want to fix it does not help.
And again, no, all options in O3 ARE safe. All options in O3 are fully standards compliant in gcc and clang.
Last edited by carewolf; 23 June 2022, 03:29 PM.
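As an illustration of the kind of latent bug meant here, consider signed integer overflow, which is undefined behaviour in C. The sketch below is an assumed example, not code from the kernel; add_would_overflow is a hypothetical name. The buggy form appears only in a comment, because running it would itself be undefined, and whether a given compiler version actually folds it away at a given optimization level is not guaranteed.

```c
#include <assert.h>
#include <limits.h>

/* Buggy pattern, shown as a comment only:
 *
 *     return a + b < a;   // intended "did it wrap?" test for b > 0
 *
 * Signed overflow is undefined, so the compiler may assume it never
 * happens and reduce the test to a constant. The bug exists at every
 * optimization level; a higher level may merely make it visible.
 */

/* Well-defined version: compare against the limits before adding.
 * Returns 1 if a + b would overflow int, 0 otherwise. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b; /* INT_MAX - b cannot overflow for b > 0 */
    if (b < 0)
        return a < INT_MIN - b; /* INT_MIN - b cannot overflow for b < 0 */
    return 0;                   /* adding zero never overflows */
}
```

GCC and Clang also provide __builtin_add_overflow for the same purpose; the portable comparison form is used here so the sketch does not depend on compiler builtins.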
- Likes 3
Comment
-
Originally posted by birdie View Post
In my 25+ years of using PC, laptops, etc. I've had 0 situations where ntoskrnl.exe or vmlinuz took a discernible amount of CPU time.
I've only done much profiling of server applications, but it's not unusual for me to see > 10% in the kernel. And that's just within the process I'm examining. I don't usually do system-wide profiling, so I can't say how much kernel time is unassociated with the process, but I've often seen overall sys time well above 10%, in top.
Last edited by coder; 23 June 2022, 05:04 PM.
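For anyone who wants a rough per-process number without a profiler, a minimal sketch using POSIX getrusage() is below. The helper names (tv_to_seconds, sys_time_fraction) are hypothetical. It only covers CPU time charged to the calling process, so it says nothing about kernel time unassociated with the process, and it is far coarser than a real profile.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Fraction of this process's CPU time spent in the kernel so far,
 * in the range [0, 1]. Returns -1.0 if getrusage() fails or if no
 * CPU time has been accounted to the process yet. */
double sys_time_fraction(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;

    double user = tv_to_seconds(ru.ru_utime);  /* time in user mode */
    double sys = tv_to_seconds(ru.ru_stime);   /* time in kernel mode */
    double total = user + sys;

    if (total <= 0.0)
        return -1.0;
    return sys / total;
}
```

Calling sys_time_fraction() at the end of a workload and multiplying by 100 gives a percentage comparable in spirit to the per-process split discussed above.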
- Likes 4
Comment
-
Originally posted by birdie View Post
Secondly, I bet in your example the kernel spent ~95% of time waiting for IO and ~5% getting you the data. Again, let's make the kernel twice as fast and as a result you'll shave off 0.0002ms? Woah.
Originally posted by birdie View Post
You seemingly don't understand how the kernel works either. It's a proxy, it must be a proxy, if it does any serious work, it's badly coded.
It really depends on how heavily you're leaning on them. Something like a database can bypass a little bit of that by doing direct I/O (which it shouldn't have to, if we had an ideal kernel), but the more of those areas you're touching, the more you're really dependent on kernel performance.
- Likes 6
Comment
-
Originally posted by DanielG View Post
(Should probably be done with an AMD GPU, maybe their millions of lines of driver-code can benefit from more optimization?)
- Likes 3
Comment