Google Posts Patches So The Linux Kernel Can Be LTO-Optimized By Clang

Hugh replied

12 July 2020, 09:18 PM
Originally posted by Zan Lynx View Post

You missed everything under "indirect function call checking."

The Linux kernel uses a lot of indirect function calls. That's every call through a function pointer. The entire kernel module system relies on indirect function calls.

Plus, I believe that if you compile for the right CPU types, clang and GCC will use hardware support for CFI. That means that if any code jumps to a function, it has to arrive at a special marker instruction or the program is aborted. That makes ROP abuse hard because malware can no longer jump into the middle of a useful function.

https://www.redhat.com/en/blog/fight...rity-cfi-clang
https://software.intel.com/sites/def...gy-preview.pdf

Ahh. Thanks.

My code is almost always type-checked and type-checkable statically, including and especially function pointers. The very few exceptions are checked (by hand) very carefully. I don't know about kernel code.

Although void * can point at any object in C, a function isn't an object. The C standard doesn't really provide an equivalent for pointers to functions. Unless it was added since I last looked carefully.
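For what it's worth, ISO C does guarantee that a pointer to one function type can be converted to a pointer to another function type and back, with the result comparing equal to the original. So any function pointer type can act as a generic slot, as long as the pointer is converted back to its real type before the call. A minimal sketch, where `generic_fn`, `erase_type` and `call_binary` are names made up for illustration:

```c
#include <assert.h>

/* Any function pointer type can hold any other function pointer;
 * void (*)(void) is a common choice for the "generic" slot.
 * generic_fn is a name invented for this sketch. */
typedef void (*generic_fn)(void);

static int add(int a, int b) { return a + b; }

/* Store a typed function pointer in the generic slot. */
generic_fn erase_type(int (*fn)(int, int))
{
    return (generic_fn)fn;
}

/* Convert back to the original type before calling. Calling through
 * the generic type directly would be undefined behaviour. */
int call_binary(generic_fn fn, int a, int b)
{
    int (*typed)(int, int) = (int (*)(int, int))fn;
    return typed(a, b);
}
```

Converting between a function pointer and `void *` is a different matter: ISO C does not define it, which is the gap described above.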

Putting a marker instruction on each function's object code certainly should not require anything like LTO. Why would it not be a completely separate flag?

Double ahh: after posting I saw the URLs. So this is catching function pointers overwritten at run-time, not bad type-punning.
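On the LTO question: as far as I know, the hardware marker scheme is indeed a separate flag (`-fcf-protection` in both GCC and Clang) and does not need LTO. Clang's software CFI is a different mechanism; its documentation says all of its schemes currently rely on LTO, because the compiler needs a whole-program view of which functions are valid targets. The idea of checking a pointer before an indirect call can be emulated by hand. This is only a sketch with an explicit allow-list and invented names, not what Clang actually generates (Clang's indirect-call check is based on the function's type):

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

static int double_it(int x) { return 2 * x; }
static int negate(int x)    { return -x; }
/* Same signature, but never registered as a legitimate handler. */
static int not_a_handler(int x) { return x + 1000; }

/* The set of valid targets. A compiler-based scheme derives this from
 * whole-program information; here it is written out by hand. */
static const handler_fn valid_handlers[] = { double_it, negate };

/* Return 1 if fn is one of the known-good targets, 0 otherwise. */
int is_valid_target(handler_fn fn)
{
    for (size_t i = 0; i < sizeof valid_handlers / sizeof valid_handlers[0]; i++)
        if (valid_handlers[i] == fn)
            return 1;
    return 0;
}

/* Checked indirect call: stores the result in *out and returns 0 on
 * success, or returns -1 without calling if the pointer is not a valid
 * target. A real CFI scheme would trap instead of returning an error. */
int checked_call(handler_fn fn, int arg, int *out)
{
    assert(out != NULL);
    if (!is_valid_target(fn))
        return -1;
    *out = fn(arg);
    return 0;
}
```

A pointer that has been overwritten to point at `not_a_handler`, or anywhere else outside the list, is rejected before the call happens.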

Last edited by Hugh; 12 July 2020, 09:23 PM.
Zan Lynx replied

12 July 2020, 06:45 PM
Originally posted by Hugh View Post

Why is that useful for the kernel?
CFI seems to find C++ bugs https://clang.llvm.org/docs/ControlFlowIntegrity.html
None seem to apply to C code.
The kernel has no C++ code.

You missed everything under "indirect function call checking."

The Linux kernel uses a lot of indirect function calls. That's every call through a function pointer. The entire kernel module system relies on indirect function calls.
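A minimal sketch of the pattern, assuming a made-up driver interface (`demo_ops`, `null_driver` and `demo_read_from` are illustrative names, not real kernel APIs):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver interface, loosely modelled on the kernel's
 * "struct of function pointers" pattern. */
struct demo_ops {
    int (*open)(int id);
    int (*read)(int id, char *buf, size_t len);
};

static int null_open(int id) { return id >= 0 ? 0 : -1; }

static int null_read(int id, char *buf, size_t len)
{
    (void)id;
    /* Fill the buffer with zero bytes and report how many were written. */
    for (size_t i = 0; i < len; i++)
        buf[i] = 0;
    return (int)len;
}

/* A "module" registers itself by filling in the table. */
static const struct demo_ops null_driver = {
    .open = null_open,
    .read = null_read,
};

/* Core code never names null_read directly: both calls below are
 * indirect calls through function pointers loaded at run time. */
int demo_read_from(const struct demo_ops *ops, int id, char *buf, size_t len)
{
    assert(ops != NULL && ops->open != NULL && ops->read != NULL);
    if (ops->open(id) != 0)
        return -1;
    return ops->read(id, buf, len);
}
```

If an attacker can overwrite `ops->read` in memory, the call in `demo_read_from` goes wherever the attacker chooses, which is the case CFI is meant to catch.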

Plus, I believe that if you compile for the right CPU types, clang and GCC will use hardware support for CFI. That means that if any code jumps to a function, it has to arrive at a special marker instruction or the program is aborted. That makes ROP abuse hard because malware can no longer jump into the middle of a useful function.

Fighting exploits with Control-Flow Integrity (CFI) in Clang

https://www.redhat.com/en/blog/fighting-exploits-control-flow-integrity-cfi-clang

In this post, we look at Control-Flow Integrity (CFI) protection implemented by the Clang compiler for x86_64 architecture.

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Hugh replied

12 July 2020, 05:54 PM
In addition to the performance focus of LTO, the other motive for Google LTO'ing the kernel is for enabling Clang Control-Flow Integrity (CFI) in conjunction with LTO.

Why is that useful for the kernel?
CFI seems to find C++ bugs https://clang.llvm.org/docs/ControlFlowIntegrity.html
None seem to apply to C code.
The kernel has no C++ code.
Zan Lynx replied

27 June 2020, 04:38 PM
Originally posted by Grinch View Post

The performance benefits of PGO widely eclipse those of LTO in my experience. I agree that for most applications it doesn't matter, but the same is true for LTO. However, for CPU-intensive stuff, PGO gives me a ~5-20% performance increase.

Agreed. A few years ago I worked on a software project that was a library, and doing PGO on it was worth about a 15% improvement. It already had a pretty good test suite with about 75% branch coverage, which is actually really good in my experience. (Just imagine needing to fake out every disk operation and "malloc()" for failure testing. We didn't.) The profile step just had to run the tests with the right amount of weight given to common operations, such as successful operations rather than error-path tests.
JustinTurdeau replied

27 June 2020, 03:52 PM
Originally posted by goTouch View Post

Today's software is so bloated that I have no idea why such a big chunk of code is needed to let me read the text on my screen.

Yeah, a lot of "modern" software is bloated. What a highly insightful and non-obvious comment.

Originally posted by goTouch View Post

LTO PGO stuff is just for lazy developers

You clearly have no idea what you're talking about. Opinion disregarded.
goTouch replied

27 June 2020, 03:05 PM
LTO PGO stuff is just for lazy developers. The most effective improvement for perf is: don't load the code that I will not use.
Today's software is so bloated that I have no idea why such a big chunk of code is needed to let me read the text on my screen.
The browser is the most bloated software ever. The Linux kernel is much less bloated (or not bloated at all).
If a browser still has a text-only mode I'd be very happy to use it.
Go hxxx java and javascript.
CochainComplex replied

26 June 2020, 03:19 PM
Originally posted by Grinch View Post

This statement makes no sense; having a code path being executed is how you get profile data. Representative profile data is something you get when running the application per your typical workload.

The alternative to this is guesswork from the compiler, which is what you get without PGO, unless you use a lot of compiler extensions which allow you to give 'hints' to said compiler. Linux does this to a large extent, but the vast majority of software, including very performance-critical software, does not.
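For reference, the kind of hint meant here looks roughly like the sketch below. The `likely()`/`unlikely()` macros have the same shape as the kernel's and rely on the GCC/Clang builtin `__builtin_expect`; `sum_bytes` is a made-up function for illustration:

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>

/* Same shape as the kernel's likely()/unlikely() macros; they rely on the
 * GCC/Clang builtin __builtin_expect, so this is not portable ISO C. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum a buffer, treating a NULL buffer as a rare error. */
long sum_bytes(const unsigned char *buf, size_t len)
{
    if (unlikely(buf == NULL))
        return -1;          /* cold path: hinted as rarely taken */

    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];    /* hot path */
    return total;
}
```

The hint does not change the result, so `assert(sum_bytes(data, 3) == 6)` holds for a buffer containing 1, 2, 3 with or without it. It only tells the compiler which branch to favour, which is the information PGO would otherwise measure from a real run.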

A real-world example: for Blender CPU rendering I've gotten up to a 22% performance increase by recompiling with PGO. That is a massive performance boost.

True, but that is exactly what he meant. A typical workload means you can't just start the application and quit it immediately. And sometimes the profiling mode really slows everything down, so real productive work is not always possible while it is running.
CochainComplex replied

26 June 2020, 03:15 PM
Originally posted by JustinTurdeau View Post

Well you should know better than anyone else if PGO is applicable or not. I said PGO was a toy for "most applications", not "all applications". The reason I said this is because I see a lot of people droning on about it when they clearly have no intention of doing the work required for it to be a worthwhile improvement.

I've seen loads of people on these forums saying "do some PGO benchmarks", but doing that properly is way more work than Michael typically puts into his benchmarks.

Absolutely true. I think most of the people demanding it have never done it. I have applied it multiple times, and it is way more than just compiling twice.
CochainComplex replied

26 June 2020, 03:13 PM
Originally posted by Hans Bull View Post

Reducing kernel obesity would be more than welcome.

Difficult with home office... the fridge is too close to the workplace. No walks outside. Soon it will be at 5.9, and still rising...
oleid replied

26 June 2020, 05:07 AM
Originally posted by JustinTurdeau View Post

I've seen loads of people on these forums saying "do some PGO benchmarks", but doing that properly is way more work than Michael typically puts into his benchmarks.

Indeed. While those benchmarks are highly repeatable and I'd expect the profile gathered would probably optimize that particular benchmark nicely, I don't think we could draw any conclusions except that PGO would yield an improvement for exactly that benchmark with the given parameters.