Google Posts Patches So The Linux Kernel Can Be LTO-Optimized By Clang
-
Originally posted by JustinTurdeau
That's horseshit. LTO can massively improve execution speed.
Edit:
What usually helps is profile guided optimization. Especially when combined with LTO.
Last edited by oleid; 25 June 2020, 03:43 PM.
-
Originally posted by oleid
Execution speed usually doesn't improve much with LTO, but the size reduction is often measurable.
-
Originally posted by oleid
Sure it CAN, but usually it doesn't. At least not on its own.
Originally posted by oleid
What usually helps is profile guided optimization. Especially when combined with LTO.
Manual use of __builtin_expect() and __attribute__((cold)) can also give some of the same benefits without the constant, uphill battle to get good profile coverage.
Last edited by JustinTurdeau; 25 June 2020, 09:13 PM.
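For anyone who hasn't used these hints, here is a minimal sketch of what the manual approach can look like with GCC or Clang. The LIKELY/UNLIKELY macro names and both functions are made up for illustration; only __builtin_expect() and __attribute__((cold)) come from the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical convenience macros around the GCC/Clang builtin.
 * The !! normalizes any truthy value to 1 before the comparison. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Error path. "cold" tells the compiler this function is rarely called,
 * so it can optimize it for size and keep it away from the hot code. */
__attribute__((cold, noinline))
static int report_bad_sample(size_t index, int value)
{
    fprintf(stderr, "bad sample %d at index %zu\n", value, index);
    return -1;
}

/* Sums the samples; returns -1 on the first negative sample. */
long sum_samples(const int *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        /* Hint: negative samples are assumed to be the rare case. */
        if (UNLIKELY(samples[i] < 0))
            return report_bad_sample(i, samples[i]);
        total += samples[i];
    }
    return total;
}
```

The hint only encodes the programmer's guess about which branch is rare. If that guess is wrong for the real workload, the layout can end up worse than what the compiler would have picked on its own.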
-
Originally posted by JustinTurdeau
PGO is a toy for most applications. Getting high quality profile data is usually more effort than it's worth. LTO is a much easier value proposition for most developers.
Also, 'data processing pipeline' kinds of programs, where getting profiles can be easily automated and all parts of the program can be touched without much effort.
-
Originally posted by oleid
...all parts of the program can be touched without much effort.
Becoming obsessed with just "touching every line" is a masturbatory pursuit. The same point applies to test coverage too. All other things being equal, having more coverage is better than less coverage, but it still doesn't make it complete or high quality and it still doesn't account for every possible state the program can be in before a given path is taken.
-
Originally posted by JustinTurdeau
PGO is a toy for most applications. Getting high quality profile data is usually more effort than it's worth.
As for getting 'high quality' profile data, just run the application according to your typical usage; it's not rocket science. Better yet, the ideal solution is for applications to automate this by incorporating PGO support, as Firefox and x264 do.
Originally posted by JustinTurdeau
LTO is a much easier value proposition for most developers.
-
Originally posted by JustinTurdeau
That doesn't mean anything. Merely "touching" a particular code path isn't the same thing as getting high quality, representative profile data.
It would not work unless all sensor input was available. That is what I mean by touching all code paths.
-
Originally posted by JustinTurdeau
That doesn't mean anything. Merely "touching" a particular code path isn't the same thing as getting high quality, representative profile data.
The alternative to this is guesswork from the compiler, which is what you get without PGO, unless you use a lot of compiler extensions that let you give 'hints' to said compiler. Linux does this to a large extent, but the vast majority of software, including very performance-critical software, does not.
A real-world example: for Blender CPU rendering I've gotten up to a 22% performance increase by recompiling with PGO, which is a massive boost.
-
Originally posted by oleid
So how do you get that? In my experience, running the software is enough.
Firefox is an example of a program that would be pretty hard to generate a good profile for. Something like a parser library, on the other hand, would be an ideal use case for PGO because you can just feed it a big corpus of typical input data.
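To make the parser case concrete, here is a rough sketch of a PGO training driver. Everything in it is an assumption for illustration: the toy parse_csv_ints() parser stands in for a real library, and PGO_TRAIN_MAIN is a made-up guard. The build commands in the comment use GCC's -fprofile-generate and -fprofile-use flags; other compilers spell these differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed GCC workflow (same output name in both steps so the
 * profile data written by the first binary is found by the second):
 *   gcc -O2 -flto -fprofile-generate -DPGO_TRAIN_MAIN parser.c -o parser
 *   ./parser corpus/*.csv        # run on typical input, writes profile data
 *   gcc -O2 -flto -fprofile-use  -DPGO_TRAIN_MAIN parser.c -o parser
 */

/* Hypothetical stand-in for a real parser: sums the unsigned decimal
 * integers in one comma-separated line. Returns the number of fields
 * parsed, or -1 on a malformed field. */
int parse_csv_ints(const char *line, long *sum_out)
{
    assert(line != NULL && sum_out != NULL);
    int fields = 0;
    long sum = 0;
    const char *p = line;

    while (*p != '\0' && *p != '\n') {
        if (!isdigit((unsigned char)*p))
            return -1;                  /* expected to be rare in the corpus */
        long value = 0;
        while (isdigit((unsigned char)*p))
            value = value * 10 + (*p++ - '0');
        sum += value;
        fields++;
        if (*p == ',')
            p++;                        /* skip the field separator */
    }
    *sum_out = sum;
    return fields;
}

/* Feeds every line of an open corpus stream to the parser and
 * returns how many lines were accepted. */
size_t train_on_stream(FILE *corpus)
{
    char line[4096];
    size_t accepted = 0;
    long sum;

    while (fgets(line, sizeof line, corpus) != NULL)
        if (parse_csv_ints(line, &sum) >= 0)
            accepted++;
    return accepted;
}

#ifdef PGO_TRAIN_MAIN   /* hypothetical guard: build the driver only on request */
int main(int argc, char **argv)
{
    size_t accepted = 0;

    for (int i = 1; i < argc; i++) {
        FILE *corpus = fopen(argv[i], "r");
        if (corpus == NULL) {
            perror(argv[i]);
            return 1;
        }
        accepted += train_on_stream(corpus);
        fclose(corpus);
    }
    printf("accepted %zu lines\n", accepted);
    return 0;
}
#endif
```

The profile is only as representative as the corpus fed to the driver, which is the whole point being argued here: for a parser that corpus is easy to collect, for something like a browser it is not.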
Originally posted by oleid
It gets its input from the sensors
Originally posted by oleid
It would not work unless all sensor input was available. That is what I mean by touching all code paths.
Last edited by JustinTurdeau; 26 June 2020, 03:05 AM.