LLVM's BOLT Flipped On By Default For Linux x86/AArch64 Test Releases


    Phoronix: LLVM's BOLT Flipped On By Default For Linux x86/AArch64 Test Releases

    BOLT, the Facebook/Meta-developed technology for optimizing binaries by improving their code layout for greater performance, was merged into mainline LLVM at the start of the year. Now, as we approach the end of the year, BOLT is getting a bit of a promotion: it is being flipped on by default for Linux x86_64 and AArch64 test releases...


  • #2
    Typo:

    Originally posted by phoronix View Post
    Generating an optimized binary does work for large applications with Faceobok/Meta

    Comment


    • #3
      I wonder how long before people implement some kind of machine-learning-driven compilers? I could see that being a huge win potentially.

      Comment


      • #4
        Originally posted by quaz0r View Post
        I wonder how long before people implement some kind of machine-learning-driven compilers? I could see that being a huge win potentially.
        It would probably make sense to use ML to produce heuristics.
        Modern optimization passes like inlining, constant folding, etc. can be implemented very well with conventional code, but the heuristic for whether to inline a given function is still quite hard to get right.
        ML could very well help with that.
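
        To illustrate the idea, here is a toy sketch of replacing a hand-tuned inlining heuristic with a learned model. All feature names, weights, and thresholds here are hypothetical, made up for illustration; no real compiler uses exactly this:

        ```python
        import math

        # Hypothetical features a compiler might feed a learned inlining model:
        # callee size, number of call sites, and profiled call frequency.
        def inline_score(callee_size, num_call_sites, call_count, weights):
            """Logistic model: estimated probability that inlining pays off."""
            w0, w_size, w_sites, w_count = weights
            z = (w0
                 - w_size * callee_size               # big callees bloat code
                 - w_sites * num_call_sites           # many sites -> duplication cost
                 + w_count * math.log1p(call_count))  # hot calls benefit most
            return 1.0 / (1.0 + math.exp(-z))

        # In a real system these weights would come from offline training on
        # benchmark runs; these values are invented for the example.
        WEIGHTS = (0.5, 0.02, 0.1, 0.8)

        def should_inline(callee_size, num_call_sites, call_count, threshold=0.5):
            return inline_score(callee_size, num_call_sites, call_count, WEIGHTS) > threshold
        ```

        For example, a tiny, hot callee such as `should_inline(20, 1, 10000)` scores high, while a large callee with many call sites and few calls, `should_inline(500, 40, 3)`, scores low. The appeal is that the weights can be retrained as hardware changes, instead of hand-retuning magic numbers.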

        Comment


        • #5
          Does anybody understand why BOLT works on finished binaries? The project looks great, but disassembling what was just assembled seems a little odd; going by the description, it sounds like it should be an alternate PGO mode inside the compiler, or a compiler plug-in.

          Comment


          • #6
            (Over-) simplified:

            - LTO: "Jump around less!"
            - PGO: "Hint the branch predictor of what is HOT or not." (*)
            - BOLT: "Use caches better!" (*)

            *: Needs runtime profile data

            This, I think, enables LLVM itself to be built with BOLT for Linux on x86_64 and AArch64, such that the compiler itself gets to build other things a little bit faster.
            In other words, developers are happy to see this.
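
            To make the "use caches better" point concrete, here is a toy sketch of the underlying idea; it is not BOLT's actual algorithm (BOLT uses a more sophisticated ext-TSP-style heuristic). Given profiled branch counts, lay basic blocks out so that the hottest successor of each block comes right after it, turning hot taken branches into fall-throughs:

            ```python
            def layout_blocks(entry, edges):
                """Greedy code layout: follow the hottest profiled edge from each block.

                edges maps (src, dst) -> execution count from a runtime profile.
                Returns an ordering of blocks; hot paths become straight-line code,
                which improves i-cache and branch-predictor behavior.
                """
                placed = [entry]
                seen = {entry}
                cur = entry
                while True:
                    # hottest not-yet-placed successor of the current block
                    succs = [(cnt, dst) for (src, dst), cnt in edges.items()
                             if src == cur and dst not in seen]
                    if not succs:
                        break
                    _, nxt = max(succs)
                    placed.append(nxt)
                    seen.add(nxt)
                    cur = nxt
                # append any remaining cold blocks at the end
                for src, dst in edges:
                    for b in (src, dst):
                        if b not in seen:
                            placed.append(b)
                            seen.add(b)
                return placed

            # Hypothetical profile: A branches to C 990 times but to B only 10 times.
            profile = {("A", "B"): 10, ("A", "C"): 990, ("C", "D"): 990, ("B", "D"): 10}
            print(layout_blocks("A", profile))  # hot chain A, C, D placed first; cold B last
            ```

            This is exactly why the profile data marked with (*) above is needed: without runtime counts there is no way to know which successor is hot.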

            Comment


            • #7
              LuukD Just curious, can PGO actually deoptimize cold paths and increase the inlining threshold for hot paths?

              Comment


              • #8
                Originally posted by NobodyXu View Post
                LuukD Just curious, can PGO actually deoptimize cold paths and increase the inlining threshold for hot paths?
                I am by no means an authority on the subject, so if compiler people would like to chime in, please do.
                AFAIK this is what PGO does:
                • Use profile information for register allocation to optimize the location of spill code.
                • Improve branch prediction for indirect function calls by identifying the most likely targets. (Some processors have longer pipelines, which improves branch prediction and translates into high performance gains.)
                • Detect and do not vectorize loops that execute only a small number of iterations, reducing the run time overhead that vectorization might otherwise add.
                [cited from link below]

                I have read elsewhere that LLVM likes to inline everything unless it has deduced otherwise, or is directed otherwise.
                I think inlining is best done early on, because caller and callee become one, which may allow dead-code elimination. Then run the instrumented binary to create a profile, then decide upon vectorization, register allocation and branch hints (PGO), then do LTO, and finally BOLT.

                That being said, maybe there are compilers which, indeed, utilize profile data to improve upon inlining but I would not know.
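
                The vectorization bullet in the quote above can be sketched as a simple profile-driven decision. The function name and threshold here are hypothetical, invented for illustration: the compiler only vectorizes a loop when the profiled average trip count is high enough to amortize the fixed vector setup/cleanup cost.

                ```python
                def decide_vectorize(trip_counts, min_avg_trips=8):
                    """PGO-style choice: vectorizing a loop has a fixed setup and
                    cleanup cost, so it only pays off when the profiled average
                    trip count is large enough.

                    trip_counts: iteration counts observed for this loop during
                    the instrumented profiling run.
                    """
                    if not trip_counts:
                        return False  # loop never executed: cold code, stay scalar
                    avg = sum(trip_counts) / len(trip_counts)
                    return avg >= min_avg_trips

                # Hypothetical profiles for two loops:
                hot_loop = [128, 256, 128]   # long-running: worth vectorizing
                short_loop = [2, 3, 1, 2]    # a few iterations: scalar is cheaper
                ```

                The same shape of decision (compare a profiled count against a threshold) is what would let PGO raise the inlining budget on hot paths, which is essentially what NobodyXu asked about.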

                Comment


                • #9
                  LuukD Thanks!

                  Comment
