Announcement

**c117152** · 26 November 2018, 07:39 AM

Originally posted by dstaubsauger

Disassembling and recompiling machine code based on a heuristic on what the control flow is supposed to be? Isn't that wildly unsafe?

Nah, it's simpler than you think. E.g., imagine a 1 MB stack of functions being loaded into a 500 KB cache. The compiler had no way to tell which functions would be used routinely and which would be used only rarely, maybe once or twice, so the runtime just shoves the first half in and starts execution. When a called function is missing from the cache, it drops the top function off the cache (or more, depending on size) and shoves the newly called one in. If you're unlucky, you might even thrash constantly, going back and forth between RAM and the cache for just one or two functions... Anyhow, what Facebook did was write a little tool that reviews perf logs recording the above behavior and reshuffles the stack order, moving routines around so they load in a different order and modifying the go-tos throughout the program to address them appropriately. So the most-used functions over the lifetime of a program get loaded first, while the least-used ones get loaded last. This way thrashing is kept to a minimum.

This is a standard optimization strategy that has been used by virtual machines for decades. It's basically an automated version of packing functions and data into structs to force them into the same memory region. On a much smaller scale, out-of-order machines do this all the time.
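To make the reshuffling idea concrete, here is a toy sketch in Python. It is not how BOLT itself is implemented; it only mirrors the simplified picture described in this post. The function names, sizes, trace format, and the "window" standing in for the cache are all made up for illustration.

```python
from collections import Counter


def reorder_by_hotness(layout, call_trace):
    """Return a new layout with the most frequently sampled functions first.

    layout: list of (name, size_in_bytes) in the original link order.
    call_trace: iterable of function names, one entry per sampled call
                (a stand-in for a real profile; the format is hypothetical).
    """
    counts = Counter(call_trace)
    # sorted() is stable, so functions with equal counts keep their old order.
    return sorted(layout, key=lambda fn: -counts[fn[0]])


def assign_offsets(layout):
    """Place functions back to back and return {name: start_offset}."""
    offsets = {}
    cursor = 0
    for name, size in layout:
        offsets[name] = cursor
        cursor += size
    return offsets


def retarget_calls(call_sites, new_offsets):
    """Rewrite each (caller, callee) pair to point at the callee's new offset."""
    return [(caller, callee, new_offsets[callee]) for caller, callee in call_sites]


def fraction_in_window(layout, call_trace, window_bytes):
    """Fraction of sampled calls whose target fits entirely inside the first
    window_bytes of the layout (a crude proxy for staying cache resident)."""
    offsets = assign_offsets(layout)
    sizes = dict(layout)
    resident = {n for n in sizes if offsets[n] + sizes[n] <= window_bytes}
    trace = list(call_trace)
    return sum(1 for name in trace if name in resident) / len(trace)


if __name__ == "__main__":
    # Hypothetical program: four functions, 1000 bytes total, 500-byte window.
    layout = [("init", 300), ("rare_error", 400), ("hot_loop", 200), ("parse", 100)]
    trace = ["init"] + ["hot_loop"] * 50 + ["parse"] * 20 + ["rare_error"]

    new_layout = reorder_by_hotness(layout, trace)
    print("new order:", [name for name, _ in new_layout])
    print("before:", fraction_in_window(layout, trace, 500))
    print("after: ", fraction_in_window(new_layout, trace, 500))
    print(retarget_calls([("init", "hot_loop")], assign_offsets(new_layout)))
```

Sorting purely by call count is the crudest possible policy. It ignores which functions call each other, so treat this as an illustration of the idea rather than a description of the real tool.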

**cb88** · 26 November 2018, 03:29 PM

Originally posted by ssokolow

Interesting. I wonder how strongly it's biased toward understanding the code generation quirks of C and C++ (i.e., how suitable it is for being applied to other languages which compile to machine code using GCC- or LLVM-based compilers).

No, it isn't so much recompiling as shuffling around the existing compiled code into a more optimal layout so it can stay cache-resident more often.

LLVM can in theory do something like what you ask, but the bits necessary to do that aren't really implemented... compiling from one architecture's binary to another's is a bit more complex than you'd think. Even if you have IR compiled for one arch, converting that to IR for another arch is still non-trivial.

As the link below details, even when the IR itself is cross-platform, the generated IR code may not be, because the program is different at compile time due to preprocessor directives, etc.

LLVM bitcode cross-platform

https://stackoverflow.com/questions/14258194/llvm-bitcode-cross-platform

Just to be sure: Is LLVM bitcode cross-platform? By which I mean, can the generated IR (".bc") file be distributed and interpreted/JITed over various platforms? If so, how does Clang convert C++ i...

**ssokolow** · 27 November 2018, 09:02 AM

Originally posted by cb88

No, it isn't so much recompiling as shuffling around the existing compiled code into a more optimal layout so it can stay cache-resident more often.

LLVM can in theory do something like what you ask, but the bits necessary to do that aren't really implemented... compiling from one architecture's binary to another's is a bit more complex than you'd think. Even if you have IR compiled for one arch, converting that to IR for another arch is still non-trivial.

As the link below details, even when the IR itself is cross-platform, the generated IR code may not be, because the program is different at compile time due to preprocessor directives, etc.

LLVM bitcode cross-platform

https://stackoverflow.com/questions/14258194/llvm-bitcode-cross-platform

Just to be sure: Is LLVM bitcode cross-platform? By which I mean, can the generated IR (".bc") file be distributed and interpreted/JITed over various platforms? If so, how does Clang convert C++ i...

I have no idea what you're talking about. When did I ever talk about LLVM IR, alternative arches, or anything like that?

My question was basically "If they developed this to do PGO-esque reoptimization of GCC- and LLVM-compiled C and C++, and it works on compiled artifacts... how likely is it to Just Work™ if fed a GCC-compiled D binary or an LLVM-compiled Rust binary instead?"

**cb88** · 27 November 2018, 10:45 AM

Originally posted by ssokolow

I have no idea what you're talking about. When did I ever talk about LLVM IR, alternative arches, or anything like that?

My question was basically "If they developed this to do PGO-esque reoptimization of GCC- and LLVM-compiled C and C++, and it works on compiled artifacts... how likely is it to Just Work™ if fed a GCC-compiled D binary or an LLVM-compiled Rust binary instead?"

I was replying to comment #3; no idea why it quoted you.

Facebook's BOLT Is An Effort To Speed-Up Linux Binaries
