Announcement

**mmstick** · 01 September 2017, 08:32 AM

Originally posted by pal666 View Post

note that this "compiler" can't produce executable. because it is just compiler's frontend. and you've been told this on multiple occasions, and yet you still are hallucinating

The compiler does it's job and produces a standardized IR format. There is no C++ involved.

you are moronic enough to give me link to c++ sources and claiming it is rust
https://github.com/rust-lang/llvm/tr...604b4e548a84d3

Why did you link the source code to LLVM? That isn't the Rust compiler.

that is not thread pool. you run arbitrary code on thread pool and if one task is waiting for other it will not deadlock only when other will have chance to run. which it won't have if your pool has one thread or in general if size of pool is smaller than length of dependency chain

Sorry, but you're incredibly wrong. That is a thread pool. It's not a single thread, but a pool of threads, and these pool of threads can execute events as they are received over time. No need to spawn threads every time a parallel calculation is needed, and no need to spawn as many threads as you have work units. No deadlocking of any form will happen.

most established c++ library is boost. good luck

Rust has no need for Boost. The things that Boost provides, Rust already has superior alternatives within it's standard library, the core language, and crates within the ecosystem. Sucks for C++ though that it's not Rust. C++ is now a dying breed that won't be around for the long term.

**caligula** · 01 September 2017, 10:15 AM

Originally posted by pal666 View Post

nobody needs simple c compiler. in real optimizing compiler parsing time is negligible

Depends on the compiler and language. I know some more recent compilers do around 10-20k lines per second. Parsing the Linux kernel with such speed takes quite long. Not all parsers (or generators) are that fast.

**caligula** · 01 September 2017, 10:39 AM

Originally posted by jrch2k8 View Post

Again you are pointing 1 set of processes of a bunch(you are not wrong until then) but a lot happens between reading the file(or TU if you prefer ) and the creation of that .o(or obj) file and a lot more happens long before the linker kicks in.

Indeed I simplified this quite a bit, but gcc is quite special compiler with lots of legacy code / design in it. There are tons of new compilers which actually have everything packed inside a single executable and process. They might even do linking in the same process.

The TU is the smallest atomic piece of code but not always can be parallelized nor always is unique nor always is serialized, it depends heavily on what all the processes in between need to do and with certain operations it may need deeper analysis and may even require to parse more deeply a huge amount of files.

I described some ways to parallelize and improve the speed at the level of TUs, without going into finer grained details. The main reason it works it we can start with full load at the beginning without slowly discovering new tasks as we compile. The autohell+gcc+make combo is suboptimal in so many ways. We don't even need to go into details about parallelization, we can just look at the problem and estimate how much faster will it get with a better design. Take a look at that zapcc. It can give a rough idea what's wrong. All the autohell caches, compiler caches etc. just memoize previous compiler/configuration runs. We could do a lot better by integrating those in the compiler. There would be a new problem, running out of memory, but it's not really a problem, just a matter of managing a cache which is a well known problem. Besides, e.g. C/C++ compilers started in environments that couldn't do full program compilation without 2+ passes due to memory constraints. Now my machine can hold 1000 compiler runs concurrently in memory. That's a 5 fold increase in memory size wrt computational resources. And this is just an ordinary desktop PC. Compilation takes place in cloud nowadays. The memory capacity won't become a problem since we can LRU evict data, we have swap files, and finally the amount of code isn't that large anyways. I can compile the whole Linux base system (few hundred megabytes) in RAM without discarding any intermediate data.

The problem is the TU is a very theoretical unit but in reality is not a good place to parallelize because modern compilers need several copies not only of each TU but the actual parsed files and depending the optimization levels even several IR interpretation of each TU and dependency tree on RAM and some passes can even generate more copies with mixed permutations to the discard and create another tree.

This poses some problems, but the critical optimizations that diversify the object code won't take place in the first phases (depending on the language). We could have a full IR representation before we start dropping subtrees and transforming the AST. Caching this helps already quite a bit.

My recommendation is try to track in code a real compiler from the time it opens an actual file and the moment the .o is created with several different flags and it will be more clear why is so complex and not as easy to purely parallelize as virtual theoretical wikipedia compilers concepts

The problem here is, you assume that there needs to be a certain link between on-disk formats and processing. I'm claiming that you need to rethink the whole design to speed up the whole task. One could really argue that compilation IS NOT a hard problem in need for parallelization even now. Take mainline Linux and gcc, make defconfig, it will take 20-30 seconds with workstation processors. Do some development, recompile, takes two seconds. We already have better implementations (in Java land the IDEs have incremental compilers), Zapcc and so on. We also have better parsing tech. A super optimized parser parses 1-10M lines per second, an ANTLR generated toy parser does 10k LOC per second. It would be nice to make it a lot faster, but is it really one of the really critical problems that need urgent fixing?

**jrch2k8** · 01 September 2017, 11:15 AM

Originally posted by caligula View Post

Indeed I simplified this quite a bit, but gcc is quite special compiler with lots of legacy code / design in it. There are tons of new compilers which actually have everything packed inside a single executable and process. They might even do linking in the same process.

I described some ways to parallelize and improve the speed at the level of TUs, without going into finer grained details. The main reason it works it we can start with full load at the beginning without slowly discovering new tasks as we compile. The autohell+gcc+make combo is suboptimal in so many ways. We don't even need to go into details about parallelization, we can just look at the problem and estimate how much faster will it get with a better design. Take a look at that zapcc. It can give a rough idea what's wrong. All the autohell caches, compiler caches etc. just memoize previous compiler/configuration runs. We could do a lot better by integrating those in the compiler. There would be a new problem, running out of memory, but it's not really a problem, just a matter of managing a cache which is a well known problem. Besides, e.g. C/C++ compilers started in environments that couldn't do full program compilation without 2+ passes due to memory constraints. Now my machine can hold 1000 compiler runs concurrently in memory. That's a 5 fold increase in memory size wrt computational resources. And this is just an ordinary desktop PC. Compilation takes place in cloud nowadays. The memory capacity won't become a problem since we can LRU evict data, we have swap files, and finally the amount of code isn't that large anyways. I can compile the whole Linux base system (few hundred megabytes) in RAM without discarding any intermediate data.

This poses some problems, but the critical optimizations that diversify the object code won't take place in the first phases (depending on the language). We could have a full IR representation before we start dropping subtrees and transforming the AST. Caching this helps already quite a bit.

The problem here is, you assume that there needs to be a certain link between on-disk formats and processing. I'm claiming that you need to rethink the whole design to speed up the whole task. One could really argue that compilation IS NOT a hard problem in need for parallelization even now. Take mainline Linux and gcc, make defconfig, it will take 20-30 seconds with workstation processors. Do some development, recompile, takes two seconds. We already have better implementations (in Java land the IDEs have incremental compilers), Zapcc and so on. We also have better parsing tech. A super optimized parser parses 1-10M lines per second, an ANTLR generated toy parser does 10k LOC per second. It would be nice to make it a lot faster, but is it really one of the really critical problems that need urgent fixing?

Ok, now that we have a better understanding of each pov I can agree with:

1.) yea GCC have a lot of legacy code but the behavior I talked about also happens on Clang and ICC but I agree with you in the sense this is with C/C++ and probably lot of legacy code make its way into those as well

2.) I agree certain steps(depending on the language) can certainly be made to improve further compiling performance and prolly some new techniques including those you mentioned could be used to even reduce the footprint and synchronization problems between passes maybe even make those atomic.

3.) pretty much on the same page

4.) I agree compilation is not really a problem right now and dare to assert is fast enough until bigger issues are improved that justifies invest time into a compiler speed, so yeah same page.

The only caveat I have is I don't think C/C++ specific compilation can be improved much further because some optimizations are truly hellish to implement and as far as i know no one have found alternative ways to implement them even on theory and those are a big part of the reason no other language can come close to C/C++ performance in many scenarios but in other languages certainly you can get fast enough while keeping the compiler sane and fast like rust does with LLVM.

Disclaimer: I know rust can be faster at runtime than readable partially optimized C++ but is no match to properly optimized tho almost unreadable C++ in many HPC cases and I'm not saying rust will not eventually get there since LLVM still miss optimizations that could help rust in the future as well tho I'm sure compilation speed will start suffer as it does in C++, so It will depend if the rust guys find that trade off acceptable

**mmstick** · 02 September 2017, 05:25 PM

Originally posted by jrch2k8 View Post

Disclaimer: I know rust can be faster at runtime than readable partially optimized C++ but is no match to properly optimized tho almost unreadable C++ in many HPC cases and I'm not saying rust will not eventually get there since LLVM still miss optimizations that could help rust in the future as well tho I'm sure compilation speed will start suffer as it does in C++, so It will depend if the rust guys find that trade off acceptable

I could say the same about less-readable, non-idiomatic and unsafe Rust, too. However, I think the key here is that software written in Rust consistently outperforms C++ software across the board. Hence, Rust software is regularly being released that defeats existing C / C++ tools that have had decades of a head start. Compiler enforces software to be written in a more efficient manner due to compiler rules (borrow checker), and this also opens the door to encouraging more brute-force optimizations in more cases (due to borrow checker, safety guarantees, and Cargo).

But any issues with performance are entirely related to LLVM in Rust's case. There are plans to produce an IR for GCC, in addition to LLVM, at some point, which would lead to a more fair comparison between Rust and C / C++ software (mainly because most benchmarks compare GCC-compiled C / C++ to LLVM-compiled Rust).

**pal666** · 02 September 2017, 07:57 PM