LLVM Clang 12 Leading Over GCC 11 Compiler Performance On Intel Xeon Scalable Ice Lake
Originally posted by ezekrb5:
I love competition in this space. Glad to see we have multiple options. I'm still on GCC and probably won't move away from it anytime soon. But good for the Clang guys!
https://git.kernel.org/pub/scm/linux...inux-misc.git/
I have been using Clang for a while now on Arm myself, but I still prefer GCC as a stable workhorse: many projects compile out of the box with GCC, while with Clang this is not always the case (though it has gotten easier).
Last edited by sdack; 04 June 2021, 02:15 PM.
Originally posted by discordian:
Is -O3 actually faster than -O2? Every time I test this it's even or slower. (But I test this on power-efficient cores like Arm and Apollo Lake.)
Back in 2017 I was working on a library project written in C++, which we built with GCC for several different versions of Linux. On my Fedora development machine -O3 was 5% faster than -O2, and using Profile-Guided Optimization on top of that made -O3 15% faster than plain -O2. We didn't use LTO; instead we used an older technique that combined C++ source files into bigger translation units.
I find that -O3 works especially well in combination with LTO and PGO.
It does tend to make the code size larger, which can be a problem for mobile CPUs with small caches. We were targeting Xeon. Your mileage may vary.
Originally posted by sdack:
There is no single optimal approach when you have countless x86-compatible CPUs. You indeed have to compromise and choose the "good enough" approach, as you put it. Otherwise you would need to produce a binary for every CPU and memory configuration out there, because even main memory speed and timings affect your code's performance, and compilers don't know this during optimisation.
Originally posted by jrch2k8:
Yeah, you are right, but there are things you can do that are generic enough to let -O3 optimize the hell out of the code while still running on most CPUs (not as efficiently as full per-CPU manual optimization, as you correctly imply). For example: find ways to never branch inside a loop with enough iterations, or split the loop if you must test; verify your types are aligned; never allocate inside a loop unless absolutely needed; be smart about thread lifetimes; use templates only when they are absolutely needed (many C++ novices overuse them to prove their code is super C++-y, and in most cases this hurts performance and the compiler badly); and remember that STL implementations are generic, not performant, though most novice C++ devs think they are both (spoiler alert: in 95% of cases they are not; use custom allocators if you need performance). Etc., etc.
Originally posted by discordian:
Is -O3 actually faster than -O2? Every time I test this it's even or slower. (But I test this on power-efficient cores like Arm and Apollo Lake.)
Having said that, again, straight-up -O3 will most likely beat -O2 in the vast majority of cases. Phoronix recently compared -O2 against -O3 on GCC 11 and Clang/LLVM 12 here: https://www.phoronix.com/scan.php?pa...12-gcc11&num=1
The overall results show a clear gain for -O3 over -O2 for both GCC and Clang/LLVM, on both AMD and Intel hardware. Admittedly, that test also added -march=native to the -O3 runs, which will skew the result to some degree.
Originally posted by Steffo:
Holy sh*t! I saw it coming: in the long term, Clang/LLVM wins over GCC.
Really interesting evaluation, thanks. This assumes, however, that developers don't change the default flags for their specific application.
For performance-sensitive applications, developers usually spend some time testing different flags. So another interesting question would be: what is the best performance that can be achieved after compiler flag mining?