GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options


  • #41
    Originally posted by CochainComplex View Post
    Have a look at Clear Linux and see what's possible.
    When I looked at Clear Linux, I saw losers trying to cheat again, e.g. producing AVX2 libs with -march=haswell instead of -mavx2.
    Last edited by pal666; 13 January 2019, 07:10 AM.

    Comment


    • #42
      Originally posted by birdie View Post
      What are your examples that everything is faster when compiled with -O3?
      What are your examples that everything is faster when compiled with -O2?
      Originally posted by birdie View Post
      Michael didn't mention it, however the binaries for applications compiled with -O3 might become up to 50% fatter, which actually makes your application loading times longer, but who cares on this forum?
      Probably no one on this forum is loading applications from a floppy drive.
      Originally posted by birdie View Post
      In fact you haven't shown a single test where -O3 is faster, have you?
      In fact, you are commenting on an article which you didn't bother to read.

      Comment


      • #43
        Originally posted by F.Ultra View Post
        If that happens (and of course it will happen in a few cases) then it does not mean that "the code was bloated"; all it means is that there is a bug in the GCC optimizer for that particular code, because the resulting code was _less_ optimized in that case.
        Usually the code was more optimized but ran slower, e.g. it had a larger cache footprint, which hurts a one-shot run. Maybe the optimizer could do a better job, but I wouldn't call that a bug.
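
        To make the cache-footprint point concrete, here is a hedged sketch (file and function names are invented): the same loop built with a size-increasing option such as -funroll-loops loses its loop overhead but occupies far more instruction-cache space, which can hurt a one-shot run.

        /* footprint.c -- hypothetical example.  Compare the generated code size:
         *   gcc -O2 -c footprint.c && size footprint.o
         *   gcc -O3 -funroll-loops -c footprint.c && size footprint.o
         * The unrolled version executes fewer branches per element but takes up
         * many more bytes of L1 instruction cache. */
        long sum(const int *a, long n)
        {
            long s = 0;
            for (long i = 0; i < n; i++)
                s += a[i];
            return s;
        }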

        Comment


        • #44
          Originally posted by birdie View Post
          -O3 is also known to break several packages.
          There are packages which are broken by strict aliasing, which is enabled at -O2 and -Os too. Fix your packages, don't blame the compiler.
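
          For anyone wondering what that breakage looks like, here is a minimal sketch (function names invented) of the classic type-punning pattern that -fstrict-aliasing (enabled by -O2, -O3 and -Os alike) permits GCC to miscompile, together with the well-defined memcpy alternative:

          /* aliasing.c -- hypothetical example */
          #include <stdint.h>
          #include <string.h>

          /* Undefined behaviour: reads a float object through an incompatible
           * uint32_t lvalue, so under -fstrict-aliasing the optimizer may assume
           * the accesses don't alias and reorder or drop them. */
          uint32_t float_bits_broken(const float *f)
          {
              return *(const uint32_t *)f;
          }

          /* Well defined: memcpy copies the object representation, and GCC
           * typically compiles it down to the same single load at -O2. */
          uint32_t float_bits_ok(const float *f)
          {
              uint32_t u;
              memcpy(&u, f, sizeof u);
              return u;
          }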

          Comment


          • #45
            Originally posted by birdie View Post
            There are literally a hundred bug reports open against GCC where it either compiles sub-optimal code at -O3 or crashes.
            There are bug reports of GCC crashing or generating sub-optimal code at every optimization level. So is your suggestion to avoid compilation altogether? Or are you just habitually posting irrelevant bullshit?

            Comment


            • #46
              Originally posted by quaz0r View Post
              Do those vectorize options that Michael used imply -march=native? Otherwise how are they generating vector instructions... are they maybe defaulting to some 20-year-old SSE instructions or something?
              They generate every instruction (not only vector ones) based on the current -march setting, which for x86_64 defaults to the baseline that includes the SSE/SSE2 of the first Opterons.
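
              A hedged illustration of that point (file name invented): the loop below is auto-vectorized either way once -O3 or -ftree-vectorize is in effect, but which vector instructions come out depends entirely on -march.

              /* saxpy.c -- hypothetical example.  Inspect the assembly with:
               *   gcc -O3 -S saxpy.c                  # baseline x86-64: SSE/SSE2, xmm registers
               *   gcc -O3 -march=haswell -S saxpy.c   # AVX2, ymm registers
               * The vectorizer is the same in both cases; -march only changes the
               * instruction set it is allowed to target. */
              void saxpy(float *restrict y, const float *restrict x, float a, int n)
              {
                  for (int i = 0; i < n; i++)
                      y[i] = a * x[i] + y[i];
              }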

              Comment


              • #47
                Nonsense, see pages 16 and 17 of https://blog.linuxplumbersconf.org/2...iginal/GCC.pdf
                Originally posted by hreindl View Post
                in a perfect world you would have
                /usr/lib64/haswell/libopenblas_sandybridge-r0.2.20.so
                /usr/lib64/haswell/libopenblas_haswell-r0.2.20.so
                /usr/lib64/haswell/libopenblas_broadwell-r0.2.20.so
                /usr/lib64/haswell/libopenblas_skylake-r0.2.20.so
                We live in a non-perfect world with Intel cheaters, so we have Haswell-optimized libs disguised as generic AVX2-enabled ones.
                Last edited by pal666; 13 January 2019, 08:59 AM.

                Comment


                • #48
                  Originally posted by birdie View Post

                  This thread is giving me so many facepalms I'm just gonna quote Gentoo's wiki (from the people who actually care about overall system optimization):
                  • -O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended. However, it also enables -ftree-vectorize so that loops in the code get vectorized and will use AVX YMM registers.
                  And what part of "it would mean a bug in GCC" does not match your whole rant?

                  Comment


                  • #49
                    Originally posted by hreindl View Post

                    No, it means that bigger code which doesn't gain enough performance boost from the optimization is trashing your precious caches, which you can't even measure with a naive benchmark because it has side-effects on each and every process running on your CPU - in the worst case your code doesn't even fit in the caches at all, and if often-needed code is permanently trashed and loaded again, the bigger size makes it even worse because the reload is more expensive.
                    The thing is that there was this whole "O3 just makes the code big and bloated so everyone should compile with Os because that is the only one that will make your code actually run faster" "truth" going around some time ago, and it has lingered for decades. Not a single person provided any sort of proof back then, so it was all just assumed.

                    Carefully measuring the performance of my own projects I always ended up with O3 being faster, every single time, every single project. Which of course has nothing whatsoever to do with how every other piece of code out there will behave (this is all anecdotal after all). But at least to me it sounds like a bug in GCC if applying more optimizations results in less optimized code. Now of course this is perhaps the result of people not providing profiling data to the optimizer, so that it could see whether unrolling all those loops is really worth it or not, which is one of the things that can create bloated code (a rough sketch of such a profile-guided build is below).

                    And then we enter the multiprocess problem that you mention, and then all bets are off as to what really constitutes an optimized program, since the interaction changes everything in a random way anyway (I'm just lucky to run most of my stuff on servers where the application usually has dedicated cores).
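
                    As a side note on that profiling data, here is a rough sketch of the profile-guided build mentioned above (file and function names are invented); with real run data GCC can judge whether unrolling or vectorizing a loop is worth the extra code size.

                    /* hot.c -- hypothetical example of the PGO workflow:
                     *   gcc -O3 -fprofile-generate hot.c -o hot   # instrumented build
                     *   ./hot                                     # writes *.gcda profile data
                     *   gcc -O3 -fprofile-use hot.c -o hot        # rebuild using the profile
                     * With -fprofile-use GCC knows how hot the loop really is and how many
                     * iterations it typically runs, and sizes its unrolling and
                     * vectorization decisions accordingly. */
                    #include <stdio.h>
                    #include <stddef.h>

                    static double dot(const double *a, const double *b, size_t n)
                    {
                        double s = 0.0;
                        for (size_t i = 0; i < n; i++)
                            s += a[i] * b[i];
                        return s;
                    }

                    int main(void)
                    {
                        static double a[4096], b[4096];
                        for (size_t i = 0; i < 4096; i++) {
                            a[i] = (double)i;
                            b[i] = 1.0 / ((double)i + 1.0);
                        }
                        double s = 0.0;
                        for (int r = 0; r < 10000; r++)   /* representative training workload */
                            s += dot(a, b, 4096);
                        printf("%f\n", s);
                        return 0;
                    }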

                    Comment


                    • #50
                      It's astonishing how much shit-posting is going on behind my back.

                      First of all, I've never claimed -O2 is universally better than -O3. I said that -O3 might produce worse code, since it's bloated, which is bad especially for embedded devices with a smaller cache. No one in this thread has given any arguments to the contrary. No results have been shared. It's crazy, isn't it? Meanwhile embedded devices do often use -Os.

                      Secondly, even this article shows that -O2 is at times faster than -O3 (again, barring -march=native, which is not suitable for Linux distros). This is also corroborated by Gentoo's wiki, which says that -O3 mustn't be used indiscriminately.

                      Thirdly, -Os and -O3 are completely different things. -Os pessimizes the generated code in order to keep binaries leaner, while -O3 uses every optimization possible (including experimental ones), sometimes to the code's detriment, and it always generates more bloated code; bloated code causes cache misses and in general contains more instructions for the CPU to run, so it might run slower (a quick way to check this yourself is sketched below).

                      Keep liking your own shit posts with zero evidence.

                      As far as I can see maybe two or three people in this entire thread have ever compiled anything. All the others are parroting whatever shit they've heard, naively believing that since 3 is bigger than 2, it must always be better. I've seen many idiots on the Internet who compile programs on Linux using -O4/-O5/-O6, thinking there are undocumented, even better optimization levels. Exactly what can be seen in this thread.
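
                      Whether the size versus speed trade-off described above matters for a given program is easy to check rather than argue about; here is a minimal hedged sketch (file name invented) comparing what -Os and -O3 actually produce:

                      /* tradeoff.c -- hypothetical single-file check:
                       *   gcc -Os tradeoff.c -o t_os && size t_os   # smaller text section
                       *   gcc -O3 tradeoff.c -o t_o3 && size t_o3   # larger, possibly faster
                       *   time ./t_os ; time ./t_o3                 # compare wall-clock time
                       */
                      #include <stdio.h>

                      int main(void)
                      {
                          /* A simple compute kernel so -O3's unroller and vectorizer have
                           * something to chew on; the interesting part is the size and time
                           * comparison in the commands above. */
                          double s = 0.0;
                          for (long i = 1; i < 200000000L; i++)
                              s += 1.0 / ((double)i * (double)i);
                          printf("%.12f\n", s);   /* converges toward pi^2/6 */
                          return 0;
                      }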

                      Comment
