Nice benchmarks, thanks for your time. It definitely wasn't for nothing, despite the non-dramatic results.
In theory I believe the Gentoo camp is right - although instead of having all these options perhaps they could just use -march=native for the intended machine. I am not using Gentoo, so I don't know whether all these architecture targets are meant for filling repositories with pre-built binaries for each architecture (but from what I remember, Gentoo compiles on the fly for the architecture of the host).
In practice, there are two reasons why the performance isn't there.
1) Native functionality cannot be exploited fully because the kernel does not allow SSE/AVX in kernel code, so that it doesn't have to save and restore the SIMD register state back and forth all the time.
2) GCC probably doesn't do an excellent job with cache size differences (how can you not exploit a 64 KB L1 instruction cache versus a 32 KB one?) or with instructions that could actually be exploited, like BMI/BMI2/ADCX/ADOX/MULX, etc. Some of these newer instructions suffer in terms of optimization, and most of the time you have to hand-tune the assembly in performance-critical code.
For (1), maybe a very complex solution could be an algorithm that estimates how heavily SSE/AVX instructions will be used, saves the state of only the XMM/YMM/ZMM registers that will actually be touched, runs the SSE/AVX version of the function for the gains it gives, and then restores those registers on the way out. For crypto code they are already doing this, I guess, but without the algorithm: they just know the SSE/AVX versions are faster and save/restore the state because it's worth it.
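For reference, the save/use/restore bracketing described above is essentially what the Linux kernel's kernel_fpu_begin()/kernel_fpu_end() API does for in-kernel SIMD users like the crypto and RAID code (the decision of when it's worth it is still manual, as noted). A sketch of a hypothetical kernel-side routine - not a buildable module, just the shape of it:

```c
#include <linux/kernel.h>
#include <asm/fpu/api.h>

/* Hypothetical kernel routine: bracket the SIMD region so the kernel
 * saves and restores XMM/YMM/ZMM state only where the speedup pays
 * for the cost of the save/restore. */
static void checksum_block_simd(const u8 *buf, size_t len)
{
	kernel_fpu_begin();	/* save FPU/SIMD state; SSE/AVX now usable */
	/* ... AVX2 checksum inner loop would go here ... */
	kernel_fpu_end();	/* restore state; SSE/AVX off-limits again */
}
```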
For (2), perhaps GCC could do a better job (?).