AVX/AVX-512 Tuning Doesn't Payoff For LibreOffice's Calc Spreadsheets

  • AVX/AVX-512 Tuning Doesn't Payoff For LibreOffice's Calc Spreadsheets

    Phoronix: AVX/AVX-512 Tuning Doesn't Payoff For LibreOffice's Calc Spreadsheets

    While Advanced Vector Extensions (AVX) can provide some big performance boosts when software is properly tuned for it and most often we are writing about projects adding support for it, in the case of LibreOffice they are now going ahead and removing their AVX and AVX-512 tuning...

  • #2
    To be fair, Ryzen CPUs don't have AVX implementations as powerful as Intel's, so it may not be that surprising that it does not perform as expected.

    • #3
      I wonder whether what he described happens across different compilers (e.g., GCC, Intel, Microsoft's) or only with Clang. He did say compilers, but if memory serves they're using Clang on Windows too.

      • #4
        The code must use array/vector operations heavily to get the full advantage of AVX2+. You don't always get a performance boost from AVX2.
        Here is my experience from developing some computational code for my graduation project.
        What I observed is that when I tuned my code for AVX2 (with -mavx2), the CPU frequency decreased from 2.5 GHz to 2.2 GHz on the first CPU and from 3.6 GHz to 3.0 GHz on the second, roughly a 12-17% drop. If you have array/vector operations that took, for example, 40% (4s) of the runtime without AVX2, that time drops to 2s after tuning for AVX2, saving ~20% of the total runtime.
        With some simple math, taking the original runtime as 10s, the runtime after AVX2 = 10s * 1.15 * 0.8 = 9.2s, only ~8% faster.
        So unless you have code whose array/vector operations take >35% of the total runtime, you will get nothing from AVX2.
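        The arithmetic above can be put into a tiny model. This is only a sketch of the commenter's own assumptions (a ~15% frequency throttle applied to everything, AVX2 halving the vectorised portion); the function names are mine:

```python
# Back-of-the-envelope model of the trade-off described in the post above.
# All numbers are the commenter's assumptions, not measurements.

def avx2_runtime(base_s: float, throttle: float, vector_fraction: float) -> float:
    """Runtime after enabling AVX2: everything slows down by `throttle`,
    while the vectorised fraction of the work runs twice as fast."""
    return base_s * throttle * (1.0 - vector_fraction / 2.0)

def break_even_fraction(throttle: float) -> float:
    """Vector fraction at which AVX2 neither helps nor hurts:
    throttle * (1 - f/2) == 1  =>  f = 2 * (1 - 1/throttle)."""
    return 2.0 * (1.0 - 1.0 / throttle)

print(avx2_runtime(10.0, 1.15, 0.40))   # ~9.2 s, matching the post
print(break_even_fraction(1.15))        # ~0.26
```

        Under these exact assumptions the break-even point comes out near 26% of runtime in vector work; the ">35%" figure in the post presumably assumes a smaller vector speed-up, and the real threshold depends heavily on the CPU's actual throttling behaviour.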

        • #5
          So Linus Torvalds was right about AVX-512.

          Originally posted by Linus Torvalds
          I hope AVX512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on.

          • #6
            Originally posted by -MacNuke- View Post
            To be fair, Ryzen CPUs don't have AVX implementations as powerful as Intel's, so it may not be that surprising that it does not perform as expected.
            Testing AVX performance on Zen 1 (a Ryzen 5 2500U) clearly demonstrates that the developer has no real understanding of the hardware and is very likely incapable of maintaining the AVX2 code paths either. So it's good to remove them altogether.

            • #7
              Originally posted by zxy_thf View Post
              Testing AVX performance on Zen 1 (a Ryzen 5 2500U) clearly demonstrates that the developer has no real understanding of the hardware and is very likely incapable of maintaining the AVX2 code paths either. So it's good to remove them altogether.
              Ahh no. He weighed the probable benefits against the added maintenance burden and decided that the outcome was against the AVX code, simple as that. The AVX code needs to be improved a lot (better separation from the rest of the LibreOffice code, a maintainer, ...), and then it will be added back.

              • #8
                This isn't surprising. Getting performance out of AVX is tricky when it's mixed with code that only uses SSE. The switch between the two (at least on older CPUs) results in a painful state-transition penalty, which means that "randomly" enabling AVX can lower performance. So to really gain anything from AVX, you need compute-intensive work that is highly vectorised. From my POV, a computation that only takes a few ms is laughable in this context; there's no way it's worth the pain at such a small scale.
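                One common way to handle that "not worth it at small scale" problem is to gate the vector path on problem size, as libraries shipping multiple code paths often do. A minimal sketch, where the threshold and all names are hypothetical and the "vector" kernel is just a stand-in:

```python
# Hypothetical size-gated dispatch: only enter the wide-vector path when
# the batch is large enough to amortise the transition / throttling
# overhead described above. The threshold is illustrative; in practice
# it would be tuned per machine.
AVX_MIN_ELEMENTS = 4096

def sum_scalar(values):
    # Plain scalar loop; cheap to enter, no SIMD state changes.
    total = 0.0
    for v in values:
        total += v
    return total

def sum_vectorised(values):
    # Stand-in for an AVX kernel; in real code this would be a SIMD loop.
    return float(sum(values))

def fast_sum(values):
    """Dispatch to the 'vector' path only for large inputs."""
    if len(values) >= AVX_MIN_ELEMENTS:
        return sum_vectorised(values)
    return sum_scalar(values)
```

                The design point is that the dispatch check itself is trivially cheap, so small inputs never pay the wide-vector entry cost.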

                • #9
                  Originally posted by uid313 View Post
                  So Linus Torvalds was right about AVX-512.
                  To be fair, AVX-512 does have its use cases. Ian Cutress from AnandTech is a researcher who is quite fond of it, since he used it in his own work. On the other hand, that also exemplifies how rare and specific the actual use cases of AVX-512 are, showing that it really doesn't have much use in consumer-grade CPUs.

                  • #10
                    Originally posted by M@GOid View Post

                    To be fair, AVX-512 does have its use cases. Ian Cutress from AnandTech is a researcher who is quite fond of it, since he used it in his own work. On the other hand, that also exemplifies how rare and specific the actual use cases of AVX-512 are, showing that it really doesn't have much use in consumer-grade CPUs.
                    I would seriously take what Ian Cutress says about AVX-512 with a truckload of salt. I've confronted him about his AVX-512 tests and the code he's using (3D particle movements from his own PhD thesis), but he's been very evasive in his explanations. The code is closed source, its credibility resting purely on Ian's own word, and the results usually don't make sense (i.e. some of the speed-up cannot be explained by the doubling of the vector size alone, which implies he's actually benchmarking specific instructions that do not exist in AVX2, or worse, comparing apples to oranges by proxy of completely different algorithms).

                    From my own professional experience, I would offer a few judgements about AVX-512:
                    • AVX-512 is borderline useless. It's nowhere near as ubiquitous as AVX2 and still incurs heavy frequency throttling, especially when you consider that the other CPU cores are likely running other jobs / containers that will also pay the price for another process using AVX-512 on the same piece of silicon.
                    • AVX-512 is thus only useful if the application is already heavily vectorised and runs multithreaded on all the CPU cores (which also makes judging the performance impact of the frequency throttling easier, since it's self-contained to just one application).
                    • If the application is already (or wants to be) heavily vectorised and multithreaded, then AVX-512 is actually the worse option. GPUs (and CUDA specifically) are better alternatives (empirically, on comparably priced hardware at comparable power consumption, GPUs are usually ~7x faster than equivalent CPUs). From 3D offline rendering to AI, nobody of sound judgement uses AVX-512 instead of GPUs unless they work in one of the many Intel marketing departments.
                    • If you're still going to do AVX-512 programming, for the love of all past and future deities, please use ISPC (https://ispc.github.io/) and do not manually implement a separate code path for each instruction set (bonus: ISPC works on ARM NEON too).
