**coder** · 26 September 2022, 04:06 PM

> Originally posted by MadCatX
>
> Let me see if I got this right. Intel introduces AVX-512, limits it to super-duper expensive server-grade CPUs, and makes it a tricky tradeoff between performance, power consumption and performance-per-watt.

That was true of their 14 nm products. However, Ice Lake (10 nm+) then brought AVX-512 to laptops and Tiger Lake (10 nm++) continued this trend.

Rocket Lake then brought it to the desktop, but it was still built on 14 nm, because Intel's 10 nm++ process still couldn't deliver the clock frequencies or the yields needed for desktop products.

It wasn't until Alder Lake that Intel removed AVX-512 from its consumer desktop and laptop platforms, and that was largely seen as a necessary consequence of their decision to go big.LITTLE. Some people debate whether it was truly necessary, but show me another big.LITTLE CPU with a heterogeneous ISA.
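For anyone wondering why a heterogeneous ISA is such a problem: most programs that use AVX-512 check for it once, at startup, and then run the vectorized code on whatever core the OS schedules them on. Here's a minimal sketch of that common dispatch pattern, assuming GCC or Clang on x86-64 (`__builtin_cpu_supports`, `__builtin_cpu_init` and the `target`/`constructor` attributes are compiler extensions; `sum_floats` and the other function names are hypothetical, for illustration only).

```c
#include <immintrin.h>
#include <stddef.h>

/* Fallback path: plain scalar loop that runs on any x86-64 core. */
static float sum_scalar(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* AVX-512F is enabled for this one function only, so the rest of the
   program still runs on CPUs without it. */
__attribute__((target("avx512f")))
static float sum_avx512(const float *x, size_t n) {
    __m512 acc = _mm512_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16)                 /* 16 floats per 512-bit register */
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i));
    float s = _mm512_reduce_add_ps(acc);         /* horizontal sum of the 16 lanes */
    for (; i < n; i++)                           /* scalar remainder */
        s += x[i];
    return s;
}

/* Implementation chosen once, before main() runs. */
static float (*sum_impl)(const float *, size_t) = sum_scalar;

__attribute__((constructor))
static void pick_sum_impl(void) {
    __builtin_cpu_init();   /* required when querying CPU features from a constructor */
    if (__builtin_cpu_supports("avx512f"))
        sum_impl = sum_avx512;
}

/* Public entry point (hypothetical name). If a hybrid CPU exposed AVX-512 on
   only some of its cores, the choice made above could be wrong for whichever
   core this thread is later migrated to, and sum_avx512 would then die with
   an illegal-instruction fault (SIGILL). */
float sum_floats(const float *x, size_t n) {
    return sum_impl(x, n);
}
```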

**coder** · 26 September 2022, 04:10 PM

> Originally posted by MorrisS.
>
> So AVX5 yields a significant increase.

Calling it that implies there were an AVX3 and an AVX4. No, it's called AVX-512. The naming is inconsistent, but the reality is that AVX-512 is a different family of ISA extensions, just as AVX/AVX2 are a different family from the SSE/SSE2/SSE3/SSE4.x extensions. Each family differs from its predecessor in more ways than mere register width.
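One concrete example of "more than register width": AVX-512 adds eight opmask registers (k0-k7), and most AVX-512 instructions can take one as a write mask so they operate on only a subset of lanes, whereas AVX/AVX2 only have masked forms of a few load, store and gather instructions. A minimal sketch, assuming GCC or Clang and a CPU with AVX-512F (`saxpy_avx512` is a hypothetical name), showing how a mask removes the usual scalar remainder loop:

```c
#include <immintrin.h>
#include <stddef.h>

/* y[i] += a * x[i] for all i < n, with no scalar remainder loop: the final
   partial vector is handled by an opmask that enables only the lanes that
   are still in bounds. Must only be called on a CPU with AVX-512F. */
__attribute__((target("avx512f")))
void saxpy_avx512(float a, const float *x, float *y, size_t n) {
    const __m512 va = _mm512_set1_ps(a);
    for (size_t i = 0; i < n; i += 16) {
        size_t rem = n - i;
        /* All 16 lanes on full iterations; only the low `rem` lanes on the last. */
        __mmask16 m = rem >= 16 ? (__mmask16)0xFFFF
                                : (__mmask16)((1u << rem) - 1u);
        /* Masked-off lanes read as 0 and cannot fault, even past the array end. */
        __m512 vx = _mm512_maskz_loadu_ps(m, x + i);
        __m512 vy = _mm512_maskz_loadu_ps(m, y + i);
        vy = _mm512_fmadd_ps(va, vx, vy);        /* a * x + y in every lane */
        _mm512_mask_storeu_ps(y + i, m, vy);     /* masked-off lanes are not written */
    }
}
```

The same loop written with AVX2 would typically need either a separate scalar tail or a vector-register mask built for `vmaskmovps`; here the mask is just a 16-bit integer computed from the remaining count.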

**MadCatX** · 26 September 2022, 04:30 PM

> Originally posted by coder
>
> That was true of their 14 nm products. However, Ice Lake (10 nm+) then brought AVX-512 to laptops and Tiger Lake (10 nm++) continued this trend.

True, but AVX-512 in very power-constrained laptop chips is probably worth less than an "AVX-512-ready!" sticker you could slap on a laptop with such a chip.

> Originally posted by coder
>
> Rocket Lake then brought it to the desktop, but that was still 14 nm, because Intel's 10 nm++ process still couldn't deliver high clock frequencies or the yield needed for desktop products.

It's hard to draw exact conclusions, but RKL's AVX-512 seems worse than Zen 4, at least in perf-per-watt. (https://www.phoronix.com/review/rocket-lake-avx512)

> Originally posted by coder
>
> It wasn't until Alder Lake that Intel removed AVX-512 from its consumer desktop and laptop platforms, and that was largely seen as a necessary consequence of their decision to go big.LITTLE. Some people debate whether it was truly necessary, but show me another big.LITTLE CPU with a heterogeneous ISA.

I'm aware of the technical intricacies here, but it's nevertheless a step back. Unless Intel fixes this for Raptor Lake, they could be at risk of looking even worse in some benchmarks if more consumer-grade software adopts AVX-512. Now that we have chips with "no compromises" AVX-512 implementations, it makes more sense than ever for software to finally adopt it.

**coder** · 26 September 2022, 04:43 PM

> Originally posted by MadCatX
>
> True, but AVX-512 in very power-constrained laptop chips is probably worth less than an "AVX-512-ready!" sticker you could slap on a laptop with such a chip.

It really depends on your workload. If you're running an AVX-512-heavy workload, then it was always a performance and efficiency win! Even on 14 nm, and even in spite of the downclocking!

Where AVX-512 got into trouble was in workloads that used it for around 10% to 20% of their instructions: enough to trigger significant downclocking, but not enough for its greater throughput to compensate. I experienced this firsthand. When we recompiled with AVX-512 completely disabled, my software achieved higher overall throughput.
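The tradeoff described above can be put into a back-of-the-envelope model. This is a deliberately crude sketch, not a measurement: it assumes the reduced AVX-512 clock applies to the whole run (roughly what happens when AVX-512 instructions are sprinkled throughout the code), and it works in terms of the fraction of run time rather than the fraction of instructions. All inputs, and the function name, are hypothetical.

```c
/* Relative throughput of an AVX-512 build vs. a build without it.
   f        fraction of baseline run time spent in code AVX-512 can speed up
   speedup  per-clock speedup of that portion when using AVX-512
   clock    core clock while running the AVX-512 build, relative to the other
            build (e.g. 0.85 = 15% downclock), assumed to apply to the whole run
   Returns > 1.0 if the AVX-512 build wins, < 1.0 if it loses. */
double avx512_relative_throughput(double f, double speedup, double clock) {
    /* The sped-up part shrinks by `speedup`; everything is then stretched
       by the lower clock. */
    double time = ((1.0 - f) + f / speedup) / clock;
    return 1.0 / time;
}
```

With f = 0.15, a 1.5x per-clock gain and a clock at 85% of normal, the model gives about 0.89, i.e. roughly 10% lower throughput than the non-AVX-512 build. Raise f to 0.9 with the same penalties and it gives about 1.21, which matches the pattern above: heavy AVX-512 use wins, light use loses.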

At 10 nm, the power and clock penalties of AVX-512 are definitely smaller than they were at 14 nm. Ice Lake SP Xeons showed this by delivering more consistent clock speeds in AVX-512-heavy workloads without going outside their power envelopes (server CPUs are very good about adhering to their power limits).

> Originally posted by MadCatX
>
> It's hard to draw exact conclusions, but RKL's AVX-512 seems worse than Zen 4, at least in perf-per-watt. (https://www.phoronix.com/review/rocket-lake-avx512)

You realize you're comparing an Intel 14 nm CPU with a TSMC N5 one, right? Rocket Lake's efficiency was always a joke. A very bad joke. To make matters worse, Intel solved the AVX-512 clock penalty by giving it an extremely high power budget. On top of that, I think it's a single-FMA design (somebody correct me if I'm wrong about that). So power consumption was atrocious and performance wasn't even all that great.
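For context on why the number of FMA units matters: peak floating-point throughput per core is roughly FMA pipes × 32-bit lanes per pipe × 2 (an FMA is a multiply plus an add), and a 512-bit instruction running on a 256-bit pipe needs two passes through it. A minimal sketch of that arithmetic; the function names are hypothetical and the configurations below are illustrative, not claims about any specific CPU.

```c
/* Peak single-precision FLOPs per clock for one core.
   fma_units:     number of FMA pipes that can each start one operation per clock
   datapath_bits: width of each FMA pipe (e.g. 256 or 512) */
unsigned peak_fp32_flops_per_clock(unsigned fma_units, unsigned datapath_bits) {
    unsigned lanes = datapath_bits / 32;   /* 32-bit floats per pipe */
    return fma_units * lanes * 2;          /* each FMA = 1 multiply + 1 add */
}

/* How many passes through one pipe a single 512-bit (zmm) instruction needs. */
unsigned passes_per_zmm_op(unsigned datapath_bits) {
    return datapath_bits >= 512 ? 1 : 512 / datapath_bits;
}
```

So a single 512-bit FMA pipe peaks at the same 32 FP32 FLOPs per clock as two 256-bit pipes; only a core with two 512-bit FMA pipes doubles that, to 64.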

> Originally posted by MadCatX
>
> I'm aware of the technical intricacies here but nevertheless it's a step back. Unless Intel fixes this for Raptor Lake,

They won't. I've seen zero indication Raptor Lake will enable AVX-512 in any of its cores. The soonest it could return is probably Meteor Lake.

**MorrisS.** · 26 September 2022, 05:24 PM

> Originally posted by coder
>
> Calling it that implies there were an AVX3 and an AVX4. No, it's called AVX-512. The naming is inconsistent, but the reality is that AVX-512 is a different family of ISA extensions, just as AVX/AVX2 are a different family from the SSE/SSE2/SSE3/SSE4.x extensions. Each family differs from its predecessor in more ways than mere register width.

So AVX512 yields a significant increase.

**chuckula** · 26 September 2022, 05:49 PM

> Originally posted by Sin2x
>
> You've been a fan of an instruction set? What's wrong with you?
>
> Obligatory Linus quote: https://www.realworldtech.com/forum/...rpostid=193190

I never said I was a fan of an instruction set; I said I was a fan of AVX-512, which is such a strong architecture that even in its diluted consumer form it's one of the major performance wins for Zen 4. Seriously, turn off AVX-512 in Phoronix's main review and watch Zen 4 suddenly not look that great compared to a year-old Alder Lake.

As for Linus's opinions on hardware: he's not a hardware engineer, and 20 years ago, when he was writing emulators at Transmeta (the closest he ever got to hardware), it didn't work out too well. The fact that some AVX-512 instructions aren't intended for swizzling bits inside the kernel doesn't mean they have no value, and the very article you should have read and comprehended proves that point abundantly.

**quaz0r** · 26 September 2022, 06:46 PM

"there is significant performance uplift to enjoy while no negative impact in terms of reduced CPU clock speeds / higher power consumption"

oooooh boy, I couldn't wait to load the comment section and enjoy the obligatory 30 pages of "omg optimized trollolol!!!" autism tears. Come on lads, y'all is slippin. Where the tears at? 🤔

**coder** · 26 September 2022, 06:59 PM

> Originally posted by chuckula
>
> Seriously, turn off AVX-512 in Phoronix's main review and watch Zen 4 suddenly not look that great compared to a year-old Alder Lake.

Pfft. Nonsense. There are plenty of benchmarks where the 7950X stomps the i9-12900K that have nothing to do with any flavor of AVX, such as the compile benchmarks.

AVX-512 really only helps in a minority of workloads, which is the main reason AMD dragged its feet on it for so long.

**ms178** · 26 September 2022, 07:18 PM

> Originally posted by coder
>
> AVX-512 really only helps in a minority of workloads, which is the main reason AMD dragged its feet on it for so long.

Right, but wasn't AVX-512 better suited to widespread use than the vector ISAs before it? I think the creator of ispc blogged about that quite extensively, praising AVX-512's usefulness for general-purpose compute tasks and its performance-per-area advantages. Well, it took AMD more than half a decade to implement it, and even though AMD has been touting GPUs as better suited for vector code, that effort hasn't materialized yet; I'm still waiting for them to fulfill the promises they've been making since 2012.
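A small illustration of why masking helps general-purpose code and not just dense math: a loop with an `if` inside can be vectorized by turning the condition into an opmask and applying the operation only where the mask bit is set. A minimal sketch, assuming GCC or Clang and a CPU with AVX-512F; `scale_positive_avx512` is a hypothetical name, and this is a hand-written illustration, not what ispc or any particular compiler actually emits.

```c
#include <immintrin.h>
#include <stddef.h>

/* Vectorized form of:  for (i = 0; i < n; i++) if (x[i] > 0.0f) x[i] *= scale;
   The comparison yields an opmask, and the multiply is applied only to lanes
   whose bit is set; the other lanes keep their old value. The same mask
   mechanism handles the array tail. Must only be called on an AVX-512F CPU. */
__attribute__((target("avx512f")))
void scale_positive_avx512(float *x, size_t n, float scale) {
    const __m512 vs = _mm512_set1_ps(scale);
    const __m512 zero = _mm512_setzero_ps();
    for (size_t i = 0; i < n; i += 16) {
        size_t rem = n - i;
        __mmask16 in_bounds = rem >= 16 ? (__mmask16)0xFFFF
                                        : (__mmask16)((1u << rem) - 1u);
        __m512 v = _mm512_maskz_loadu_ps(in_bounds, x + i);
        /* Bit k is set iff lane k is in bounds and x > 0 (ordered, non-signaling). */
        __mmask16 pos = _mm512_mask_cmp_ps_mask(in_bounds, v, zero, _CMP_GT_OQ);
        /* Lanes not in `pos` pass through unchanged from v. */
        v = _mm512_mask_mul_ps(v, pos, v, vs);
        _mm512_mask_storeu_ps(x + i, in_bounds, v);
    }
}
```

No branch is taken per element: every iteration does the same work, and the masks decide which results land in memory, which is what lets branchy scalar code map onto wide vectors.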

**schmidtbag** · 26 September 2022, 09:10 PM

> Originally posted by bridgman
>
> It's not half-baked, it's half-sized and perfectly baked
>
> In fairness, we do also have multiple FP execution ports/pipes (6 vs 3 for Golden Cove), so there can still be a lot of work happening in parallel.

Haha, fair enough. "Half-baked" was more negative than I intended, but I guess my point was that it isn't a "complete" AVX-512 implementation. Personally, I prefer AMD's route, since it yields good performance without making the die huge and expensive.

AMD Zen 4 AVX-512 Performance Analysis On The Ryzen 9 7950X
