Announcement

**pthariensflame** · 10 February 2024, 12:20 PM

Possibly worth noting that AMD is actually doing two things here that Intel isn't. The obvious one is PREFETCHI, which is a "not yet" situation for Intel; as Michael noted, it won't show up for them until Granite Rapids. The more interesting one is AVX512-VP2INTERSECT, which has only ever shipped in Tiger Lake and which Intel has half-deprecated at this point; if AMD manages to revive it, that'll put Intel in a very…intriguing position.

**skeevy420** · 10 February 2024, 12:29 PM

Compared with Zen 4, this confirms that AMD Zen 5 adds AVXVNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.

AVXVNNI. What about AVXVDDI and AVXVCCI?

**Namelesswonder** · 10 February 2024, 12:40 PM

VP2INTERSECT was so slow on Tiger Lake that it was faster to emulate it than to actually use it.

That could just be typical Intel underbaking a first implementation, though; it may take their second attempt, or AMD's implementation, for it to work well.
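
For anyone wondering what VP2INTERSECT actually computes, here is a scalar reference model in Python. It is only an illustrative sketch of the instruction's documented semantics (two output masks, one per source operand), not a description of how Tiger Lake executes it or of any particular fast emulation; the function and constant names are my own. With 16 dword lanes per 512-bit register, a single instruction covers 16 × 16 = 256 pairwise comparisons.

```python
def vp2intersect(a: list[int], b: list[int]) -> tuple[int, int]:
    """Scalar model of VP2INTERSECTD/Q: returns the two output masks (k1, k2).

    Bit i of k1 is set if a[i] equals any element of b.
    Bit j of k2 is set if b[j] equals any element of a.
    """
    if len(a) != len(b):
        raise ValueError("both sources must have the same number of lanes")
    k1 = k2 = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:  # every lane of a is compared with every lane of b
                k1 |= 1 << i
                k2 |= 1 << j
    return k1, k2


# A 512-bit register holds 16 dwords, so one instruction covers 16 * 16 pairs.
PAIRS_PER_ZMM_DWORD = (512 // 32) ** 2
```

For example, `vp2intersect([1, 2, 3, 4], [4, 9, 1, 7])` returns `(0b1001, 0b0101)`: lanes 0 and 3 of the first source found a match, as did lanes 0 and 2 of the second.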

Also does AVX-VNNI even have any use over AVX512-VNNI? From Intel's own documentation it seems that they have the same CPI for their implementations, so AVX512-VNNI would have double the throughput.
Is AMD just implementing AVX-VNNI in microcode on top of the same hardware that executes AVX512-VNNI?

Edit: It appears AVX-VNNI-INT* does add some more intrinsics over AVX512-VNNI, so it is useful there, but standalone AVX-VNNI offers nothing over AVX512-VNNI.
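
To make the throughput argument concrete, here is a scalar sketch of what one VPDPBUSD (the core instruction shared by AVX-VNNI and AVX512-VNNI) does per 32-bit lane: multiply four unsigned bytes from one source by four signed bytes from the other, sum the products, and add the result to the accumulator with wraparound (the VPDPBUSDS form saturates instead). The function names are illustrative. A 256-bit AVX-VNNI instruction updates 8 such lanes and a 512-bit AVX512-VNNI instruction updates 16, so at equal CPI the 512-bit form does twice the work per instruction.

```python
def to_int32(x: int) -> int:
    """Wrap an arbitrary integer to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def vpdpbusd(acc: list[int], a: bytes, b: bytes) -> list[int]:
    """Scalar model of VPDPBUSD (wrapping, not the saturating VPDPBUSDS form).

    For each 32-bit lane i, four unsigned bytes of `a` are multiplied by the
    four signed bytes of `b` in the same positions, and the sum is added to acc[i].
    """
    if len(a) != 4 * len(acc) or len(b) != 4 * len(acc):
        raise ValueError("need exactly 4 bytes of a and b per accumulator lane")
    out = []
    for i, total in enumerate(acc):
        for k in range(4):
            ua = a[4 * i + k]                  # indexing bytes yields 0..255 (unsigned)
            sb = b[4 * i + k]
            sb = sb - 256 if sb > 127 else sb  # reinterpret as signed -128..127
            total += ua * sb
        out.append(to_int32(total))
    return out


# 32-bit accumulator lanes updated by a single instruction at each width.
LANES_PER_INSTRUCTION = {
    "AVX-VNNI (256-bit)": 256 // 32,
    "AVX512-VNNI (512-bit)": 512 // 32,
}
```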

**loganj** · 10 February 2024, 01:45 PM

I hope they keep their lower power consumption compared to Intel's CPUs when it comes to AVX.

**coder** · 10 February 2024, 01:56 PM

Originally posted by Namelesswonder

Also does AVX-VNNI even have any use over AVX512-VNNI?

Probably the main reason they added it is to run optimized codepaths targeted at Intel hybrid CPUs, rather than a slower fallback path.
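
A rough sketch of that situation, under stated assumptions: the kernel table below is hypothetical, standing in for software tuned for Intel hybrid CPUs that ships an AVX-VNNI kernel and a generic fallback but no AVX512-VNNI kernel. The flag spellings follow Linux's /proc/cpuinfo, and the Zen 4 / Zen 5 flag sets are trimmed, illustrative subsets. Zen 4 has AVX512-VNNI but not AVX-VNNI, so it lands on the fallback; adding AVX-VNNI lets Zen 5 take the tuned path.

```python
# Hypothetical kernel table for software tuned for Intel hybrid CPUs:
# an AVX-VNNI kernel plus a generic fallback, but no AVX512-VNNI kernel.
HYBRID_TUNED_KERNELS = [
    ("avx_vnni", "avx_vnni_kernel"),
    (None, "generic_kernel"),  # None = no feature requirement
]


def pick_kernel(cpu_flags: set[str], kernels=HYBRID_TUNED_KERNELS) -> str:
    """Return the first kernel whose required CPU flag is present."""
    for required, name in kernels:
        if required is None or required in cpu_flags:
            return name
    raise RuntimeError("no usable kernel")


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read the 'flags' line from /proc/cpuinfo (Linux on x86 only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


# Illustrative, heavily trimmed flag sets (not complete /proc/cpuinfo contents).
ZEN4_FLAGS = {"avx2", "avx512f", "avx512_vnni"}
ZEN5_FLAGS = ZEN4_FLAGS | {"avx_vnni"}
```

On a real Linux machine you could pass `read_linux_cpu_flags()` to `pick_kernel` instead of the hand-written sets.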

**coder** · 10 February 2024, 01:58 PM

Originally posted by loganj

I hope they keep their lower power consumption compared to Intel's CPUs when it comes to AVX.

Agreed. I haven't heard any rumors about this, but the half-width implementation used in Zen 4 (executing 512-bit operations on 256-bit datapaths) has been a real winner. I'll bet that if they just added more FMA-capable execution ports, the performance gap between its AVX-512 and Golden Cove's would narrow enough to make it largely irrelevant.

I think it's interesting that ARM switched from 2x 256-bit SVE ports to 4x 128-bit SVE2 ports, between the Neoverse V1 and V2 cores. Perhaps it shows that execution width is no longer as important as once thought? Or, maybe it just has more to do with the amount of ARM code that still relies primarily on 128-bit NEON SIMD. Even in that case, AMD's 256-bit implementation is very amenable to AVX2, of which there's still a lot out there (and especially now that Intel removed AVX-512 from their client processors, with AVX10/256 looking set to come next).

**onlyLinuxLuvUBack** · 10 February 2024, 02:25 PM

Originally posted by Namelesswonder

Also does AVX-VNNI even have any use over AVX512-VNNI?

Can somebody at Intel just simplify things for buyers:
call it MMX (mental metal x-tensions)
and add 2.0, then 2.1, and then 2.323

**coder** · 10 February 2024, 02:30 PM

Originally posted by onlyLinuxLuvUBack

Can somebody at Intel just simplify things for buyers:
call it MMX (mental metal x-tensions)
and add 2.0, then 2.1, and then 2.323

It's funny that, with AVX10, they're just going with a linear versioning scheme, like you suggested. Well, version plus execution width.
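
A toy model of the difference, purely hypothetical: instead of checking a growing list of independent feature flags (as with the AVX-512 subsets), code states a minimum AVX10 version and vector width, and anything newer or wider qualifies. The `Avx10Level` type and the `satisfies` and `has_all` helpers are made up for illustration and do not reflect how CPUID actually reports AVX10.

```python
from typing import NamedTuple


class Avx10Level(NamedTuple):
    """Hypothetical 'version + width' descriptor (not the real CPUID layout)."""
    version: int    # e.g. 1 for AVX10.1, 2 for AVX10.2
    max_width: int  # widest supported vector length in bits, e.g. 256 or 512


def satisfies(cpu: Avx10Level, required: Avx10Level) -> bool:
    """Linear scheme: a higher version and a wider width cover the lower ones."""
    return cpu.version >= required.version and cpu.max_width >= required.max_width


def has_all(cpu_flags: set[str], required_flags: set[str]) -> bool:
    """Flag-list scheme, as with AVX-512 subsets: every flag must be present."""
    return required_flags <= cpu_flags
```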

**[deXter]** · 10 February 2024, 04:14 PM

Zen 4 user here. Does anyone know if there's a difference (instruction-set-wise and in real-world impact) between compiling with -march=x86-64-v4 and -march=znver4? I only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead; I haven't come across any mention of this on the interwebs.
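
One way to see the instruction-set side for yourself is to diff the macros GCC predefines for each target, using the standard `-dM -E` preprocessor dump. The sketch below assumes a GCC recent enough to accept `-march=znver4` is on your PATH as `gcc`; the helper names are my own. As far as I know, x86-64-v4 only requires AVX512F/BW/CD/DQ/VL, while znver4 also enables other AVX-512 subsets Zen 4 supports (VNNI and BF16, for example) and implies `-mtune=znver4` rather than generic tuning. The real-world impact still needs benchmarking.

```python
import subprocess


def parse_macro_names(dump: str) -> set[str]:
    """Extract macro names from the `#define NAME VALUE` lines of a -dM dump."""
    names = set()
    for line in dump.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1].split("(", 1)[0])  # drop any parameter list
    return names


def macro_dump(march: str, compiler: str = "gcc") -> str:
    """Preprocess empty C input for the given -march and return the macro dump."""
    result = subprocess.run(
        [compiler, f"-march={march}", "-dM", "-E", "-x", "c", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return result.stdout


def compare_dumps(dump_a: str, dump_b: str) -> tuple[list[str], list[str]]:
    """Return (macros only in A, macros only in B), each sorted."""
    a, b = parse_macro_names(dump_a), parse_macro_names(dump_b)
    return sorted(a - b), sorted(b - a)


# Usage (needs a GCC that accepts -march=znver4):
#   only_v4, only_zen4 = compare_dumps(macro_dump("x86-64-v4"), macro_dump("znver4"))
#   print(only_zen4)  # inspect which extra ISA and tuning macros znver4 defines
```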

AMD Zen 5 Compiler Support Posted For GCC - Confirms New AVX Features & More
