Announcement

**coder** · 29 September 2022, 02:15 PM

Originally posted by AdrianBc

AVX-512 implementation than in the majority of the Intel CPUs (with the exception of Xeon Platinum and similar overpriced Intel SKUs).

Even Platinum Xeon SP CPUs have serious problems with AVX-512 and clock-throttling. That's addressed somewhat in the Ice Lake generation, though I can't say how much. I expect Sapphire Rapids to be even better, but we'll see.

Throughput-wise, the dual-FMA Xeon models probably do outperform Zen 4 in single-threaded AVX-512 performance on AVX-512-heavy workloads. In mixed and highly threaded workloads, though, clock-throttling would probably impede the Xeons too much. Perhaps Sapphire Rapids will retake the lead here as well.

Originally posted by AdrianBc

Rewriting or recompiling programs to use AVX-512 can give a very nice boost to many applications, and it is a much more pleasant instruction set to program in than the previous crippled instruction sets implemented by Intel, i.e. MMX, SSE and AVX.

Pleasant? I didn't find anything unpleasant about the SSE code I've written, but if you need to do things like scatter/gather, then it's indeed painful to do with them. At least lane-swizzling got a lot better than in the old days of MMX.

**WannaBeOCer** · 29 September 2022, 02:22 PM

Originally posted by qarium

https://gpu.userbenchmark.com/Compar...3933vsm1850973

It looks like if you do not use raytracing, the Vega 64 beats the A770...

And right now Intel's compute stack is not ready, while ROCm/HIP works for the Vega 64...

"only uses 180w while a RX Vega 64 uses around ~290w"

Right, but there is also a price difference... and if you do not use raytracing, the A770 looks like a bad choice.

Maybe if you need the 16 GB of VRAM for compute, then you might get a good deal.

UserBenchmark is a joke, and if you notice, many of the tests are blank for the A770 because no one who isn’t under embargo has one yet. Regarding regular rasterization, they already announced that in DX12/Vulkan their A750 trades blows with the RTX 3060, while DX11 will be slower. They’re going to be in a similar situation to AMD’s GCN, where the DX11 driver was ass and then fanboys started calling it "fine wine". There’s no point in buying a Vega/RDNA1 GPU now, since neither supports DX12 Ultimate, which means no mesh shaders.

Originally posted by atomsymbol

It is misleading because if I bought a Ryzen 7000 CPU, I would not see such power-consumption or power-efficiency numbers when using my machine, even if I decided to run the same apps that were tested in those Gamers Nexus YouTube reviews.

Secondly, I hope you understand that the purpose of their testing methodology is to reduce the statistical variance of their measurements; it is not there to reproduce your or my home/office conditions. If you believe that their testing environment is built to reflect actual home and office conditions, then you are mistaken. Unless, that is, you happen to have a stable supply of liquid nitrogen to your home/office.

The power-efficiency numbers published on AMD's (or Intel's) slides are actually much more relevant for home/office use cases than what Gamers Nexus is reporting, if those slides contain power-efficiency curves comparing current-generation CPUs to previous-generation CPUs.

I am sure that NASA's Jet Propulsion Laboratory has very advanced testing methodologies as well, but the applicability of those methodologies to home and office environments is questionable.

You're trolling, right? Every review I've seen that uses all the cores shows a 7950X drawing 240 W+ in real-world workloads. If you're only using your PC for gaming/office tasks, then you're buying the wrong chip. What did you think was going to happen when you crank up the frequency?

https://www.techpowerup.com/review/a...-7950x/24.html

**coder** · 29 September 2022, 02:37 PM

Originally posted by AdrianBc

for such tasks the Intel efficient cores have a performance similar to that of a thread of a big Intel or AMD core.

LOL, wut?

No, they have only about 60% of the integer performance of a P-core running 1 thread. Where the E-cores come out ahead is when you load them instead of putting a second thread on a P-core.

Originally posted by AdrianBc

On the other hand, for scientific computing or other floating-point applications, or any other applications that can benefit from AVX or AVX-512, Zen 4 will easily beat Raptor Lake, because the Intel efficient cores are weak at AVX.

Don't underestimate them. They're about 54% as fast in FP workloads as a single-threaded P-core. So, the amount of throughput they add is significant, if not huge.
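To put rough numbers on the point above, the sketch below turns the quoted per-core ratios (about 60% for integer and about 54% for FP, relative to one P-core thread) into aggregate throughput. The 8 P-core + 16 E-core configuration is an assumption (an i9-13900K-style layout), and the model deliberately ignores SMT, all-core clock reductions, shared power limits and memory bandwidth, so treat its output as an idealized upper bound rather than a prediction.

```python
# Idealized aggregate-throughput estimate for a hybrid (P-core + E-core) CPU.
# Assumptions, not measurements: throughput scales linearly with core count,
# every core runs exactly one thread, and clocks do not drop under all-core load.

E_CORE_INT_RATIO = 0.60  # E-core integer perf vs. one P-core thread (figure from the post)
E_CORE_FP_RATIO = 0.54   # E-core FP perf vs. one P-core thread (figure from the post)


def relative_throughput(p_cores: int, e_cores: int, e_ratio: float) -> float:
    """Total throughput in units of 'one P-core running one thread'."""
    return p_cores * 1.0 + e_cores * e_ratio


def e_core_uplift(p_cores: int, e_cores: int, e_ratio: float) -> float:
    """Fraction the E-cores add on top of the P-cores alone (1.0 means +100%)."""
    return relative_throughput(p_cores, e_cores, e_ratio) / p_cores - 1.0


if __name__ == "__main__":
    # Hypothetical 8P + 16E configuration.
    for label, ratio in (("integer", E_CORE_INT_RATIO), ("FP", E_CORE_FP_RATIO)):
        total = relative_throughput(8, 16, ratio)
        uplift = e_core_uplift(8, 16, ratio)
        print(f"{label}: {total:.2f} P-core equivalents, +{uplift:.0%} over P-cores alone")
```

All-core clocks sit below single-thread boost clocks in practice, so measured gains will be smaller than this; the sketch only shows why many cores at roughly half the per-core speed still add a lot of throughput.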

Originally posted by AdrianBc

which have prevented Intel from having the best support for an instruction set that they developed more than 14 years ago (the first public description of the first variant of AVX-512 was in 2008, 3 years before Sandy Bridge, which used the inferior AVX instruction set).

Sandy Bridge was a 32 nm CPU and it didn't even implement AVX at full 256-bit width. I think they didn't do that until Haswell, which used 22 nm. And Haswell had an infamous clock-throttling issue with AVX2-heavy workloads, although it pales in comparison to the AVX-512 clock throttling problems Intel had on the 14 nm CPUs where they introduced it.

My point is that what you're talking about is a low-clocked, in-order Larrabee core. You cannot compare that to a high-clocked, out-of-order, general-purpose CPU core. Even 2016 was too soon for Intel to deploy AVX-512 on general-purpose cores at full width. It was a big mistake, due to all of the clock-throttling problems it caused. Possibly 10 nm ESF (AKA "Intel 7") is the first time it really makes sense.

**qarium** · 29 September 2022, 02:37 PM

Originally posted by coder

You're joking, right? Their "poor showing" has them beating Alder Lake i9-12900K by 22.9% (geomean):

https://openbenchmarking.org/embed.p...a=19d1d59ab461

And all while lowering launch prices vs. the previous generation & maintaining the same average power consumption vs. the 5950X! Intel will not be able to say the same!

Oh, please leave scottishduck alone... Poor Intel people, they suffer from trauma now. If they believe the 12900K is still gold, let them buy it.

You will see that even the 13900K will not be in the same league as Ryzen 7000... because you pay extra to have fewer cores with similar performance compared to the 13900K, which has more cores, and you also pay extra to not have a little.big design...

The little.big design is still a failure (with a game's main engine thread landing on E-cores instead of P-cores, and stuff like that).

And as for more E-cores, it's a joke: it's only designed to win benchmarks, but real-world workloads profit from having FEWER cores.

So any smart person will buy Ryzen 7000 instead of the 13900K or 12900K...

**coder** · 29 September 2022, 02:47 PM

Originally posted by WannaBeOCer

At the end of the day, Intel screwed up by removing AVX-512 from consumer hardware because of haters of the instruction set, for example Linus Torvalds:

That has nothing to do with their reasons for disabling it on Alder Lake.

Originally posted by WannaBeOCer

When it comes to the actual workload, I doubt AVX-512 workloads are more efficient on AMD's non-native, "double-pumped" implementation than on Intel's native implementation of AVX-512.

There's a little-known penalty for mixing AVX-512 instructions into a program that primarily uses 256-bit or narrower vectors. That's because the CPU suddenly has to start updating the fields of vector registers above 256 bits, since you might use them. I suspect this could be dramatically lessened on AMD's implementation, especially if they use separate 256-bit physical registers for each half of the 512-bit ISA registers.

Originally posted by WannaBeOCer

I have a 12700K from the first batch, which still supports AVX-512, so if I do get my hands on a 7700X, I'll definitely test the two.

Expect to be disappointed. ms178 helpfully linked this analysis by Igor's Lab, in the Zen 4 AVX-512 thread. In it, Igor analyzed performance & power consumption of AVX-512 on Alder Lake, and it seems to me that it's suffering from perhaps some clock-throttling issues preventing it from really stretching its legs.

https://www.igorslab.de/en/efficienc...practice-test/

That said, I think Intel probably just didn't bother to optimize the clock frequency curves for AVX-512, once they decided they weren't going to enable it. I expect the implementation on Sapphire Rapids to be very competitive.

**qarium** · 29 September 2022, 02:53 PM

Originally posted by WannaBeOCer

UserBenchmark is a joke, and if you notice, many of the tests are blank for the A770 because no one who isn’t under embargo has one yet. Regarding regular rasterization, they already announced that in DX12/Vulkan their A750 trades blows with the RTX 3060, while DX11 will be slower. They’re going to be in a similar situation to AMD’s GCN, where the DX11 driver was ass and then fanboys started calling it "fine wine". There’s no point in buying a Vega/RDNA1 GPU now, since neither supports DX12 Ultimate, which means no mesh shaders.

Right, UserBenchmark is a joke, but you know we do not have Intel Arc A770 benchmarks yet...
This means I work with the information I have right now.
And right now it looks like my Vega 64 is faster than the A770 if you do not use raytracing and do not need the 16 GB of VRAM.

I don't know what you mean, but right now it looks like if you have a Vega 64 like I do, you have little or no reason to buy an Arc A770...

**coder** · 29 September 2022, 03:36 PM

Originally posted by AdrianBc

The only exception is the FMA instruction: some of the most expensive Intel CPUs, priced at thousands of dollars, have a second 512-bit FMA unit, so for FMA alone they have double the throughput when 512-bit instructions are used. This second FMA unit is present in all Xeon Platinum models, some Xeon Gold models, and those Xeon W models that support AVX-512. It was also present in a few of the HEDT Intel CPUs.

In Ice Lake SP, Intel sought to overcome its disadvantage relative to Zen 3, in part, by enabling dual-FMA on all models. Granted, there are far more Skylake SP and Cascade Lake SP Xeons in service, currently.

Originally posted by AdrianBc

All the Intel CPUs and AMD CPUs that support AVX-512 have exactly the same throughput: two 512-bit instructions per clock cycle.

The difference between the various models lies only in the restrictions that may forbid both instructions executed in the same clock cycle from being certain of the more complex instructions.

On most Intel CPUs, only 1 of the 2 instructions may be an FMA or an FADD.
On Zen 4, only 1 of the 2 instructions may be an FMA, but the other can be an FADD, so this is better than for most Intel CPUs with AVX-512.
On Intel Xeon Platinum and similar CPUs, both instructions can be an FMA.

Not only does Zen 4 have better throughput than the majority of Intel CPUs, by being able to do both an FMA and an FADD per cycle, but it also has double the throughput for certain kinds of shuffle and permute instructions.

You missed the part where Zen 4 has two further ports: store and store/F2I (float-to-int?). Also, I wouldn't be so dismissive of Zen 4's limited multiply/FMA throughput, especially considering my point about Ice Lake SP and upcoming Sapphire Rapids.
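The per-cycle pairing rules quoted above can be written down as a small model to see what they imply for peak FP math. This sketch encodes the rules exactly as AdrianBc stated them (they are not checked against vendor documentation here), assumes 512-bit float64 vectors (8 lanes, an FMA counted as 2 FLOPs), and lumps shuffles, logic ops and the like into a hypothetical "other" category. It ignores the extra store ports mentioned in the reply, and it ignores clock speed, so it compares FLOPs per cycle, not per second.

```python
from itertools import combinations_with_replacement

# FLOPs per 512-bit instruction on float64 data (8 lanes); an FMA counts as 2 ops per lane.
# "other" is a hypothetical bucket for non-FP-arithmetic vector ops (shuffles, logic, ...).
FLOPS = {"fma": 16, "fadd": 8, "other": 0}


def intel_single_fma(pair) -> bool:
    # Rule as quoted: at most one of the two instructions may be an FMA or an FADD.
    return sum(op in ("fma", "fadd") for op in pair) <= 1


def zen4(pair) -> bool:
    # Rule as quoted: at most one FMA, but the other slot may still be an FADD.
    return pair.count("fma") <= 1


def intel_dual_fma(_pair) -> bool:
    # Rule as quoted for dual-FMA Xeons: both slots may be FMAs.
    return True


def peak_flops_per_cycle(allowed) -> int:
    """Best FLOPs per cycle over every legal pair of two 512-bit instructions."""
    best = 0
    for pair in combinations_with_replacement(FLOPS, 2):
        if allowed(pair):
            best = max(best, sum(FLOPS[op] for op in pair))
    return best


if __name__ == "__main__":
    for name, rule in (("single-FMA Intel", intel_single_fma),
                       ("Zen 4", zen4),
                       ("dual-FMA Intel", intel_dual_fma)):
        print(f"{name}: {peak_flops_per_cycle(rule)} float64 FLOPs/cycle (ideal mix)")
```

Under those assumptions the three rule sets come out at 16, 24 and 32 FLOPs per cycle respectively. Whether that per-cycle ranking survives in practice depends on the clock-throttling behavior argued about throughout this thread.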

Originally posted by AdrianBc

Zen 4 will continue to have a better AVX-512 implementation than most of the already existing Intel CPUs.

Well, you're comparing Zen 4 to Skylake-era cores. So, of course it's better than those. What's more interesting is to compare it with Sapphire Rapids' Golden Cove AVX-512. Do you know of any analysis of it, via Alder Lake?

**WannaBeOCer** · 29 September 2022, 03:37 PM

Originally posted by coder

That has nothing to do with their reasons for disabling it on Alder Lake.

Expect to be disappointed. ms178 helpfully linked this analysis by Igor's Lab, in the Zen 4 AVX-512 thread. In it, Igor analyzed performance & power consumption of AVX-512 on Alder Lake, and it seems to me that it's suffering from perhaps some clock-throttling issues preventing it from really stretching its legs.

https://www.igorslab.de/en/efficienc...practice-test/

That said, I think Intel probably just didn't bother to optimize the clock frequency curves for AVX-512, once they decided they weren't going to enable it. I expect the implementation on Sapphire Rapids to be very competitive.

It's a mix of two things: Gracemont doesn't support AVX-512, and Intel is segregating the instruction set to HPC, which is the reason they introduced VNNI-INT8 over the AVX unit for inference workloads to run on their hybrid architecture. I messed around with SLIDE for a bit, but I use my Titan RTX for training. https://github.com/keroro824/HashingDeepLearning

Aside from AI workloads and a single emulator, I'm currently unsure of its exact usefulness, apart from possibly a few more emulators, according to the developer of RPCS3. Intel's Arc A770 16GB model, with 512 of their tensor accelerator cores, would seem more interesting if I didn't already have a Titan RTX.

I'm not running into any throttling issues: I see all the cores pegged at 4.7 GHz at stock and 5.2 GHz with my OC with AVX-512 on Alder Lake, and I also noticed that AVX-512 uses less power than AVX2 on Alder Lake. Then again, I'm on Asus' latest 2004 BIOS but still using microcode 15 to keep AVX-512 enabled.

**coder** · 29 September 2022, 03:42 PM

Originally posted by scottishduck

The power efficiency claims are nonsense. Look at the reviews. The chips are also designed to intentionally hit a constant 95 °C under load. It’s a ridiculous design decision by AMD.

power efficiency != power consumption

That's the first thing. The second is that you have the option to easily run it at lower power thresholds (also, "Eco mode", which improves efficiency on lightly-threaded workloads?).

What's important is the relative performance of Intel vs. AMD when constrained to similar power envelopes. It's hard to fault AMD for answering Intel's runaway power consumption tactics. The key point is to note which solution gives the best efficiency within your chosen power tolerance.
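A minimal sketch of the distinction, using made-up numbers for hypothetical chips (not measurements of any real CPU): efficiency is score divided by average power, so a chip can draw more power and still be the more efficient one. The `power_cap_watts` filter mirrors the last point above, picking the best performance per watt within whatever power envelope you are willing to tolerate. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    name: str
    score: float      # benchmark score, higher is better (hypothetical units)
    avg_watts: float  # average package power during the run

    def efficiency(self) -> float:
        """Performance per watt: what 'power efficiency' means, as opposed to consumption."""
        return self.score / self.avg_watts


def most_efficient_within(runs: Iterable[Run], power_cap_watts: float) -> Optional[Run]:
    """Most efficient run among those that fit the chosen power envelope, or None."""
    eligible = [r for r in runs if r.avg_watts <= power_cap_watts]
    return max(eligible, key=Run.efficiency) if eligible else None


if __name__ == "__main__":
    runs = [
        Run("chip A, stock (hypothetical)", score=120.0, avg_watts=200.0),
        Run("chip B, stock (hypothetical)", score=80.0, avg_watts=160.0),
        Run("chip A, power-capped (hypothetical)", score=100.0, avg_watts=100.0),
    ]
    for r in runs:
        # Chip A at stock draws more power than chip B, yet scores more points per watt.
        print(f"{r.name}: {r.efficiency():.2f} points/W")
    best = most_efficient_within(runs, power_cap_watts=150.0)
    print("most efficient under 150 W:", best.name if best else "none")
```

The capped entry illustrates the "Eco mode" argument: giving up some peak score at a lower power limit can raise points per watt substantially, which is why the comparison that matters is at matched power envelopes.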

**coder** · 29 September 2022, 03:44 PM

Originally posted by atomsymbol

clicking the DISLIKE buttons is all I can do from my position, and I don't intend to post comments to those YouTube videos.

And those of us without YouTube accounts can't even do that much.

Intel Announces 13th Gen "Raptor Lake" - Linux Benchmarks To Come