Announcement

**scottishduck** · 29 September 2022, 03:52 PM

Originally posted by qarium View Post

o please let scottishduck alone ... poor intel people they suffer from trauma now if they believe 12900K is still gold let them buy it.

you will see event he 13900K will not be in the same liga than ryzen7000... because you pay extra to have less cores with similar performance compared to 13900K who has more cores and also you pay extra to not have little.big design...

little.big design is still a failure (with game main engine thread landing on E cores instead of P cores and stuff like that)

and about more E cores its a joke its only designed to win benchmarks but real world workloads profit from having LESS cores.

so any smart person will buy ryzen7000 instead of 13900K and 12900K...

What I have is a 5950x.

What I don’t have is weird tribalistic feelings about corporations and pieces of silicon.

**coder** · 29 September 2022, 04:06 PM

Originally posted by WannaBeOCer View Post

You're trolling right? Every review I've seen that uses all the cores shows a 7950X using 240w+ in real world workloads. If you're only using your PC to game/office task then you're buying the wrong chip. What did you think was going to happen when you crank up frequency?

https://www.techpowerup.com/review/a...-7950x/24.html

As already mentioned many times, 7950X delivers very strong performance at lower power thresholds. It remains a perfectly viable & competitive solution in such configurations. Whether the same can be said of Raptor Lake remains to be seen, but I wouldn't count on it.

Also, I'd like to see their raw measurement data. Specifically, how much of the time did the benchmarks which ran > 200 W stay at such elevated levels?

**coder** · 29 September 2022, 04:11 PM

Originally posted by WannaBeOCer View Post

I'm not running into any throttling issues, I see all the cores pegged at 4.7Ghz at stock and 5.2Ghz with my OC with AVX-512 on Alder Lake

Well, then Igor's data is all the more perplexing, because that's the best explanation I had for why AVX-512 wasn't delivering better performance.

Originally posted by WannaBeOCer View Post

and I also noticed AVX-512 uses less power than AVX2 on Alder Lake.

Right. That's what he found, and it makes very little sense.

**piotrj3** · 29 September 2022, 06:44 PM

Originally posted by coder View Post

This was true until Zen 3. Once Zen 3 happened, Intel actually had to raise clock speeds & power consumption of its 14 nm CPUs even to compete in single-threaded performance!

That held until Alder Lake, which enabled Intel to comfortably regain the single-threaded lead, although they seemed reluctant to take their foot off the gas (i.e. clock speeds).

Leaving aside the issue of the E-cores, let's stay focused on generational power-efficiency improvements. AMD delivered this:

So, their fundamental efficiency indeed improved. This will be virtually impossible for Intel to do in Raptor Lake, because they have the same microarchitecture being made on virtually the same process node. So, fundamental efficiency will not drastically change.

We can also see that AMD traded some of those efficiency gains for better performance, by increasing clock speeds. Intel will do the same. However, by not starting from a lower base like AMD, Intel's single-threaded efficiency can pretty much only get worse, in Gen 13. If they kept the same clocks as Gen 12, then we could see some small improvement, but they've already said they won't.

The main place where Raptor Lake can possibly lower power consumption is in workloads with about 24 threads, because half of those threads will now move to the additional E-cores instead of over-taxing the 8 P-cores. In all-core workloads, the throughput added via 8 additional E-cores should actually enable better perf/W than Alder Lake. The pity is that power consumption of such workloads is so very high, due to their aggressive clocking.

However, it's incorrect to say that Raptor Lake is chiefly about improving power-efficiency. If that were true, they wouldn't be increasing clock speeds, as well. What Intel is doing with Raptor Lake is to look for performance gains anywhere they can find them. Faster clock speeds, bigger L2 cache, faster DDR5, and more E-cores. It's all really about performance.

That's not really true. AMD's APUs were much more power-efficient. The 5800X was an outlier, in terms of power-efficiency for the 5000-series.

If their next-gen APUs remain monolithic, then I think it'll be a similar story. However, the penalty Ryzen 7000's MCM architecture should be lower, now that the I/O Die is 6 nm (in the 5000 series it was either 14 nm or 12 nm).

Don't use AMD pictures for that. GamersNexus made video in depth analyzing power consumption of 7950X, and they found that power consumption is extremly high as long as you can cool down chip (its power draw is optimized not towards power draw but towards reaching 95C). So 7950X can take on average... 251W just on EPS rail ON AVERAGE during blender if your cooling allows that. This is why Ryzen 7950X is so bad chip to rate because one reviewer will claim 190W power draw, another 220W another 250W just on EPS rail. And all of them will also have diffrent performance claims. Meanwhile Intel on 241W alder lake was very representative - power draw is capped if your cooling allows this power draw you will have same performance as reviewer. https://youtu.be/nRaJXZMOMPU?t=541

11:51 you have broken efficiency promises.

**coder** · 29 September 2022, 09:34 PM

Originally posted by piotrj3 View Post

Don't use AMD pictures for that. GamersNexus made video in depth analyzing power consumption of 7950X, and they found that power consumption is extremly high as long as you can cool down chip (its power draw is optimized not towards power draw but towards reaching 95C).

His measurements don't refute their claims. You need to understand that power consumption is not the same thing as power efficiency, and that there's more than one way to run the CPU.

Originally posted by piotrj3 View Post

So 7950X can take on average... 251W just on EPS rail ON AVERAGE during blender if your cooling allows that.

Alder Lake will do the same thing, on gaming boards. Intel allows it to stay in boost mode indefinitely, so the boost duration is ultimately limited by your cooling solution.

Originally posted by piotrj3 View Post

11:51 you have broken efficiency promises.

If people want it to run efficiently, they just need to select the desired TDP and optionally Eco mode. Alder Lake doesn't even give you that option.

I find it funny that people are up in arms about this. It's a race to the bottom scenario. I don't get why you somehow expect AMD to take "the high road", when it would mean losing market share to an even less-efficient Intel CPU. As long as Intel is playing these games, AMD has no choice but to respond.

**WannaBeOCer** · 29 September 2022, 11:18 PM

Originally posted by coder View Post

As already mentioned many times, 7950X delivers very strong performance at lower power thresholds. It remains a perfectly viable & competitive solution in such configurations. Whether the same can be said of Raptor Lake remains to be seen, but I wouldn't count on it.

Also, I'd like to see their raw measurement data. Specifically, how much of the time did the benchmarks which ran > 200 W stay at such elevated levels?

According to Intel, the 13900K provides the same performance as the 12900K at 64w. From early leaks the 13900K already outperforms the 7950X in synthetic benchmarks and use about the same power. With 100w more we’re going to see a 6Ghz Raptor Lake. While the mid-range 13700K/13600K will mostly be a decent amount ahead of the 7700X/7600X. I wouldn’t be shocked when AMD lowers the price of the 7900X to $450 to compete with the 13700K.

Intel's Core i9-13900KS Rips Ryzen 9 7950X In Early Benchmark

https://www.tomshardware.com/news/intels-core-i9-13900ks-rips-ryzen-9-7950x-in-early-benchmark

Intel's next-generation Special Edition CPU has a temper.

**AdrianBc** · 30 September 2022, 04:05 AM

Originally posted by coder View Post

Well, you're comparing Zen 4 to Skylake-era cores. So, of course it's better than those. What's more interesting is to compare it with Sapphire Rapids' Golden Cove AVX-512. Do you know of any analysis of it, via Alder Lake?

At the end of the page

https://www.mersenneforum.org/showthread.php?p=614191

there is the table with the measured throughputs and latencies for Zen 4

https://www.mersenneforum.org/attachment.php?s=f1dc93fd127049112e392d08f2878e82&attachmentid=27361&d=1664197259

The same throughput and latency table for an Alder Lake with enabled AVX-512 is at

http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel0090672_AlderLake_BC_AVX512_InstLatX64.txt

In general Zen 4 has either the same or better throughputs and latencies in comparison with Sapphire Rapids, but there are 2 important exceptions.

As mentioned before, Sapphire Rapids will have two 512-bit FMA units, thus double throughput for FMA.

Besides that, Sapphire Rapids will have an approximately double throughput for the gather instructions.

The same double throughput for gather is also valid for the AVX2 variant of gather, i.e. for Raptor Lake/Alder Lake when running AVX2 code vs. Zen 4.

Also, it appears that Zen 4 had a bug in the vpcompressd instruction, only for the case when the destination is in the memory, and that bug was discovered late, so it was patched with a microcoded sequence.

Because of that, on Zen 4 vpcompressd with a memory destination is abnormally slow, even if it is fast with a register destination and vpexpand is fast even with a memory operand.

So an AVX-512 program intended to run on Zen 4 should replace vpcompressd with a memory destination with an equivalent instruction sequence using vpcompressd with a register destination.

**AdrianBc** · 30 September 2022, 07:02 AM

Originally posted by coder View Post

LOL, wut?

No, they have only about 60% the integer performance of a P-core running 1 thread. Where the E-cores are faster is to load them instead of putting a second thread on a P-core.

You have just said exactly the same thing that I have said.

When both threads are active on a P-core, each of them has about 60% of the performance of the same core with only 1 active thread, so both threads increase the performance of the core to about 120%. Therefore a thread on an E-core has about the same performance as that of one of the 2 threads on a P-core with both threads active.

When all the available threads are active on an Alder Lake or Raptor Lake, the SMT threads on the P-cores and the single threads on the E-core have about the same performance, so a Raptor Lake with 8 x 2 threads on P-Cores + 16 x 1 threads on E-cores has about the same performance as a CPU with 16 x 2 threads on P-cores, but at a smaller area and power consumption.

This is not a coincidence. The Intel designers are not stupid so they have chosen this performance ratio so that the speed will not vary wildly when the threads happen to be migrated between cores by the operating system scheduler.

While you are right that when only a part of the threads are active, it is always better to start a thread on an idle E-core instead of the second thread on a P-core, most programs either use only a few threads running on few of the P-cores, or they use all the available threads, 1 thread on each E-core and 2 threads on each P-core, and in the latter case all threads have similar performance.

Originally posted by coder View Post

Sandy Bridge was a 32 nm CPU and it didn't even implement AVX at full 256-bit width. I think they didn't do that until Haswell, which used 22 nm. And Haswell had an infamous clock-throttling issue with AVX2-heavy workloads, although it pales in comparison to the AVX-512 clock throttling problems Intel had on the 14 nm CPUs where they introduced it.

My point is that what you're talking about is a low-clocked, in-order Larrabee core. You cannot compare that to a high-clocked out-of-order, general-purpose CPU core. Even 2016 was too soon for Intel to deploy AVX-512 on general-purpose cores @ full width. It was a big mistake, due to all of the clock-throttling problems it caused. Possibly 10 nm ESF (AKA "Intel 7") is the first time it really makes sense.

Sandy Bridge had a full 256-bit width implementation for the floating point instructions, including for multiplication and addition, which matter most for the power consumption.

Haswell added 256-bit implementations for the integer instructions and it also replaced the multiplier and adder of Sandy Bridge with two FMA units, which double the computation throughput, but it also doubled the power consumption, causing the down-clocking problems that you mention.

The AVX instruction set added only minimal improvements over SSE, except for extending the registers to 256 bits and allowing 3-address instructions instead of 2-address instructions.

The Larrabee New Instructions, later renamed as AVX-512 was a completely new instruction set that was much better designed than MMX/SSE/AVX.

AVX-512 has nothing to do with the width of the execution units, which determines the power consumption that can cause down-clocking problems. You can implement AVX-512 even with 64-bit wide execution units in a very cheap implementation.

AVX-512 has nothing to do with whether the CPU has in-order or out-of-order execution.

For lower cost in Sandy Bridge, it would have been very easy to implement only the 256-bit versions of the AVX-512 instructions and with only 16 registers, at a cost very close to that of the AVX implementation, but having a much simpler path for the future extension of the ISA.

The choice between AVX and the Larrabee New Instructions for Sandy Bridge had absolutely nothing to do with the technical merits of the 2 instructions sets. Those 2 instruction set extensions have been designed in parallel by different Intel teams, working in different continents. It is pretty certain that there was no adequate communication between the different Intel teams and that the relationships between the teams were more of competition than of cooperation.

So it is likely that for the A team it would have been seen absurd to discuss with some secondary team about merging their possibly better design into the Sandy Bridge project, instead of developing their own ISA extension, independently of other teams, even if the NIH approach resulted in an inferior ISA.

A couple of years later Haswell has added a few of the instructions provided earlier by Larrabee and Knights Corner, e.g. fused multiply-add and gather instructions, but due to the initial design of AVX it was impossible to add the most important AVX-512 features, like the mask registers.

**piotrj3** · 30 September 2022, 08:04 AM

Originally posted by atomsymbol

They made a video about power consumption of 7950X - but they didn't make an in-depth video. An in-depth analysis of CPU power-efficiency would look somewhat different.

I think you don't quite get/understand it. If you take the cost of electricity into account (which you should; unlike GamersNexus), then the most efficient setup of running Blender on Ryzen 7000 is a single point: it is a point "X" that is the highest one on an ⋂-shaped curve. The probability that [the values (Watts, Amperes) attributed to X depend on whether the cooler can dissipate 250W or "just" 190W] is quite low, because the most cost&power-efficient way of running Blender is well below 250W.

Can you point me to the time where any GamersNexus Ryzen 7000 review video is showing the point X on such ⋂-shaped curve?

Your statement that ".... one reviewer will claim 190W power draw, another 220W another 250W" is true only because those reviewers don't know how to properly review the CPU so that most potential Ryzen 7000 buyers/users can find their use-case in that review, which has a primary cause in the fact that people watching/reading those reviews don't demand those reviews to be more complex.

Can you point me to the time where any GamersNexus Ryzen 7000 review video about which you can say "This point here: that will precisely be my use-case"?

Without taking costs into account, the best way of running Blender on Ryzen 7000 is to use liquid nitrogen to cool the CPU.

The issue it is subjective. Anyway my issue is that AMD changed definition of their TDP (what video exactly mentions). Before TDP was actually the power your CPU drew from EPS rail. Now it is something else. Intel meanwhile implies 2 things one is base power draw and boost power draw and with exception of some AVX512 workloads you will not break that boost power draw.

So you have Intel 12900k https://ark.intel.com/content/www/us...-5-20-ghz.html
You see boost power draw 241W. You maybe as reviewer open Intel extreme tuning utility or some diffrent tool and you see if processor is thermal throttling. So if you don't see thermal throttling 12900k will perform almost exactly the same for Phoronix, GN, Linus Tech Tips, Anandtech, arctechnica etc. etc. even under diffrent coolers. Keep in mind i am talking about 12900k which has unilimted Power limit modes in terms of time. The only minor going over limit was in some AVX loads but that was minor going over limit (like 5%).

now you have 7950X, and power draws as well boost frequencies are total rollercaster among reviewers.

On AMD site you see one figure 170W.
Phoronix had max power draw of 230W, but max temp is 96C what implies throttling.
GN had 251W.
Hardware unboxed has 355W whole system power consumption (not easy to compare but 130W above Ryzen 5950X).
Linus Tech tips had 190W and yes throttling.

Another issue is that one reviewer before claimed 65W TDP 7950X outperforming 12900k problem was on same graph below package draw was 90W on 7950X (which is just 30W under maximum 5950X). And now Intel claims 13900k in multicore workload will have same performance in 65W mode as 12900k in 241W PL. So i am not certain in this generation if AMD will be more efficient. Because if Intel literally draws 65W on average to have such performance, then dialed down from 90W to 65W Ryzen will lose here a lot of performance. At that point i don't know.

**coder** · 30 September 2022, 05:10 PM

Originally posted by AdrianBc View Post

Also, it appears that Zen 4 had a bug in the vpcompressd instruction, only for the case when the destination is in the memory, and that bug was discovered late, so it was patched with a microcoded sequence.

Because of that, on Zen 4 vpcompressd with a memory destination is abnormally slow, even if it is fast with a register destination and vpexpand is fast even with a memory operand.

So an AVX-512 program intended to run on Zen 4 should replace vpcompressd with a memory destination with an equivalent instruction sequence using vpcompressd with a register destination.

Well, let's hope they get those cost tables updated accordingly, for zenver4 in gcc and llvm! It seems an easy substitution to simply use a temporary register target and then write out that. Won't help the inveterate assembly programmers, but the enlightened among us who use compiler intrinsics should hopefully not see much impact.

Any idea how likely they are to fix it in a future stepping?

Announcement

Intel Announces 13th Gen "Raptor Lake" - Linux Benchmarks To Come

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment