Intel Thread Director Virtualization Patches Boost Some Workloads By ~14%


  • #21
    Originally posted by bezirg View Post
    What sucks about E-cores is that they are the reason that AVX-512 is disabled/missing in Alder Lake/Raptor Lake CPUs.
    Intel realized this mistake and is bringing it back with Arrow Lake. Intel did realize the mismatched ISA was a weakness.

    Comment


    • #22
      Originally posted by pieman View Post
      Intel realized this mistake and is bringing it back with Arrow Lake. Intel did realize the mismatched ISA was a weakness.
      You mean that AVX10.x workaround/hack, where E-cores still won't support 512-bit vector instructions? Because I haven't heard anything about AVX-512 on Arrow Lake. Though AVX10.x should come after Arrow Lake, I think...
      Last edited by drakonas777; 04 February 2024, 05:04 AM.

      Comment


      • #23
        Originally posted by Kjell View Post

        Shine how?

        E-Cores are a joke.

        It's a cheap attempt to catch up with AMD's superior power efficiency.

        Keep coping
        Take a 13600K and a 13700K. Disable the 13700K's E-cores and set the same power limit, for example 150 W, on both of them. Except for gaming with a 4090, the 13600K would be much faster than the 13700K.
        Without them, Intel's fabs wouldn't magically become better than TSMC's. Instead of losing only in power efficiency, Intel would lose in both power efficiency and multi-core performance.

        Comment


        • #24
          Originally posted by ms178 View Post
          I'd like to see benchmarks with AMD's implementation of their compact cores but for some reason they still haven't released such a SKU yet.
          The new Ryzen 5 8500G fits in an AM5 socket, has a 65 W TDP, and comes with 2x Zen 4 cores + 4x Zen 4c cores. It's a desktop version of the 7545U. Look for benchmarks of either one.

          Originally posted by ms178 View Post
          I want a great AVX-512 implementation and 8P + 12C cores that just work everywhere as intended. Neither company wants to give me that at the moment.
          AMD is just dipping its toe into the hybrid CPU domain. I expect they'll continue to scale up C-core counts as time goes on. For now, they're limited to monolithic-die CPUs aimed at laptops, which keeps core counts modest (Intel's P-series laptop CPUs also max out at 6P + 8E cores, with Meteor Lake adding an extra 2 LP E-cores).

          Comment


          • #25
            Originally posted by NobodyXu View Post
            Thing is, adding CPU cores itself does not add overhead; the overhead comes merely from the software itself, the algorithm used.
            Cache coherency adds overhead, and that's in the hardware. Furthermore, scaling core counts means a larger interconnect topology, which should increase memory latency and cross-NUMA latencies.
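
            For a concrete feel of that hardware cost, here's a quick sketch of my own (not from any benchmark here): time several threads hammering one shared atomic counter versus per-thread counters padded onto separate cache lines. The work is identical; the only difference is coherency traffic, and on most multi-core machines the shared version comes out several times slower. Build with something like g++ -O2 -std=c++17 -pthread.

            ```cpp
            // Shared-counter vs. per-thread-counter timing sketch.
            #include <atomic>
            #include <chrono>
            #include <cstdio>
            #include <thread>
            #include <vector>

            constexpr int kThreads = 8;
            constexpr long kIters = 10'000'000;

            // Padded to a full cache line so each thread's counter sits alone.
            struct alignas(64) Padded { std::atomic<long> value{0}; };

            template <typename Fn>
            double run_ms(Fn&& body) {
                auto t0 = std::chrono::steady_clock::now();
                std::vector<std::thread> ts;
                for (int i = 0; i < kThreads; ++i) ts.emplace_back(body, i);
                for (auto& t : ts) t.join();
                auto t1 = std::chrono::steady_clock::now();
                return std::chrono::duration<double, std::milli>(t1 - t0).count();
            }

            int main() {
                std::atomic<long> shared{0};
                std::vector<Padded> local(kThreads);

                // Every increment bounces the same cache line between cores.
                double contended = run_ms([&](int) {
                    for (long n = 0; n < kIters; ++n)
                        shared.fetch_add(1, std::memory_order_relaxed);
                });

                // Same work, but each core owns its own cache line.
                double uncontended = run_ms([&](int i) {
                    for (long n = 0; n < kIters; ++n)
                        local[i].value.fetch_add(1, std::memory_order_relaxed);
                });

                std::printf("shared: %.1f ms  per-thread: %.1f ms\n",
                            contended, uncontended);
            }
            ```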

            Originally posted by NobodyXu View Post
            stuff like compilation is often linearly scalable.
            You'd expect so, right? It's funny that I haven't seen Phoronix' compilation benchmarks scale very linearly. In fact, they have tended to scale much less linearly than certain other workloads.

            Not sure where the bottleneck lies... my suspicions naturally go towards I/O, especially since some of these machines don't have all that much memory per-core.
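
            If anyone wants to poke at it themselves, here's a rough sketch of my own (not what Phoronix uses) for eyeballing compile scaling: rebuild the same tree at increasing -j levels and watch where the wall-clock curve flattens. It assumes a Makefile-based project in the current directory and that `make clean` does what you'd expect.

            ```cpp
            // Crude compile-scaling probe: time `make -jN` for growing N.
            #include <chrono>
            #include <cstdio>
            #include <cstdlib>
            #include <string>

            int main() {
                const int levels[] = {1, 2, 4, 8, 16, 32};
                for (int jobs : levels) {
                    std::system("make clean > /dev/null 2>&1");  // start from a cold tree
                    const std::string cmd =
                        "make -j" + std::to_string(jobs) + " > /dev/null 2>&1";
                    auto t0 = std::chrono::steady_clock::now();
                    int rc = std::system(cmd.c_str());
                    auto t1 = std::chrono::steady_clock::now();
                    std::printf("-j%-2d exit=%d %.1f s\n", jobs, rc,
                                std::chrono::duration<double>(t1 - t0).count());
                }
            }
            ```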

            Comment


            • #26
              Originally posted by RealNC View Post
              Everything I learned at school tells me that the more cores you have, the less the performance gains scale due to synchronization overhead. The more cores, the bigger the overhead. 8 cores with performance N are always faster than 16 cores with performance N/2.
              Yes, super-linear speedups don't exist (or else you've done something wrong) and even truly linear speedups are at best theoretical.
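
              Amdahl's law puts a number on that intuition. With a fraction p of the work parallelizable across n cores, the speedup is capped by the serial remainder; taking p = 0.95 purely as an illustrative figure:

              \[
              S(n) = \frac{1}{(1 - p) + p/n}, \qquad S(16)\big|_{p=0.95} = \frac{1}{0.05 + 0.95/16} \approx 9.1
              \]

              So even a 95%-parallel job gets barely 9x out of 16 cores, and no number of extra cores pushes it past 20x.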

              That's not to say there aren't plenty of examples of near-linear scaling. It depends a lot on what you're doing. Graphics is the poster child for parallelism, which is how GPUs manage such impressive performance by distributing the work among tens of thousands of "threads".

              Getting back to CPUs, have you not seen Michael's benchmarks of 2x 128-core AMD Bergamo systems with 512 threads?
              No, it doesn't scale linearly over 2x 96-core Genoa, but then it runs at lower clock speeds and has half the L3 cache. Considering that, I'd say the results are quite impressive. Especially when you consider it's using only 86.4% as much power, I'd certainly be satisfied with a 19.9% performance improvement rather than 33.3%.
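
              Taking those figures at face value, the perf-per-watt arithmetic works out to roughly

              \[
              \frac{1.199}{0.864} \approx 1.39
              \]

              i.e. about 39% more performance per watt from the denser Zen 4c cores.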

              So, maybe think about that before you rush to complain that E-cores are getting too numerous to scale well. I'm not saying they do scale as well as they might... it does feel to me like the E-cores were somewhat hastily bolted onto Intel's current hybrid CPUs.

              Comment


              • #27
                Originally posted by drakonas777 View Post
                Even in code compilation the practical benefits of E cores are questionable,
                I only have data on Alder Lake, but SPECint2017 includes a gcc benchmark: 8 E-cores scored 35.52, while 8 P-cores scored 70.47 (1 thread per core) and 75.10 (2 threads per core). That suggests 16 E-cores could nearly equal the P-cores' performance at just over half the area and less power consumption.
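
                Spelling out the arithmetic, and assuming the E-core score scales close to linearly from 8 to 16 cores (a simplification on my part):

                \[
                2 \times 35.52 = 71.04
                \]

                which lands right between the P-cores' 70.47 (1T/core) and 75.10 (2T/core) scores.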

                If Raptor Lake's E-cores are nearly doubling its compilation performance at only about 50% more die area, I'd say it's a win. Wouldn't you?
                Last edited by coder; 04 February 2024, 06:09 PM.

                Comment


                • #28
                  Originally posted by NobodyXu View Post

                  Funny, because for servers they opt to use 64 cores with worse performance than a desktop chip.

                  Thing is, adding CPU cores itself does not add overhead; the overhead comes merely from the software itself, the algorithm used.
                  It's true that for many algorithms there's a limit to the parallelism, which is why on desktop it's rare to see more than 32 cores, but stuff like compilation is often linearly scalable.
                  Because servers have a lot of threads idling while waiting for input (polling) that need to respond quickly with low latency and finish quickly for the next request. If you have few cores context-switching all the time, you end up with a lot of inefficiency. It is more complicated than that as well, but those are the basics I can think of.

                  Comment


                  • #29
                    Originally posted by coder View Post
                    Cache coherency adds overhead, and that's in the hardware. Furthermore, scaling core counts means a larger interconnect topology, which should increase memory latency and cross-NUMA latencies.
                    That's true; it does make everything related to atomics and synchronisation more difficult and expensive, and makes such CPUs harder to design.

                    Yet AmpereOne already ships a 192-core server CPU. Compared to that monster, the overhead of adding more cores to Intel/AMD desktop CPUs, which have fewer than 32 cores, is unlikely to be a problem.

                    Originally posted by coder View Post
                    You'd expect so, right? It's funny that I haven't seen Phoronix' compilation benchmarks scale very linearly. In fact, they have tended to scale much less linearly than certain other workloads.

                    Not sure where the bottleneck lies... my suspicions naturally go towards I/O, especially since some of these machines don't have all that much memory per-core.
                    That's true, I take that back.

                    Often parsing headers repetitively is expensive, and you need to "compile" them into a PCH (precompiled header) to reduce the cost.
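
                    For anyone unfamiliar, a sketch of what that looks like with GCC (the file name is just an example): put the heavy, rarely-changing includes into one header and compile it once, e.g. with `g++ -std=c++17 -x c++-header pch.hpp`, which emits pch.hpp.gch; later compiles that #include "pch.hpp" pick up the .gch instead of re-parsing everything.

                    ```cpp
                    // pch.hpp -- illustrative precompiled-header candidate (name made up).
                    // Precompile once:  g++ -std=c++17 -x c++-header pch.hpp   -> pch.hpp.gch
                    // Any TU that starts with `#include "pch.hpp"` then reuses the .gch
                    // instead of re-parsing these headers on every compile.
                    #pragma once
                    #include <algorithm>
                    #include <map>
                    #include <string>
                    #include <vector>
                    ```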

                    Comment


                    • #30
                      Originally posted by cj.wijtmans View Post

                      Because servers have a lot of threads idling while waiting for input (polling) that need to respond quickly with low latency and finish quickly for the next request. If you have few cores context-switching all the time, you end up with a lot of inefficiency. It is more complicated than that as well, but those are the basics I can think of.
                      Yes, many servers are oversold, e.g. they put 200 or more VMs onto a server with only 128 cores, since most of the time the VMs are idle.

                      My point is that there is indeed software, and there are use cases, that can scale that much, and that adding more cores to 24-core Intel chips doesn't add much overhead, considering AmpereOne managed 192 cores.

                      Comment
