Intel Thread Director Virtualization Patches Boost Some Workloads By ~14%
-
Originally posted by pieman View Post
Intel realized this mistake and is bringing it back with Arrow Lake. Intel did realize the mismatched ISA was a weakness.
Last edited by drakonas777; 04 February 2024, 05:04 AM.
- Likes 1
Comment
-
Originally posted by Kjell View Post
Shine how?
E-Cores are a joke.
It's a cheap attempt to catch up with AMD's superior power efficiency.
Keep coping
Without them, the fabs wouldn't magically become better than TSMC's. Instead of losing in power efficiency, they would lose in both power efficiency and multi-core performance.
Comment
-
Originally posted by ms178 View Post
I'd like to see benchmarks with AMD's implementation of their compact cores, but for some reason they still haven't released such a SKU yet.
Originally posted by ms178 View Post
I want a great AVX-512 implementation and 8P + 12C cores that just work everywhere as intended. Neither company wants to give me that at the moment.
Comment
-
Originally posted by NobodyXu View Post
Thing is, adding CPU cores itself does not add overhead; the overhead comes from the software itself, from the algorithm used.
Originally posted by NobodyXu View Post
stuff like compilation is often linearly scalable.
Not sure where the bottleneck lies... my suspicions naturally go towards I/O, especially since some of these machines don't have all that much memory per-core.
Comment
-
Originally posted by RealNC View Post
Everything I learned at school tells me that the more cores you have, the less the performance gains scale, due to synchronization overhead. The more cores, the bigger the overhead. 8 cores with performance N are always faster than 16 cores with performance N/2.
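The diminishing-returns intuition in that quote is essentially Amdahl's law. Here's a minimal sketch; the 95%-parallel fraction is an illustrative assumption, not a measurement of any real workload:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even a 95%-parallel workload falls well short of linear scaling:
print(amdahl_speedup(0.95, 8))          # ~5.9x on 8 full-speed cores
print(amdahl_speedup(0.95, 16))         # ~9.1x on 16 cores, not 16x
# 16 half-speed cores vs 8 full-speed ones, per the claim above:
print(0.5 * amdahl_speedup(0.95, 16))   # ~4.6x -- the 8 faster cores win here
```

For embarrassingly parallel work (parallel_fraction approaching 1) the gap closes, which is why builds and rendering can still benefit from many small cores.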
That's not to say there aren't plenty of examples of near-linear scaling. It depends a lot on what you're doing. Graphics is the poster child for parallelism, which is how GPUs manage such impressive performance by distributing the work among tens of thousands of "threads".
Getting back to CPUs, have you not seen Michael's benchmarks of 2x 128-core AMD Bergamo systems with 512 threads?
No, it doesn't scale linearly over 2x 96-core Genoa, but then it runs at lower clock speeds and has half the L3 cache. Considering that, I'd say the results are quite impressive. Especially when you consider it's using only 86.4% as much power, I'd certainly be satisfied with a 19.9% performance improvement rather than 33.3%.
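Working those figures through as performance per watt, using only the numbers quoted above as a back-of-the-envelope check:

```python
# Figures from the comparison above: ~19.9% more performance at ~86.4% of the power.
perf_ratio = 1.199
power_ratio = 0.864

perf_per_watt = perf_ratio / power_ratio
print(f"~{(perf_per_watt - 1) * 100:.0f}% better performance per watt")  # ~39%
```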
So, maybe think about that before you rush to complain that E-cores are getting too numerous to scale well. I'm not saying they do scale as well as they might... it does feel to me like the E-cores were somewhat hastily bolted onto Intel's current hybrid CPUs.
- Likes 2
Comment
-
Originally posted by drakonas777 View Post
Even in code compilation the practical benefits of E cores are questionable,
If Raptor Lake's E-cores are nearly doubling its compilation performance at only about 50% more die area, I'd say it's a win. Wouldn't you?
Last edited by coder; 04 February 2024, 06:09 PM.
- Likes 1
Comment
-
Originally posted by NobodyXu View Post
Funny, because for servers they opt to use 64 cores with worse performance than a desktop chip.
Thing is, adding CPU cores itself does not add overhead; the overhead comes from the software itself, from the algorithm used.
It's true that for many algorithms there's a limit to the parallelism, which is why on desktop it's rare to see more than 32 cores, but stuff like compilation often scales nearly linearly.
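That's the usual argument for why builds scale: translation units are independent, so a build system can farm them out like `make -j` does. A toy sketch of that structure (the `compile_unit` function here is a stand-in, not a real compiler, and a real build would use separate processes rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

def compile_unit(source: str) -> str:
    # Stand-in for compiling one translation unit: no shared state between units.
    return source.upper() + ".o"

def build(sources, jobs):
    # Independent units can be dispatched to any number of workers, which is
    # why core count keeps helping until I/O or memory becomes the bottleneck.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_unit, sources))

sources = [f"file{i}.c" for i in range(8)]
assert build(sources, jobs=1) == build(sources, jobs=4)  # same result at any -j
```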
Comment
-
Originally posted by coder View Post
Cache coherency adds overhead, and that's in the hardware. Furthermore, scaling core counts means a larger interconnect topology, which should increase memory latency and cross-NUMA latencies.
Yet AmpereOne has already released a 192-core server CPU; compared to that monster, the overhead of adding more cores to Intel/AMD desktop CPUs, which have fewer than 32 cores, is unlikely to be a problem.
Originally posted by coder View Post
You'd expect so, right? It's funny that I haven't seen Phoronix' compilation benchmarks scale very linearly. In fact, they have tended to scale much less linearly than certain other workloads.
Not sure where the bottleneck lies... my suspicions naturally go towards I/O, especially since some of these machines don't have all that much memory per-core.
Often parsing headers repeatedly is expensive, and you need to "compile" them to a PCH (precompiled header) to reduce the cost.
Comment
-
Originally posted by cj.wijtmans View Post
Because servers have a lot of threads idling, waiting for input (polling), that need to respond quickly with low latency and finish quickly for the next request. If you have few cores, constant context switching will create greater inefficiency. It is more complicated than that as well, but that is the most basic explanation I can think of.
My point is that there is indeed software, and there are use cases, that can scale that much, and that adding more cores to 24-core Intel chips doesn't add much overhead, considering AmpereOne managed to have 192 cores.
- Likes 1
Comment