100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742
-
Originally posted by sykobee
To be honest, it is impressive to see an ARM-based product winning against high-end x86-64 competitors in 10% of benchmarks. Assuming that it uses a little less power and is a little cheaper, the perf/W and perf/$ should be competitive or desirable.
Obviously Zen 3 coming soon is a big leap again on the Epyc side of things, but there will be a Graviton 3 next year as well, I would assume.
- Likes 2
-
Originally posted by vegabook
Yet another set of benchmarks puts the lie to ARM being anything other than an architecture for the edge. These guys have had 25 years to show us a competitive ARM on the server side and the fails keep coming. That both Qualcomm and AMD dumped their ARM server ambitions years ago (after working hard at it), and that Intel never even tried, really should have been a clarion call to anyone still dreaming. Amazon, with hundreds of billions of dollars of cash, in its second version, still can't get anywhere close. This is going to cement the general view that ARM just doesn't cut it for serious compute and probably never will.
As for AMD, they never worked hard on ARM. It was more a case of working with what they had to save the company back then. Beyond that, a bunch of tests aimed at server workloads is not general compute. Instead of AMD, which never worked hard on ARM, consider Fujitsu, which has worked really hard on their A64FX.
- Likes 2
-
Originally posted by PerformanceExpert
That's not relevant. Microarchitecture is about the CPU internals, not the number of cores.
- Likes 1
-
Originally posted by pal666
Your post is not just irrelevant, it's false - scaling depends on microarchitecture. But in any case there's no microarchitecture used by subj in smartphones.
-
Originally posted by wizard69
Huh? Did you even look at the benchmarks? This processor is only generation 2, so that is a highly impressive showing in my book.
As for AMD, they never worked hard on ARM. It was more a case of working with what they had to save the company back then. Beyond that, a bunch of tests aimed at server workloads is not general compute. Instead of AMD, which never worked hard on ARM, consider Fujitsu, which has worked really hard on their A64FX.
-
Originally posted by vegabook
"huh?" indeed - did you look at the benchmarks? This thing comes first a pathetic 9% of the time and is 35% slower than the EPYC. What am I missing here? Don't get me wrong - I'd love ARM to get up there and I've run both a TX2 and a Jetson Nano which I love. But I'm waiting patiently for anything to get anywhere close to x86 on compute and you have to admit many have tried for a looong time now.
This beats the vast majority of x86 servers, including most of the EPYC range. It achieves this performance using half the power, a third of the silicon and 1/8th of the L3 cache of the 7742. All at a low frequency of 2.5GHz. That's mission accomplished.
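The two readings of the same results ("35% slower" versus "half the power") can be put side by side with a little arithmetic. This is only a sketch: it assumes "35% slower" means 0.65x the EPYC's performance and takes "half the power" at face value; both numbers are claims from this thread, not measurements, and the function name is made up for illustration.

```python
def relative_perf_per_watt(relative_perf, relative_power):
    """Perf/W of chip A relative to chip B, given A's performance and
    power as fractions of B's. Purely illustrative arithmetic."""
    return relative_perf / relative_power

# Thread's claims (assumptions, not measured data):
#   "35% slower than the EPYC" -> relative performance of 0.65
#   "half the power"           -> relative power of 0.5
ratio = relative_perf_per_watt(0.65, 0.5)
print(f"Graviton2 perf/W relative to EPYC 7742: {ratio:.2f}x")
```

Under those assumptions the slower chip still comes out ahead on perf/W, which is why both posters can look at the same charts and disagree.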
- Likes 1
-
Originally posted by Setif
It looks like Hyper-threading is useless for HPC.
If a single thread is primarily rate-limited by a particular unit (generally either the AVX units or the crypto units), then we can say two things:
- adding a second thread (SMT) won't add any throughput AND
- the AMD throughput and the Graviton throughput primarily reflect that particular unit.
So
- if code is essentially AVX rate limited (ie a lot of the HPC FP code), then it's going to do a lot better on AMD (or Intel) than on Graviton2 because Graviton2 only has 2 (128-bit) NEON units.
- if code is essentially crypto-instruction limited then Graviton2's performance probably reflects some combination of
+ someone hasn't yet written the dedicated assembly to use those specific crypto instructions OR
+ Graviton2 can do fewer of the relevant crypto instructions per second (eg 1-wide vs 2-wide for AMD) OR
+ Graviton2 may not even have the relevant specific instructions yet. (ARM may have defined them, but in a later version of the ISA than the v8.2 that Graviton2 is using).
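As a back-of-the-envelope sketch of the argument above (not a benchmark): a toy model where throughput is capped by one shared execution unit, plus a peak-FLOPS estimate from vector width. The Graviton2 figures (64 cores, two 128-bit NEON units, 2.5GHz) come from this thread. The EPYC 7742 figures (two 256-bit FMA pipes per core, 2.25GHz base clock) and the convention that one FMA counts as 2 flops are my assumptions, and the function names are invented for illustration.

```python
def smt_throughput(per_thread_demand, unit_capacity, threads):
    """Toy model: all threads on a core share one execution unit.
    Throughput (ops/cycle) is total demand, capped by the unit's capacity."""
    return min(per_thread_demand * threads, unit_capacity)

def peak_fp64_gflops(cores, vector_units, vector_bits, ghz):
    """Theoretical peak, assuming every vector unit issues one FMA per cycle."""
    lanes = vector_bits // 64                    # FP64 lanes per vector unit
    flops_per_cycle = vector_units * lanes * 2   # assumption: FMA = 2 flops
    return cores * flops_per_cycle * ghz

# A single thread already saturates the unit, so a second SMT thread adds nothing.
one_thread = smt_throughput(per_thread_demand=2.0, unit_capacity=2.0, threads=1)
two_threads = smt_throughput(per_thread_demand=2.0, unit_capacity=2.0, threads=2)

# Graviton2 numbers are from the thread; EPYC 7742 numbers are assumptions.
graviton2 = peak_fp64_gflops(cores=64, vector_units=2, vector_bits=128, ghz=2.5)
epyc_7742 = peak_fp64_gflops(cores=64, vector_units=2, vector_bits=256, ghz=2.25)

print(one_thread, two_threads)
print(graviton2, epyc_7742, epyc_7742 / graviton2)
```

Real code rarely hits these peaks, but the ratio shows why vector-bound HPC workloads would favor the wider units, and the same capped-by-one-unit logic applies to 1-wide vs 2-wide crypto.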
What to make of all this? Well it depends on your purpose and your time frame...
If you want to buy cloud time TODAY to run these types of apps, Graviton2 is probably a bad choice.
But if you want to understand what Graviton3 will look like, then the point that matters is that these are all (by the standards of HW design) easily fixable.
Presumably Graviton3 will be based on X1, so will have 4-wide NEON, and perhaps also wider crypto.
Writing the relevant crypto routines in assembly is easy enough; it's just one more damn thing that someone has to get round to.
The only problem that won't likely be fixed by this time next year is the use of ARMv8.2 rather than something more recent. (This I won't defend. Ultimately ALL of ARM's cores are locked to the ISA of their small cores because even something like X1 for Servers [so no small cores] is ultimately based on A78 which has to pair with a small core. And for reasons that make zero sense to me, ARM doesn't seem to care about this, and so is ridiculously slow in updating its small cores...
cf Apple which is running on ARMv8.4 and, for all we know, AMX is actually implementing more-or-less the matrix instructions from ARMv8.6?)
The other aspect of this, if you want to talk horse race, is: is the race you care about ARM vs x86, or is it Intel vs rest of the world? Because ultimately what we see here is a lot of
- AMD comes first
- ARM does OK
- Intel has, uh, problems (cf the earlier Phoronix benchmarks).
Throw in that Intel can't meet demand, and it doesn't much matter whether it's AMD or ARM that's getting what used to be Intel business. Either way, once an organization has bothered to try an Intel alternative, inertia has been overcome, and it's a lot easier for them to stick with ARM or AMD next year and the year after that.
-
Originally posted by PerformanceExpert
What you are missing is why would beating the 7742 on every single Phoronix benchmark be the one and only measure of success? Many of the benchmarks are pretty terrible, which is why most people stick with SPEC and real-world benchmarks.
This beats the vast majority of x86 servers, including most of the EPYC range. It achieves this performance using half the power, a third of the silicon and 1/8th of the L3 cache of the 7742. All at a low frequency of 2.5GHz. That's mission accomplished.