Announcement

**wizard69** · 24 May 2020, 04:16 AM

Originally posted by Setif View Post

It looks like Hyper-threading is useless for HPC.

Hyper-threading has never served all users well. In fact it is often turned off in HPC data centers, and it has been that way for a very long time.

**wizard69** · 24 May 2020, 04:22 AM

Originally posted by sykobee View Post

To be honest, it is impressive to see an ARM based product winning against high-end x86-64 competitors in 10% of benchmarks. Assuming that it is using a little less power, and is a little bit cheaper, the perf/W and perf/$ should be competitive or desirable.

I was surprised at how well it has done in these tests. It is only a gen 2 device so I really have to hand it to the designers. It does seem strange that they didn't do better in some of the database tests, considering how the processor is used.

Obviously Zen 3 coming soon is a big leap again on the Epyc side of things, but there will be a Graviton 3 next year as well, I would assume.

What gets me going here is that it puts the lie to the idea that ARM can't compete. I'm still lusting for an ARM based laptop that doesn't suck. I'd even buy an Apple if they kept it as open as current laptops. With Apple's experience in ARM design I'd expect really good performance for my use case and battery life to lust after.

**wizard69** · 24 May 2020, 04:30 AM

Originally posted by vegabook View Post

Yet another set of benchmarks puts the lie to ARM being anything other than an architecture for the edge. These guys have had 25 years to show us competitive ARM server side and the fails keep coming. That both Qualcomm and AMD dumped their ARM server ambitions years ago (after working hard at it), and that Intel never even tried, really should have been a clarion call to anyone still dreaming. Amazon, with hundreds of billions of dollars of cash, in its second version, still can't get anywhere close. This is going to cement the general view that ARM just doesn't cut it for serious compute and probably never will.

Huh? Did you even look at the benchmarks? This processor is only generation 2 so that is a highly impressive showing in my book.

As for AMD they never worked hard on ARM. It was more of a case of working with what they had to save the company back then. Beyond that a bunch of tests aimed at server work loads is not general compute. Instead of AMD, which never worked hard on ARM, consider Fujitsu which has worked real hard on their A64FX.

**wizard69** · 24 May 2020, 04:41 AM

This thread has gone downhill real fast. If anybody really thinks that this is a poor showing they should consider how an Intel chip would fare against Graviton2. Mind you, a single package 64 core chip.

**pal666** · 25 May 2020, 05:19 AM

Originally posted by PerformanceExpert View Post

That's not relevant. Micro architecture is about the CPU internals, not the number of cores.

your post is not just irrelevant, it's false - scaling depends on microarchitecture. but in any case there's no microarchitecture used by subj in smartphones

**PerformanceExpert** · 25 May 2020, 07:00 AM

Originally posted by pal666 View Post

your post is not just irrelevant, it's false - scaling depends on microarchitecture. but in any case there's no microarchitecture used by subj in smartphones

Many things affect scaling, including microarchitecture. But that doesn't stop you from using the same microarchitecture in mobiles and servers in order to share R&D costs like Arm did. As I said, the number of cores in a mobile is not relevant at all for this to be feasible.

**vegabook** · 26 May 2020, 05:18 AM

Originally posted by wizard69 View Post

Huh? Did you even look at the benchmarks? This processor is only generation 2 so that is a highly impressive showing in my book.

As for AMD they never worked hard on ARM. It was more of a case of working with what they had to save the company back then. Beyond that a bunch of tests aimed at server work loads is not general compute. Instead of AMD, which never worked hard on ARM, consider Fujitsu which has worked real hard on their A64FX.

"huh?" indeed - did you look at the benchmarks? This thing comes first a pathetic 9% of the time and is 35% slower than the EPYC. What am I missing here? Don't get me wrong - I'd love ARM to get up there and I've run both a TX2 and a Jetson Nano which I love. But I'm waiting patiently for anything to get anywhere close to x86 on compute and you have to admit many have tried for a looong time now.

**PerformanceExpert** · 26 May 2020, 07:36 AM

Originally posted by vegabook View Post

"huh?" indeed - did you look at the benchmarks? This thing comes first a pathetic 9% of the time and is 35% slower than the EPYC. What am I missing here? Don't get me wrong - I'd love ARM to get up there and I've run both a TX2 and a Jetson Nano which I love. But I'm waiting patiently for anything to get anywhere close to x86 on compute and you have to admit many have tried for a looong time now.

What you are missing is why would beating the 7742 on every single Phoronix benchmark be the one and only measure of success? Many of the benchmarks are pretty terrible, which is why most people stick with SPEC and real-world benchmarks.

This beats the vast majority of x86 servers, including most of the EPYC range. It achieves this performance using half the power, a third of the silicon and 1/8th of the L3 cache of the 7742. All at a low frequency of 2.5GHz. That's mission accomplished.
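As a rough sanity check on the perf/W argument, the arithmetic can be sketched by taking two figures quoted in this thread at face value: "35% slower than the EPYC" and "half the power". Neither number is verified here, reading "35% slower" as 0.65x relative performance is an assumption, and the function name is purely illustrative.

```python
def relative_perf_per_watt(relative_perf: float, relative_power: float) -> float:
    """Perf/W of chip A relative to chip B, given A's perf and power as fractions of B's."""
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_perf / relative_power


# Figures as claimed by posters in this thread, not measured values.
graviton2_relative_perf = 1.0 - 0.35   # "35% slower" read as 0.65x the EPYC's performance
graviton2_relative_power = 0.5         # "half the power"

ratio = relative_perf_per_watt(graviton2_relative_perf, graviton2_relative_power)
print(f"Relative perf/W under these assumptions: {ratio:.2f}x")
```

Under those assumptions a chip can lose most individual benchmarks and still come out ahead on perf/W, which is why the two sides of this argument are measuring different things.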

**name99** · 30 May 2020, 09:45 PM

Originally posted by Setif View Post

It looks like Hyper-threading is useless for HPC.

You can say more than that.
If a single thread is primarily rate limited by a particular unit (this is generally either the AVX units or crypto units) then we can say two things:
- adding a second thread (SMT) won't add any throughput AND
- the AMD throughput and the Graviton throughput primarily reflect that particular unit.

So
- if code is essentially AVX rate limited (ie a lot of the HPC FP code), then it's going to do a lot better on AMD (or Intel) than on Graviton2 because Graviton2 only has 2 (128-bit) NEON units.
- if code is essentially crypto-instruction limited then Graviton2's performance probably reflects some combination of
+ someone hasn't yet written the dedicated assembly to use those specific crypto instructions OR
+ Graviton2 can do fewer of the relevant crypto instructions per second (eg 1-wide vs 2-wide for AMD) OR
+ Graviton2 may not even have the relevant specific instructions yet. (ARM may have defined them, but in a later version of the ISA than the v8.2 that Graviton2 is using).
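The bottleneck argument above can be sketched as a toy model. This is a minimal illustration, not a simulator: the function names are made up, the "2 x 128-bit NEON" figure is the one quoted in the post, and the "2 x 256-bit FMA" figure for the x86 side is an assumption added here for comparison.

```python
def unit_bound_throughput(threads: int, per_thread_demand: float, unit_capacity: float) -> float:
    """Ops per cycle a core sustains when a single execution unit is the bottleneck."""
    return min(threads * per_thread_demand, unit_capacity)


def peak_fp64_flops_per_cycle(vector_units: int, vector_bits: int, fma: bool = True) -> int:
    """Peak double-precision FLOPs per cycle per core for a given vector configuration."""
    lanes = vector_bits // 64          # 64-bit doubles per vector register
    flops_per_lane = 2 if fma else 1   # a fused multiply-add counts as two FLOPs
    return vector_units * lanes * flops_per_lane


# A thread that already saturates the unit: a second SMT thread adds nothing.
saturated_1t = unit_bound_throughput(1, per_thread_demand=2.0, unit_capacity=2.0)
saturated_2t = unit_bound_throughput(2, per_thread_demand=2.0, unit_capacity=2.0)

# A thread that only half-uses the unit: here SMT does help.
light_1t = unit_bound_throughput(1, per_thread_demand=1.0, unit_capacity=2.0)
light_2t = unit_bound_throughput(2, per_thread_demand=1.0, unit_capacity=2.0)

# Per-core peak under the stated vector-unit assumptions.
neon_peak = peak_fp64_flops_per_cycle(vector_units=2, vector_bits=128)  # figure from the post
avx2_peak = peak_fp64_flops_per_cycle(vector_units=2, vector_bits=256)  # assumed x86 config

print(saturated_1t, saturated_2t, light_1t, light_2t, neon_peak, avx2_peak)
```

In this model the saturated case gives the same throughput with one or two threads, and the per-core peak scales directly with vector width, which is the gap the post attributes to the FP-heavy results.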

What to make of all this? Well it depends on your purpose and your time frame...
If you want to buy cloud time TODAY to run these types of apps, Graviton2 is probably a bad choice.
But if you want to understand what Graviton3 will look like, then the point that matters is that these are all (by the standards of HW design) easily fixable.
Presumably Graviton3 will be based on X1, so will have 4-wide NEON, and perhaps also wider crypto.
Writing the relevant crypto routines in assembly is easy enough, it's just one more damn thing that someone has to get round to.
The only problem that won't likely be fixed by this time next year is the use of ARMv8.2 rather than something more recent. (This I won't defend. Ultimately ALL of ARM's cores are locked to the ISA of their small cores because even something like X1 for Servers [so no small cores] is ultimately based on A78 which has to pair with a small core. And for reasons that make zero sense to me, ARM doesn't seem to care about this, and so is ridiculously slow in updating its small cores...
cf Apple which is running on ARMv8.4 and, for all we know, AMX is actually implementing more-or-less the matrix instructions from ARMv8.6?)

The other aspect of this, if you want to talk horse race, is: is the race you care about ARM vs x86, or is it Intel vs rest of the world? Because ultimately what we see here is a lot of
- AMD comes first
- ARM does OK
- Intel has, uh, problems (cf the earlier Phoronix benchmarks).
Throw in that Intel can't meet demand, and one might expect that, whether it's AMD or ARM that's getting what used to be Intel business, either way, once an organization has bothered to try an Intel alternative, inertia has been overcome, and it's a lot easier for them to stick with ARM or AMD next year and the year after that.

**vegabook** · 02 June 2020, 08:42 PM

Originally posted by PerformanceExpert View Post

What you are missing is why would beating the 7742 on every single Phoronix benchmark be the one and only measure of success? Many of the benchmarks are pretty terrible, which is why most people stick with SPEC and real-world benchmarks.

This beats the vast majority of x86 servers, including most of the EPYC range. It achieves this performance using half the power, a third of the silicon and 1/8th of the L3 cache of the 7742. All at a low frequency of 2.5GHz. That's mission accomplished.

Easy to say without providing any evidence whatsoever. Let's actually see all these fantastic benchmarks that you're crowing about.

100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742