Announcement

**PerformanceExpert** · 16 December 2020, 01:51 PM

Originally posted by tuxd3v View Post

The reality is that ARM64 exists because AMD helped ARM to create ARM64, at the time when AMD was thinking of going ARM.

Do you have any evidence to back up that claim? AMD released an Arm server based on Arm's Cortex-A57, but that is as far as they went.

**PerformanceExpert** · 16 December 2020, 02:17 PM

Originally posted by AmericanLocomotive View Post

Don't you think Ampere would be bragging a teeny bit more about their performance/watt and total performance advantage if every benchmark reflected those two?

https://www.servethehome.com/wp-content/uploads/2020/03/Ampere-Altra-watts-per-core.jpg

https://www.servethehome.com/wp-content/uploads/2020/03/Ampere-Altra-Rack-Density.jpg

Zen 3 will close that "performance per rack" metric easily, and the efficiency gains will bring it essentially exactly in line.

Fugaku isn't anywhere even close to the most efficient super computer. Not even by a long shot. It's marginally more efficient than Summit, a PowerPC and nvidia Tesla accelerated system that came out 2 years earlier.

The current most efficient super computing systems are almost all AMD + nvidia systems.

We're discussing CPU power here, not per-rack which includes a lot of other stuff. Like I said, I'm waiting for AnandTech to publish more authoritative perf/Watt results for SPEC. The soon to be released Altra Max will be better too, so let's compare that with Zen 3.

Fugaku was #1 in Green 500 when it came out. Since then the new A100 accelerator has beaten it in efficiency (no surprise) but it's still at #6 and #10 in GREEN 500 and #1 in TOP 500.

**PerformanceExpert** · 16 December 2020, 02:24 PM

Originally posted by tuxd3v View Post

That is the reality: AMD64 distros ship software that is not optimized to take advantage of the amd64 arch.
Meanwhile ARM64 software is compiled for ARM64, taking advantage of ARM64 features.

No, that's total rubbish. In both cases distros target the base architecture, which is ARMv8.0-A for AArch64, so none of the many extensions are enabled.

This is simply the way all software works (and has always worked). You cannot ship binaries that rely on the latest features since they don't work on every CPU (you can add runtime checks of course but that is only worth it in specific cases).
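To illustrate the runtime-check idea, here is a minimal sketch in Python rather than anything a distro actually ships. It assumes a Linux-style `/proc/cpuinfo` listing, where x86 kernels label the line `flags` and arm64 kernels label it `Features`. The function names, the sample text, and the choice of `atomics` as the required feature are all hypothetical, picked only for illustration.

```python
def parse_cpu_features(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels call the line "flags"; arm64 kernels call it "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            features.update(value.split())
    return features


def sum_baseline(values):
    # Path that only needs the base architecture; always safe to run.
    return sum(values)


def sum_optimized(values):
    # Stand-in for a routine that would rely on an optional extension.
    return sum(values)


def select_sum(features, required="atomics"):
    # Pick the optimized path only if the CPU reports the needed feature.
    return sum_optimized if required in features else sum_baseline


# Hypothetical sample text, not captured from a real machine.
SAMPLE = "processor\t: 0\nFeatures\t: fp asimd aes crc32 atomics\n"

impl = select_sum(parse_cpu_features(SAMPLE))
print(impl.__name__, impl([1, 2, 3]))
```

On a real Linux box you could pass the contents of `/proc/cpuinfo` instead of `SAMPLE`. The point is that the binary carries both paths and pays for a check at startup, which is why it is only worth doing for hot code.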

**markg85** · 16 December 2020, 02:40 PM

Originally posted by Weasel View Post

A simple ISA is like caring for saving 1 MB of space when you have 1 TB of total disk space.

You are plain and simple totally wrong.

While the ISA, in strict terms, is a "layer" that talks between the compiler and the hardware, it darn well determines what the hardware is capable of. A complex ISA will therefore inevitably need the hardware to make that possible, which therefore will be more complex too. I don't know if you've ever seen an image of a CPU's interconnects. That's mindbogglingly complex! To give you an idea, this article (https://www.eetimes.com/whats-the-to...-silicon-chip/) from 2007, based on 30nm chips (we're at ~5nm these days), had about, and I quote, "1.76 meters of interconnect for every square millimeter on the chip". That today is VASTLY more!

Your 1 TB ISA would be stupendously more.
So in other terms, a simple ISA doesn't only make everything simpler. It makes it possible to design the CPU itself much more efficiently (literally fewer wires at the nano scale).
And this is why ARM, with a relatively simple ISA compared to x86, is now close to totally beating the crap out of AMD and Intel in every metric while consuming literally a fraction of the power. RISC-V is even more extreme in that regard as it has an even simpler ISA.

**AmericanLocomotive** · 16 December 2020, 03:01 PM

Originally posted by PerformanceExpert View Post

We're discussing CPU power here, not per-rack which includes a lot of other stuff. Like I said, I'm waiting for AnandTech to publish more authoritative perf/Watt results for SPEC. The soon to be released Altra Max will be better too, so let's compare that with Zen 3.

"Wittich said the Ampere chip is 14% better than AMD’s fastest Epyc chip on power efficiency and 4% faster on raw performance." - That's from Ampere's Senior VP of products, and that matches up with those slides.
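As a rough back-of-the-envelope check on what those two figures imply together, here is a small sketch. It assumes "power efficiency" means performance per watt, which the quote doesn't spell out, and the function name is just for illustration.

```python
def implied_power_ratio(perf_ratio, efficiency_ratio):
    # efficiency = perf / power, so power = perf / efficiency.
    return perf_ratio / efficiency_ratio


# Quoted figures: 4% faster, 14% better perf/W (assumed meaning).
ratio = implied_power_ratio(1.04, 1.14)
print(f"implied power draw relative to the Epyc part: {ratio:.3f}")
```

Under that assumption the Ampere chip would be drawing roughly 9% less power than the Epyc part it is being compared against, which is an advantage but not a dramatic one.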

Fugaku was #1 in Green 500 when it came out. Since then the new A100 accelerator has beaten it in efficiency (no surprise) but it's still at #6 and #10 in GREEN 500 and #1 in TOP 500.

Fugaku was never the #1 in the Green 500. That system you linked is not Fugaku, but a much smaller and lower clocked system using the same processors.

**Space Heater** · 16 December 2020, 03:15 PM

Originally posted by PerformanceExpert View Post

It's insane to claim ISA doesn't matter.

Not being the primary driver of performance and efficiency improvements is not the same as "doesn't matter".

**edwaleni** · 16 December 2020, 04:08 PM

Did ARM actually advance or did Intel stop moving forward? If Intel had maintained their performance curve and not fallen into this multi-year rut they are in, would AMD and Neoverse have caught up in the first place?

Is this ARM CPU the result of incredible design or simply the next step in its evolution?

Did Intel's strategic stumbles and pricing indifference allow new money to enter the R&D space to create these new evolutions?

It will be interesting to see what Intel produces in 2021 and 2022 in response to these market advances.

**coder** · 16 December 2020, 04:12 PM

Originally posted by Michael View Post

For CPU reviews I do normally use -march=native, but at least for -mcpu/-mtune=neoverse-n1 on Ubuntu 20.10's GCC 10 I've been seeing differing results, in some cases actually performing worse than generic aarch64 code, while on other compilers seeing better results. So that is why the defaults / no override was used for this testing ...

So, you're saying that 100% of these tests used the default compiler flags for the particular test case?

And if that's so, I don't understand why some of them use -march=native, while others that would clearly benefit greatly from it (e.g. the TNN tests) don't. So, how do you explain that inconsistency?

Also, please give us some insight into your methodology for test selection.

Thanks for the review and addressing these concerns.

**coder** · 16 December 2020, 04:18 PM

Originally posted by PerformanceExpert View Post

Neoverse-N1 was launched back in 2019 too,

Except that this CPU will benefit from tweaks to TSMC's 7 nm node, like those we saw with the Ryzen 3000XT refresh. That could help explain why it clocks significantly higher than Amazon's Graviton2 (which uses the same cores, as I'm sure you'll know).

Originally posted by PerformanceExpert View Post

So it doesn't look like EPYC will ever regain the performance crown again.

...so quick to declare a new champion, yet this CPU only won in about 60% of the tests included. And then there's the issue of the somewhat odd test selection and the use of compiler options that clearly don't let the x86 CPUs stretch their legs, in some important test cases. I would expect a performance expert to be more detail-oriented.

Originally posted by PerformanceExpert View Post

At this point it is not only obvious Arm scales higher than x86, but that x86 has no chance of keeping up. From now on the fastest servers in the world are Arm based. You can see the industry shifting big-time to Arm (most recently Twitter moved to Graviton 2).

Amazon charges less for their Graviton2 instances, which is probably the reason Twitter switched. For a cloud service, performance just has to be good enough, and then the main concern switches to costs.

What's weird about this claim that "Arm scales higher than x86" is that it's so filled with caveats. For one thing, Altra has separate NUMA domains, which AMD specifically rejected, in their 7002 generation. So, if you really need a lot of cores because you have a heavy workload that requires lots of communication, then symmetric is probably still the way to go. However, if you're just making a cloud/density play, and plan to partition up with lots of VMs or containers that each fit in a single NUMA domain, then this NUMA approach is better.

Again, more nuances that you're just completely blowing past in your rush to declare a new champion.

**coder** · 16 December 2020, 04:34 PM

Originally posted by PerformanceExpert View Post

It's insane to claim ISA doesn't matter. ...

At some point you have got to ask yourself: are Intel/AMD CPU designers totally incompetent, or does ISA actually matter?

FWIW, we agree on this point. The power-efficiency of ARM cores is a strong testimony to this.

Many forget (or are unaware) that Intel tried and failed to compete with ARM on their home turf, when they made a play for the phone/tablet market. Intel's offerings were simply noncompetitive on perf/W, regardless of pricing.

Ampere Altra Performance Shows It Can Compete With - Or Even Outperform - AMD EPYC & Intel Xeon
