**Jumbotron** · 20 November 2020, 06:27 PM

And what AMD pioneered and abandoned with HSA, Apple is now the leader.

Unified memory architecture

UMA stands for "unified memory architecture." When potential users look at M1 benchmarks and wonder how it's possible that a mobile-derived, relatively low-power chip is capable of that kind of performance, Apple points to UMA as a key ingredient for that success.

Federighi claimed that "modern computational or graphics rendering pipelines" have evolved, and they've become a "hybrid" of GPU compute, GPU rendering, image signal processing, and more.

UMA essentially means that all the components, including the central processor (CPU), graphics processor (GPU), neural processor (NPU), image signal processor (ISP), and so on, share one pool of very fast memory, positioned very close to all of them. This is counter to a common desktop paradigm of, say, dedicating one pool of memory to the CPU and another to the GPU on the other side of the board.

When users run demanding, multifaceted applications, the traditional pipelines may end up losing a lot of time and efficiency moving or copying data around so it can be accessed by all those different processors. Federighi suggested Apple's success with the M1 is partially due to rejecting this inefficient paradigm at both the hardware and software level:

We not only got the great advantage of just the raw performance of our GPU, but just as important was the fact that with the unified memory architecture, we weren't moving data constantly back and forth and changing formats that slowed it down. And we got a huge increase in performance.

And so I think workloads in the past where it's like, come up with the triangles you want to draw, ship them off to the discrete GPU and let it do its thing and never look back—that’s not what a modern computer rendering pipeline looks like today. These things are moving back and forth between many different execution units to accomplish these effects.
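The copy overhead described in the quote can be illustrated with a toy Python sketch. It is not how Metal or the M1 works internally: plain `bytearray` and `memoryview` objects stand in for memory pools, and the function names and the `bytes_copied` counter are hypothetical, chosen only for illustration.

```python
def brighten(buf):
    """Example stage: add 10 to every byte in place, clamped to 255."""
    for i in range(len(buf)):
        buf[i] = min(255, buf[i] + 10)


def invert(buf):
    """Example stage: invert every byte in place."""
    for i in range(len(buf)):
        buf[i] = 255 - buf[i]


def run_with_separate_pools(frame, stages):
    """Each stage has its own pool, so data is copied in and copied back out."""
    bytes_copied = 0
    data = frame
    for stage in stages:
        local = bytearray(data)        # copy into the stage's private pool
        bytes_copied += len(local)
        stage(memoryview(local))       # the stage works on its private copy
        data = bytearray(local)        # copy the result back out
        bytes_copied += len(data)
    return data, bytes_copied


def run_with_unified_pool(frame, stages):
    """Every stage gets a view of the same buffer, so nothing is copied."""
    shared = memoryview(frame)
    for stage in stages:
        stage(shared)                  # modifies the shared buffer in place
    return frame, 0


separate, copied = run_with_separate_pools(bytearray([0, 100, 250]), [brighten, invert])
unified, not_copied = run_with_unified_pool(bytearray([0, 100, 250]), [brighten, invert])
print(list(separate), copied)      # same pixels, 12 bytes copied
print(list(unified), not_copied)   # same pixels, 0 bytes copied
```

Both paths produce the same result; the only difference in this sketch is the amount of data shuffled between pools, which grows with every extra stage in the separate-pool version.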

That's not the only optimization. For a few years now, Apple's Metal graphics API has employed "tile-based deferred rendering," which the M1's GPU is designed to take full advantage of. Federighi explained:

Where old-school GPUs would basically operate on the entire frame at once, we operate on tiles that we can move into extremely fast on-chip memory, and then perform a huge sequence of operations with all the different execution units on that tile. It's incredibly bandwidth-efficient in a way that these discrete GPUs are not. And then you just combine that with the massive width of our pipeline to RAM and the other efficiencies of the chip, and it’s a better architecture.
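The bandwidth argument in that quote can be sketched with a simplified Python model. This is an assumption-laden illustration, not Apple's implementation: real tile-based deferred rendering also bins geometry per tile, whereas this sketch only counts main-memory traffic for per-pixel operations. The function names and the `traffic` counter are hypothetical.

```python
def full_frame_passes(frame, passes):
    """Run each pass over the whole frame; every pass reads and writes main memory."""
    traffic = 0
    for op in passes:
        traffic += len(frame)                 # read the whole frame
        frame = [op(p) for p in frame]
        traffic += len(frame)                 # write the whole frame back
    return frame, traffic


def tiled_passes(frame, passes, tile_size):
    """Load one tile, run every pass on it locally, then write it back once."""
    out, traffic = [], 0
    for start in range(0, len(frame), tile_size):
        tile = frame[start:start + tile_size]
        traffic += len(tile)                  # one read from main memory
        for op in passes:                     # all passes stay "on chip"
            tile = [op(p) for p in tile]
        out.extend(tile)
        traffic += len(tile)                  # one write back to main memory
    return out, traffic


passes = [lambda p: p + 1, lambda p: p * 2, lambda p: p - 3]
frame = list(range(8))
print(full_frame_passes(frame, passes))       # traffic grows with the number of passes
print(tiled_passes(frame, passes, tile_size=4))  # traffic stays at one read and one write per pixel
```

In this model the output is identical either way, but the full-frame version touches main memory twice per pixel per pass, while the tiled version touches it twice per pixel in total. The model only holds for operations that do not depend on neighbouring pixels outside the tile.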

https://arstechnica.com/gadgets/2020...on-revolution/

**Grinness** · 20 November 2020, 06:32 PM

Originally posted by ultimA

The results on that page you linked are very mixed. In some (many) cases the Intel CPUs have the advantage, in some other (many) cases the M1. In both cases, perf differences range from small to great. Which again is very impressive for the M1 given its much better heat and power characteristics.

As for the concept of RAM in same package as CPU, it is not actually new. It has a long history on embedded and SBCs. It is only new in the desktop(/laptop) segment.

Sure, mixed, but clearly a different story: even under Python or Java sci bench, where the M1 is stronger than 2-to-5-year-old CPUs, the new ones outperform the M1. "New ones" means from the same year, but actually on an older lithography.
The M1 is surely still less power hungry, but the Ryzen 4500U is 15W TDP ...
(6 cores, 6 threads; the M1 is 8 cores)

https://www.amd.com/en/products/apu/amd-ryzen-5-4500u

Yes, RAM in same package is not a new concept, and I would like it to stay old.

**starshipeleven** · 20 November 2020, 06:43 PM

Originally posted by edwaleni

starshipeleven should be landing any moment now.

I've been on vacation for a while now, and you let the forum be overrun by people like Jumbotron who are barely capable of copy-pasting Apple press releases?

I'm not coming back for a while still, deal with it.

**wizard69** · 20 November 2020, 06:44 PM

Originally posted by Michael

Have SoC package power monitoring working with PTS now so future articles will include raw power and perf-per-Watt.

This is good news! I literally just went out and purchased one of the MBAs due to the fact that I've been wanting a decent ARM-based laptop for years now. It didn't have to be an Apple, but since Apple is the first to deliver a competitive machine they get my money.

What is amazing is that there is no fan in this machine, yet it runs software better than any laptop I've owned to date. People can say whatever about Apple, but this laptop sips power like no machine that I've ever used before. The only real problem is that many apps are not native yet; still, they run fine. So that simply means I will get upgraded performance every couple of weeks for some time.

Now, is it perfect? Nope, won't go that far, but show me an OS / hardware combo that is. I think this is what people are missing: M1 and the machines it drives may not be perfect, but they are at the top of the heap as far as users are concerned.

By the way, looking forward to more native tests as they can be done.

**Volta** · 20 November 2020, 06:47 PM

Originally posted by PerformanceExpert

All modern CPUs decode their instructions into internal micro-ops. RISC vs CISC is about the ISA, not about the internals.

But what really matters is the incredible single-threaded performance of M1 - since it beats the fastest desktop x86 cores (even Zen 3!), it can still beat most cores while running translated code.

Where do you funny guys come from? From the same place where you get such fairy-tale benchmarks?

**Volta** · 20 November 2020, 06:49 PM

Originally posted by JackLilhammers

Yes, and then it came back to the normal 1-2%, because it was just a hiccup. Linux adoption has been quite stable (and stagnant) over the last few years

Apple also seems to be stagnant. Linux had a huge chance, but Ubuntu messed a few things up. However, it still has a chance with the ongoing unification (Flatpaks mainly).

**starshipeleven** · 20 November 2020, 07:01 PM

Originally posted by Jumbotron

And what AMD pioneered and abandoned with HSA, Apple is now the leader.

For chrissake all embedded devices and APUs share the same RAM between their processors and accelerators. Smartphones from 5 years ago do that. AMD APUs do that, Intel CPUs with graphics do that.

HSA is an interoperability standard and software middleware designed to run computing tasks on hardware from different vendors, which is completely different from this, which is a SINGLE system created by a SINGLE vendor that controls the full stack from the OS down to the hardware design.

Apple's Metal graphics API has employed "tile-based deferred rendering," which the M1's GPU is designed to take full advantage of.

Wow, it's using a feature DirectX 11.1 had in 2012 and that is a core feature in both DirectX 12 and Vulkan, and calling it a new and revolutionary thing! Much excitement! Very Apple! Wow!

Yay, Apple has made an embedded device that is using all the embedded device technologies everyone else uses! Much innovation! Very Excitement!

**BillBroadley** · 20 November 2020, 07:37 PM

Originally posted by tildearrow

That video encoding performance is so terrible, and just proves ARM still has some way to go... :<

Doubtfully, if Apple locks down the machine.

Not really. It just means that the x86-64 port of the video encoder has some special ASM code to make it efficient, and the ARM version does not. With native M1 code it does quite well. People have benchmarked exporting 4K HDR from the Apple tools (Premiere) and it did quite well.

You have to compare apple to apple if you want to be fair.

**BillBroadley** · 20 November 2020, 07:42 PM

Originally posted by microcode

WTF is going on with zstd. If it's that fast is it even correct? Do they have special hardware to accelerate LZ-style compressors?

I'm guessing here, but the M1 does have an unusually large L1 cache. My bet is that zstd has some heavily used table that fits in the L1 cache of the M1 and does not fit in the cache of the other CPUs. The result is an anomalously high score for the M1 that's not representative of most codes.

**rhavenn** · 20 November 2020, 07:55 PM

Originally posted by blacknova

People complain about ability to run anything they want on hardware they purchase. Who would've thought...

You CAN run any software on the hardware you bought. Apple doesn't care. However, they're not going to go write Linux drivers for you. So, get to it if that's what you want.

Apple M1 ARM Performance With A 2020 Mac Mini
