Apple Announces Its New M2 Processor


  • #11
    I'd snatch up a MacBook Air if there were good Linux support for it. I hate Apple, but this is genuinely exciting.



    • #12
      M1 is fast because it has low memory latency (the RAM is very close to the CPU). And it uses less power because Apple buys out all the capacity of TSMC's 5nm process node, so it's not available to other vendors. Current AMD CPUs are on 7nm and Intel's on 10nm; ARM vs. x86 doesn't play a big role here.



      • #13
        Originally posted by tildearrow View Post
        It's honestly sad to see how Apple has a monopoly on the low-power, high-performance CPU market, because Apple is the king of lock-in: consider how many people sought and worked hard to liberate their devices (e.g. "jailbreak") and had to reverse engineer the architecture to no end (e.g. Asahi Linux) just so that they could run a better operating system.
        Ampere will never look at us (instead focusing exclusively on servers), so we will be stuck with either:
        - low-cost, low-power systems with poor performance (e.g. Raspberry Pi)
        - high-cost, high-performance systems with terrible power efficiency (x86)

        The high-cost, high-performance, low-power option is non-existent outside Apple...
        As torsionbar28 pointed out, x86-64 can achieve similar levels of efficiency when you don't push the chips so hard. Wattage does not scale linearly with overall performance. There are a lot of factors to consider. Generally speaking, ARM is going to be more efficient because it's a significantly simpler design.
        What he didn't mention is how critical optimization is. Apple gains a lot of performance-per-watt specifically because they lock down their platform. Everything is finely tuned and tightly integrated, which makes their design more efficient. They don't allow a bunch of BS 3rd party bloatware. Their compiler takes advantage of all the instructions of the CPU. They don't have to compensate for a bunch of 3rd party APIs which may add overhead. On a RISC architecture, these differences add up quite significantly.

        What Intel has done with Clear Linux is a good example of how much more performance you can squeeze out of a CPU without disproportionately cranking up the wattage (whereas overclocking or adding more cores will).
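        The claim that wattage does not scale linearly with performance can be sketched with the classic dynamic-power model P ≈ C·V²·f: since voltage has to rise roughly with frequency, power grows roughly with the cube of frequency while performance grows at best linearly. A toy Python sketch (the cubic exponent and the 3 GHz reference point are illustrative assumptions, not measurements of any real chip):

```python
# Toy model: dynamic power P ~ C * V^2 * f, and V must rise roughly
# with f, so P grows roughly as f^3 while performance grows at best
# linearly with f. Numbers are illustrative, not measured.

def relative_power(f, f0=3.0):
    """Power relative to the f0 'sweet spot', assuming P ~ f^3."""
    return (f / f0) ** 3

def perf_per_watt(f, f0=3.0):
    """Perf-per-watt relative to f0: (f/f0) / (f/f0)^3 = (f0/f)^2."""
    return (f / f0) / relative_power(f, f0)

for f in (2.0, 3.0, 4.0, 5.0):
    print(f"{f:.1f} GHz: {relative_power(f):5.2f}x power, "
          f"{perf_per_watt(f):4.2f}x perf/W")
```

Under this toy model, pushing from 3 GHz to 5 GHz costs over 4.6x the power for at most 1.7x the performance, which is the sense in which "not pushing the chips so hard" buys efficiency.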



        • #14
          Originally posted by luno View Post
          No AV1 support is a big downside, but overall a good incremental improvement. Now they support running x86_64 Linux binaries in Rosetta https://nitter.net/never_released/st...090944258054#m
          That is nice. I didn't know about that one. I see that you programmatically create a virtiofs device that you mount on your Linux guest; that volume contains all the Rosetta binaries.

          This probably means that the binaries will be impossible to legally redistribute due to licensing, so you cannot easily use them on a native Asahi Linux installation. I guess you can always copy them yourself as an extra step. Or maybe the Asahi macOS installer could do it for you.

          I currently run Arch Linux as a dev environment on a 14in MacBook Pro using QEMU, and I've yet to find something that does not support aarch64 natively. I may give Rosetta a try to see if it works, though.
          Last edited by amxfonseca; 06 June 2022, 04:30 PM.



          • #15
            Originally posted by ssokolow View Post

            This post argues that's not the case and that the reason M1 manages to do so well is a fundamental limitation in x86 ISA chips' ability to parallelize instruction decode for a CISC architecture with variable width opcodes. (In addition to other things like "SoCs have certain advantages and there aren't any x86 SoCs in that market segment yet".)
            No, this is a bad conclusion from an otherwise quite fine article. Perhaps it's a bit misleading and you have fallen into its trap. There is an urban legend that x86 is limited to 4 instruction decoders. That held until, hmm, Elkhart Lake IIRC, which uses 2 decoders each decoding up to 3 instructions in parallel, and then Alder Lake's P-core, which can decode 6 instructions in parallel.

            So Intel is able to decode 6 CISC instructions per cycle. And do you know what the CPU does with those decoded instructions? It caches them, it saves them as *precious* material for reuse, and it holds them like Gollum his ring up to the last moment... So, I guess the x86 decoder limitation is highly overhyped here.
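            For what it's worth, the serial dependency the article argues about is easy to see in a toy model: with a fixed-width ISA every instruction boundary is known up front, while with variable-width instructions each start depends on the previous instruction's length. A minimal Python sketch with a made-up encoding (not real x86, purely for illustration):

```python
# Toy instruction stream: 1 opcode byte followed by a length-dependent
# payload. Finding instruction N's start requires decoding the lengths
# of instructions 0..N-1 -- the serial dependency that makes wide
# variable-length decode hard (uop caches sidestep it for hot code).

def insn_length(opcode: int) -> int:
    # Hypothetical encoding: low 2 bits of the opcode give payload length.
    return 1 + (opcode & 0b11)

def find_starts_variable(stream: bytes) -> list:
    """Serial scan: each start depends on all previous lengths."""
    starts, pos = [], 0
    while pos < len(stream):
        starts.append(pos)
        pos += insn_length(stream[pos])
    return starts

def find_starts_fixed(stream: bytes, width: int = 4) -> list:
    """Fixed-width ISA: every start is known up front, trivially parallel."""
    return list(range(0, len(stream), width))

stream = bytes([0b01, 0xAA, 0b00, 0b11, 1, 2, 3, 0b10, 9, 9])
print(find_starts_variable(stream))  # -> [0, 2, 3, 7]
print(find_starts_fixed(bytes(12)))  # -> [0, 4, 8]
```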

            So why is the M1/M2 that fast? IMHO, due to:

            - cache -- their cache design is *fantastic*
            - RAM integration -- a fantastic choice for the *common* case. Hmm, in comparison with my Xeon W with 256GB RAM, the M1/M2 is still just a toy, right? -- but for the *common* *consumer* workload, fantastic
            - simple ISA -- but my bet here is that this is just 1-2% of the speed result. Internally, both SoCs are just pure load-store RISC machines.

            And why are the M1/M2 that power efficient? IMHO:

            - the whole package design is limited to efficient frequencies (~3GHz sweet spot)

            - the most modern TSMC process node, which competitors do not have access to yet.



            • #16
              Originally posted by Sergey Podobry View Post
               PCs have had unified memory since the Bulldozer AMD APUs.
               Wrong sense of the word "unified". Unless you package the RAM on the board or on the CPU module as Apple does, you will never hit this level of memory bandwidth. A 16-slot server is *barely* able to keep up with the M1 in this regard.



              • #17
                Originally posted by schmidtbag View Post
                As torsionbar28 pointed out, x86-64 can achieve similar levels of efficiency when you don't push the chips so hard. Wattage does not scale linearly with overall performance. There are a lot of factors to consider. Generally speaking, ARM is going to be more efficient because it's a significantly simpler design.
                What he didn't mention is how critical optimization is. Apple gains a lot of performance-per-watt specifically because they lock down their platform. Everything is finely tuned and tightly integrated, which makes their design more efficient. They don't allow a bunch of BS 3rd party bloatware. Their compiler takes advantage of all the instructions of the CPU. They don't have to compensate for a bunch of 3rd party APIs which may add overhead. On a RISC architecture, these differences add up quite significantly.

                What Intel has done with Clear Linux is a good example of how much more performance you can squeeze out of a CPU without disproportionately cranking up the wattage (whereas overclocking or adding more cores will).
                Your point about lock-in would almost be true, if it weren't that Linux seems to get the same performance on the M1.



                • #18
                  Originally posted by kgardas View Post

                  No, this is a bad conclusion from an otherwise quite fine article. Perhaps it's a bit misleading and you have fallen into its trap. There is an urban legend that x86 is limited to 4 instruction decoders. That held until, hmm, Elkhart Lake IIRC, which uses 2 decoders each decoding up to 3 instructions in parallel, and then Alder Lake's P-core, which can decode 6 instructions in parallel.

                  So Intel is able to decode 6 CISC instructions per cycle. And do you know what the CPU does with those decoded instructions? It caches them, it saves them as *precious* material for reuse, and it holds them like Gollum his ring up to the last moment... So, I guess the x86 decoder limitation is highly overhyped here.

                  So why is the M1/M2 that fast? IMHO, due to:

                  - cache -- their cache design is *fantastic*
                  - RAM integration -- a fantastic choice for the *common* case. Hmm, in comparison with my Xeon W with 256GB RAM, the M1/M2 is still just a toy, right? -- but for the *common* *consumer* workload, fantastic
                  - simple ISA -- but my bet here is that this is just 1-2% of the speed result. Internally, both SoCs are just pure load-store RISC machines.

                  And why are the M1/M2 that power efficient? IMHO:

                  - the whole package design is limited to efficient frequencies (~3GHz sweet spot)

                  - the most modern TSMC process node, which competitors do not have access to yet.
                  x86 chips still pay the price despite all the instruction caching they claim. There's no free lunch for having a bad ISA. That cache is of limited size, consumes massive amounts of die area on top of the decoding circuitry, and the ISA still imposes a low limit on how quickly you can decode *new* code while following program execution. Since the dawn of the Pentium, x86 has always spent more than double the number of transistors to achieve the same performance.



                  • #19
                    Originally posted by tildearrow View Post
                    It's honestly sad to see how Apple has a monopoly on the low-power, high-performance CPU market, because Apple is the king of lock-in: consider how many people sought and worked hard to liberate their devices (e.g. "jailbreak") and had to reverse engineer the architecture to no end (e.g. Asahi Linux) just so that they could run a better operating system.
                    Ampere will never look at us (instead focusing exclusively on servers), so we will be stuck with either:
                    - low-cost, low-power systems with poor performance (e.g. Raspberry Pi)
                    - high-cost, high-performance systems with terrible power efficiency (x86)

                    The high-cost, high-performance, low-power option is non-existent outside Apple...
                    x86 is not hopeless. In fact, the CPU parts of Renoir and Cezanne were amazing in lower power envelopes, even without little cores.

                    The problem, IMO, is the lack of die space dedicated to graphics, accelerators, cache, wider buses and so on that Apple clearly prioritizes.

                    AMD had Van Gogh for a long time, but OEMs (allegedly) completely rejected it until Valve came along.
                    Last edited by brucethemoose; 06 June 2022, 05:57 PM.



                    • #20
                      Originally posted by Developer12 View Post
                      Wrong sense of the word "unified". Unless you package the RAM on the board or on the CPU module as Apple does, you will never hit this level of memory bandwidth. A 16-slot server is *barely* able to keep up with the M1 in this regard.
                      I don't think it's wrong. The memory is unified between the CPU and GPU.

                      As for the bandwidth: an ordinary desktop Intel Alder Lake CPU with DDR5 has ~100 GB/s of bandwidth, which is more than the M1 but less than the M1 Pro and M1 Max. The M1 Max has a theoretical bandwidth of 400 GB/s, but real tests show about 240 GB/s. That is huge, and comparable to modern servers with DDR4.
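                      As a sanity check, theoretical peak bandwidth is just bus width in bytes times transfer rate. A quick Python sketch (the 6400 MT/s rates and the bus widths are commonly cited figures, taken here as assumptions):

```python
def bandwidth_gb_s(bus_bits: int, mt_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x transfers/s."""
    return bus_bits / 8 * mt_per_s / 1000

# Dual-channel DDR5-6400 on a desktop: 2 x 64-bit channels
print(bandwidth_gb_s(128, 6400))   # -> 102.4  (the ~100 GB/s figure)
# M1: 128-bit LPDDR5 bus
print(bandwidth_gb_s(128, 6400))   # -> 102.4
# M1 Max: 512-bit LPDDR5 bus
print(bandwidth_gb_s(512, 6400))   # -> 409.6  (the "400 GB/s" figure)
```

The M1 Max's edge comes almost entirely from the 4x wider on-package bus, not from faster DRAM chips.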

