Originally posted by tildearrow
And it's not black magic. Almost every aspect of what Apple does is known and understood (at least by some of us). The issue is that Apple was willing to implement this stuff.
Some of the basic techniques include:
- splitting what used to be a single object into multiple, optimally sized objects. Apple doesn't have a single "ROB"; it has one object that's somewhat like a ROB, but which points into multiple other objects. One of these is the History File (used to rewind register mappings during misspeculation recovery); another tracks branches; another tracks taken branches (the latter two are not yet fully understood).
- doing as much work as possible as EARLY as possible in the pipeline. Apple "executes" simple branches (unconditional, no link-register involvement) at DECODE time. It executes register moves and initializations at RENAME time. It even executes some loads at RENAME time. Intel started going down this path with the Stack Engine, which is vaguely the same idea -- but then they apparently lost interest and never took the idea further. Meanwhile, Apple has changed the technical details of how it implements these zero-cycle moves at least three times. (The original 2012 version was kinda lame, but did the job. The 2014 version is what I would have imagined as the best way to do it. The 2019 version -- damn, that's clever!)
- doing as much as possible as LATE as possible. Apple has multiple branch predictors (sure, everyone does), but it is willing to go much further down the in-order pipeline to flush misfetched instructions. Of course, if you can catch every misfetch while it's still IN-ORDER (and in-order extends all the way to Rename...) then you don't have to pay the cost of a flush, just a fetch resteer! Being willing to do this means your most sophisticated predictors can deliver a prediction as late as five cycles after the fastest predictors and still have value.
- splitting a single task into multiple subtasks, and doing as many of them as possible as early as possible.
Maybe ISA semantics mean I can't issue this load until some earlier serializing instruction has cleared. OK -- but I can still issue a prefetch for that load's address, so that when I DO execute the load it's lower latency...
Similar idea with the handling of special registers.
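The split-structure idea in the first bullet can be sketched in a toy simulator. Everything here (class names, fields, the recovery policy) is illustrative, not Apple's actual design; only the History File's role -- rewinding register mappings on misspeculation -- comes from the description above.

```python
class HistoryFile:
    """Side structure: logs old register mappings so rename state
    can be rewound on misspeculation."""
    def __init__(self):
        self.entries = []  # (arch_reg, old_phys_reg), oldest first

    def log(self, dest_reg, old_phys):
        self.entries.append((dest_reg, old_phys))
        return len(self.entries) - 1  # index kept in the central record

    def rewind(self, rename_table, from_index):
        # Undo mappings newest-first, back through from_index.
        while len(self.entries) > from_index:
            dest, old_phys = self.entries.pop()
            rename_table[dest] = old_phys


class RetireQueue:
    """Central 'ROB-like' record: one compact entry per instruction,
    pointing into side structures instead of storing everything inline.
    (A real design would also have branch-tracking side structures.)"""
    def __init__(self):
        self.history = HistoryFile()
        self.entries = []  # (pc, history_index) -- deliberately small

    def dispatch(self, pc, dest_reg, new_phys, rename_table):
        hidx = self.history.log(dest_reg, rename_table.get(dest_reg))
        rename_table[dest_reg] = new_phys
        self.entries.append((pc, hidx))

    def recover_to(self, entry_index, rename_table):
        # Misspeculation: rewind this entry and everything younger.
        _pc, hidx = self.entries[entry_index]
        self.history.rewind(rename_table, hidx)
        del self.entries[entry_index:]
```

On recovery, `recover_to` walks the History File backwards restoring each old mapping; the central entries stay small because the bulky per-instruction state lives in the side structures, each sized for its own job.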
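The rename-time moves in the second bullet are, generically, "move elimination": a register move is "executed" at rename by copying the physical-register mapping, consuming no execution slot. A minimal sketch -- the class and policy are assumptions for illustration, not Apple's actual mechanism (which, per the above, has changed several times):

```python
class Renamer:
    """Toy renamer that 'executes' register moves at RENAME time by
    copying the physical-register mapping (generic move elimination)."""
    def __init__(self):
        self.map = {}             # architectural reg -> physical reg
        self.next_phys = 0
        self.exec_slots_used = 0  # ops that needed a real execution slot

    def _alloc(self):
        name = f"p{self.next_phys}"
        self.next_phys += 1
        return name

    def rename(self, op, dst, src=None):
        if op == "mov" and src in self.map:
            # Zero-cycle move: dst simply shares src's physical
            # register. No new register, no execution slot, no latency.
            self.map[dst] = self.map[src]
            return
        # Ordinary op: allocate a fresh destination and execute it.
        self.map[dst] = self._alloc()
        self.exec_slots_used += 1
```

One design consequence: a shared physical register can't be freed until every architectural alias is overwritten, so real implementations need some form of reference tracking -- one reason the technique leaves room for repeated refinement.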
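The resteer-vs-flush tradeoff in the third bullet can be modeled in a few lines. All the numbers (in-order depth, penalties) are invented for illustration:

```python
# Toy model of catching a misfetch while it is still in-order: a slow
# but accurate predictor overrides a fast one some cycles after fetch.
# If the wrong-path instructions have not yet passed Rename, we just
# resteer fetch instead of paying for a full flush.

INORDER_DEPTH = 10   # cycles from Fetch through Rename (assumed)
RESTEER_COST = 3     # refetch from the corrected target (assumed)
FLUSH_COST = 14      # full misprediction flush (assumed)

def recovery_cost(override_delay):
    """Cost of correcting a fast-predictor mistake that a slower
    predictor catches `override_delay` cycles after fetch."""
    if override_delay <= INORDER_DEPTH:
        return RESTEER_COST   # wrong path still in-order: resteer only
    return FLUSH_COST         # wrong path already issued: flush
```

With these (made-up) numbers, a predictor delivering its verdict five cycles late still costs only a resteer -- which is why a slow-but-smart predictor retains value.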
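The early-prefetch idea in the last bullet can be sketched as a toy timing model: the load itself must wait for the serializing instruction, but a prefetch sent early warms the cache so the load hits when it finally executes. The `Cache` class and all latencies are assumptions, not measurements:

```python
MISS_LATENCY = 100   # cycles to fill a line from memory (assumed)
HIT_LATENCY = 4      # cycles for a cache hit (assumed)

class Cache:
    def __init__(self):
        self.lines = set()   # addresses currently resident
        self.inflight = {}   # addr -> cycle at which the line arrives

    def prefetch(self, addr, now):
        if addr not in self.lines:
            self.inflight[addr] = now + MISS_LATENCY

    def load_latency(self, addr, now):
        if addr in self.lines:
            return HIT_LATENCY
        arrival = self.inflight.get(addr)
        if arrival is not None and arrival <= now:
            self.lines.add(addr)  # prefetch already completed
            return HIT_LATENCY
        return MISS_LATENCY       # toy model: ignore partial overlap

cache = Cache()
cache.prefetch(0x1000, now=0)   # sent while the load is still blocked
barrier_clears = 120            # serializing instruction finally retires
lat = cache.load_latency(0x1000, now=barrier_clears)  # hits: cheap
```

The load's issue is delayed either way; the prefetch just overlaps the memory latency with the wait, so the exposed latency shrinks from a miss to a hit.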
Yes, Apple has large caches, wide pipelines, etc. These are all options open to Intel. But they're options that have not been exercised.
Why hasn't Intel gone down this path? Most of this technology was known fifteen years ago. Hell, much of it is based on research that INTEL sponsored...
Well, I have my theories.
But I think the main reason is this: look at the fury and denial whenever Apple's performance is brought up. If most of your customer base cares more about the brand name on the box than about actual performance, if they will go out of their way to avoid learning about alternatives (e.g. by shouting down any voices that try to explain what those alternatives are and how they work), then why bother working hard to make things better? The customer base has already told you they will buy whatever you ship -- and won't buy any alternative. Customers get the Intel they deserve. If customers prioritize the Intel brand (or the x86 brand), and naive GHz, over actual performance, they will get companies that take advantage of that fact.