Apple M2 Support Added To Upstream LLVM Along With The A15, A16
Originally posted by CommunityMember:
While just adding a name is easy, there are also scheduling/tuning/feature values, and those are (primarily) known only to the chip designer.
And even if Apple weren't the ones submitting this patch, presumably to be followed by tuning details, it shouldn't be too hard to devise a set of tests (if they don't already exist) to determine decent values experimentally.
Originally posted by CommunityMember:
In this case they had to use some specific settings to enable features (as the commit comment explicitly mentions).
Originally posted by coder:
What I find shocking is that the A16 is still only ARMv8-A.
I guess there's no reason Apple has to move to ARMv9-A until it's good & ready, but they're probably going to start missing out on some SVE2 optimizations, if they drag their feet too much longer. As much as Apple controls its own software ecosystem, app developers surely use a fair amount of open source libs that'll begin to gain SVE2 codepaths, as ARMv9 becomes the default for new Android devices and Windows-ARM laptops.
In the server market, Graviton 3 is probably the last big server CPU not to use ARMv9, but even it has SVE1.
SVE is better than the hash Intel has made of AVX, but it’s far from perfect in various ways.
Look up the Macroscalar architecture…
Originally posted by coder:
I'm guessing you didn't actually look at the patch before writing this, because it has no scheduling or tuning details.
And even if Apple weren't the ones submitting this patch, presumably to be followed by tuning details, it shouldn't be too hard to devise a set of tests (if they don't already exist) to determine decent values experimentally.
That was just because the ISA features didn't neatly align with ARMv8 feature levels. It would be trivial for anyone with access to these devices to write a little test program to actually test which instructions are supported. Of course, because it's Apple, they already knew.
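A minimal sketch of such a check, assuming a Linux host on AArch64. Rather than executing each instruction and catching `SIGILL`, it reads the feature flags the kernel already reports in `/proc/cpuinfo`. The flag names (`asimddp`, `sve`, `sve2`, ...) are the kernel's, while `parse_cpu_features` and `missing_features` are hypothetical helpers. macOS exposes comparable information through `sysctl` keys under `hw.optional.arm.`, which this sketch does not cover.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the flags from every 'Features' line of arm64 Linux /proc/cpuinfo text."""
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def missing_features(required: set[str], cpuinfo_path: str = "/proc/cpuinfo") -> set[str]:
    """Hypothetical helper: which of `required` the kernel does not report for this machine."""
    return set(required) - parse_cpu_features(Path(cpuinfo_path).read_text())


if __name__ == "__main__":
    # Illustrative excerpt in the arm64 /proc/cpuinfo layout (not from a real Apple device).
    sample = (
        "processor\t: 0\n"
        "Features\t: fp asimd aes pmull sha1 sha2 crc32 atomics asimddp sve\n"
        "CPU implementer\t: 0x41\n"
    )
    print(sorted({"asimddp", "sve", "sve2"} - parse_cpu_features(sample)))  # ['sve2']
```

Asking the kernel is cheaper than trapping faults, but it only reports features the kernel knows how to name. A `SIGILL` probe would still be needed for anything it does not advertise.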
Note that sometimes the fuse options are added a generation before they appear in cores (presumably so that when the new core is introduced, pre-existing code is already close to optimal). So the logical-arithmetic fusions do not appear to be present in A14/M1, but may well be in A15/M2.
The new fusions provided are basically common sense, as you’d expect. The main ones still missing are div+rem and wide mul hi+lo.
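To see how often those two missing fusions could pay off, one option is to count the pairs in compiler output. AArch64 has no remainder instruction, so `n % d` (alone or next to `n / d`) typically compiles to `sdiv`/`udiv` followed by `msub`. A 64-bit by 64-bit multiply that needs the full 128-bit product compiles to `mul` plus `umulh` (or `smulh`). The sketch below is a toy textual scanner over an `-S` listing, not LLVM's fusion logic. It recognizes only the register-operand shapes shown, and its function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def parse(line: str):
    """Return (mnemonic, operands) for an instruction line, or None for labels/directives/blanks."""
    line = line.split(";")[0].split("//")[0].strip()  # drop Darwin (;) and ELF (//) comments
    if not line or line.endswith(":") or line.startswith("."):
        return None
    parts = line.split(None, 1)
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return parts[0].lower(), operands


def is_div_rem(a, b) -> bool:
    """sdiv/udiv q, n, d followed by msub r, q, d, n, i.e. r = n - q*d."""
    (m1, o1), (m2, o2) = a, b
    if m1 not in ("sdiv", "udiv") or m2 != "msub" or len(o1) != 3 or len(o2) != 4:
        return False
    q, n, d = o1
    return q not in (n, d) and {o2[1], o2[2]} == {q, d} and o2[3] == n


def is_mul_hi_lo(a, b) -> bool:
    """mul plus umulh/smulh (either order) of the same sources, first result not clobbering them."""
    (m1, o1), (m2, o2) = a, b
    if {m1, m2} not in ({"mul", "umulh"}, {"mul", "smulh"}) or len(o1) != 3 or len(o2) != 3:
        return False
    return o1[0] not in o2[1:] and set(o1[1:]) == set(o2[1:])


def count_fusion_candidates(asm: str) -> Counter:
    """Count adjacent instruction pairs matching the div+rem and mul hi+lo shapes."""
    insns = [p for p in map(parse, asm.splitlines()) if p is not None]
    counts: Counter = Counter()
    for a, b in zip(insns, insns[1:]):
        if is_div_rem(a, b):
            counts["div+rem"] += 1
        elif is_mul_hi_lo(a, b):
            counts["mul hi+lo"] += 1
    return counts


if __name__ == "__main__":
    listing = (
        "_divmod:\n"
        "\tsdiv\tx8, x0, x1\n"
        "\tmsub\tx9, x8, x1, x0\n"
        "\tmul\tx10, x2, x3\n"
        "\tumulh\tx11, x2, x3\n"
        "\tret\n"
    )
    print(count_fusion_candidates(listing))  # Counter({'div+rem': 1, 'mul hi+lo': 1})
```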
Originally posted by name99:
Or maybe the plan is to provide an alternative to SVE…
SVE is better than the hash Intel has made of AVX, but it’s far from perfect in various ways.
Look up the Macroscalar architecture…
Originally posted by name99:
Actually it does. Look at the feature list, e.g. the fuse options. You can track these through LLVM to see the exact patterns that are fused.
Do you think they maintain a different scheduler model, on their internal fork? I guess a way to find out would be to compile the same code with the same version of public LLVM that Apple's tools seem sync'd with.
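One way to run that comparison, sketched under assumptions. `compile_to_asm` shells out to whichever clang binary you point it at (for example Apple's `/usr/bin/clang` versus an upstream build of the matching LLVM release, whose path is hypothetical) using standard clang flags. `normalize` strips comments and assembler directives so that only instructions and labels are compared. Identical output across a range of inputs would suggest, not prove, that the two toolchains schedule alike.

```python
from __future__ import annotations

import difflib
import subprocess


def compile_to_asm(compiler: str, source: str, cpu: str = "apple-m1",
                   target: str = "arm64-apple-macos") -> str:
    """Compile C source (fed on stdin) to assembly text with the given clang binary."""
    cmd = [compiler, f"--target={target}", f"-mcpu={cpu}", "-O2",
           "-x", "c", "-S", "-o", "-", "-"]
    return subprocess.run(cmd, input=source, capture_output=True,
                          text=True, check=True).stdout


def normalize(asm: str) -> list[str]:
    """Keep instruction and label lines; drop comments, blank lines and assembler directives."""
    kept = []
    for line in asm.splitlines():
        line = line.split(";")[0].split("//")[0].strip()
        if not line or (line.startswith(".") and not line.endswith(":")):
            continue
        kept.append(" ".join(line.split()))  # collapse tabs/spaces between operands
    return kept


def asm_diff(apple_asm: str, upstream_asm: str) -> list[str]:
    """Unified diff of the normalized listings; an empty list means no difference."""
    return list(difflib.unified_diff(normalize(apple_asm), normalize(upstream_asm),
                                     "apple-clang", "upstream-clang", lineterm=""))
```

For example, `asm_diff(compile_to_asm("/usr/bin/clang", src), compile_to_asm("/opt/llvm/bin/clang", src))`, where the second path is a placeholder for your own upstream build.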
Originally posted by coder:
Okay, thanks for pointing that out. What I meant was the scheduling model. I was expecting to see a custom scheduler model for the new cores, but I now see that Apple is always just using Cyclone. I'm also noticing they haven't bothered to tune the prefetch parameters since the A7.
Do you think they maintain a different scheduler model, on their internal fork? I guess a way to find out would be to compile the same code with the same version of public LLVM that Apple's tools seem sync'd with.
Originally posted by coder:
Thanks for the tip, and I will check it out, but my point still stands about them missing out on SVE-optimized software. So, I think they'll eventually need to add it.
Apple’s bet is that little code will be written specifically for SVE (as opposed to auto-vectorized code). They are probably correct.
It’s no longer the 1990s, not even the 2010s.
Losing 3x from not having a SIMD ISA is a big deal. Losing 10% by having autovectorization go down one path rather than another is no big deal.
Originally posted by name99:
You don’t need a scheduling model when you’re as OoO as Apple, you really don’t!
Originally posted by name99:
All you need is hints to ensure that fused pairs are always placed adjacent in the instruction stream.
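Here is a toy illustration of keeping fused pairs adjacent. It assumes straight-line code with no memory operations and a register-only dependence model, with the condition flags treated as a register named `nzcv`. LLVM does this inside its machine scheduler through a macro-fusion DAG mutation. The sketch is a much simpler standalone pass, and `Insn`, `cluster_fused_pairs` and the `cmp_branch` predicate are hypothetical. It sinks the producer of a fusible pair down so it sits directly before its consumer, but only when that crosses no register hazard. Because the producer moves rather than the consumer, a branch consumer stays at the end of the block.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Insn:
    op: str
    defs: frozenset = frozenset()  # registers written (flags modeled as "nzcv")
    uses: frozenset = frozenset()  # registers read


def independent(a: Insn, b: Insn) -> bool:
    """True if a and b may swap: no RAW, WAR or WAW register hazard between them."""
    return not (a.defs & b.uses or a.uses & b.defs or a.defs & b.defs)


def cluster_fused_pairs(insns: list[Insn], fusible: Callable[[Insn, Insn], bool]) -> list[Insn]:
    """Sink each producer down to sit right before the first later instruction it can fuse with."""
    out = list(insns)
    claimed: set[int] = set()  # id()s of instructions already glued into a pair
    i = 0
    while i < len(out) - 1:
        producer, moved = out[i], False
        if id(producer) not in claimed:
            for j in range(i + 1, len(out)):
                consumer = out[j]
                if id(consumer) in claimed or not fusible(producer, consumer):
                    continue
                # First candidate: glue the pair only if sinking crosses no hazard.
                if all(independent(producer, mid) for mid in out[i + 1:j]):
                    claimed.update((id(producer), id(consumer)))
                    if j > i + 1:
                        out.insert(j - 1, out.pop(i))
                        moved = True
                break
        if not moved:
            i += 1  # after a move, out[i] holds an instruction not examined yet
    return out


def cmp_branch(a: Insn, b: Insn) -> bool:
    """Toy predicate: a compare whose flags feed a conditional branch."""
    return a.op == "cmp" and b.op.startswith("b.") and "nzcv" in (a.defs & b.uses)


if __name__ == "__main__":
    block = [
        Insn("cmp", defs=frozenset({"nzcv"}), uses=frozenset({"x0", "x1"})),
        Insn("add", defs=frozenset({"x2"}), uses=frozenset({"x2", "x3"})),
        Insn("b.ne", uses=frozenset({"nzcv"})),
    ]
    print([insn.op for insn in cluster_fused_pairs(block, cmp_branch)])  # ['add', 'cmp', 'b.ne']
```

If the middle `add` is replaced by a flag-setting `adds`, the hazard check leaves the block unchanged.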
Originally posted by name99:
Losing 3x from not having a SIMD ISA is a big deal. Losing 10% by having autovectorization go down one path rather than another is no big deal.
In any case, I just think it's interesting. I don't have a dog in this fight -- just a bemused observer.