Apple M2 Support Added To Upstream LLVM Along With The A15, A16

  • coder
    Senior Member
    • Nov 2014
    • 8843

    #21
    Originally posted by CommunityMember
    And likely some good reasons not to move. While the details of Apple's license with ARM are under NDA, an architecture license (which is what Apple has been presumed to have) does not allow use of a different variant without payment of additional fees. There are no (currently named) companies with an architecture license for ARMv9 (not that we would necessarily know if there are any).
    Considering the size of ARM's revenues, these fees should be negligible for someone like Apple. Of all the reasons not to advance to ARMv9-A, this strikes me as the least likely.

    • coder
      Senior Member
      • Nov 2014
      • 8843

      #22
      Originally posted by CommunityMember
      While just adding a name is easy, there are also scheduling/tuning/feature values, and those are (primarily) known only to the chip designer.
      I'm guessing you didn't actually look at the patch before writing this, because it has no scheduling or tuning details.

      And even if Apple weren't the ones submitting this patch, presumably to be followed by tuning details, it shouldn't be too hard to devise a set of tests (if they don't already exist) to determine decent values experimentally.

      Originally posted by CommunityMember
      In this case they had to use some specific settings to enable features (as the commit comment explicitly mentions).
      That was just because the ISA features didn't neatly align with ARMv8 feature levels. It would be trivial for anyone with access to these devices to write a little test program to actually test which instructions are supported. Of course, because it's Apple, they already knew.
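A rough sketch of the cheaper variant of that idea: rather than executing each instruction and catching the fault, ask the OS what it reports. This assumes the `Features` line that Linux exposes in /proc/cpuinfo on arm64 and the `hw.optional.arm.FEAT_*` sysctl keys on macOS. The helper names and the sample text in the usage note are made up for illustration.

```python
import subprocess


def parse_cpuinfo_features(cpuinfo_text):
    """Return the flags on the first 'Features' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def macos_has_feature(name):
    """Query sysctl for hw.optional.arm.<name>.

    Returns True or False when the key exists, and None when sysctl is
    missing or does not know the key (assumed key naming, see lead-in).
    """
    try:
        result = subprocess.run(
            ["sysctl", "-n", f"hw.optional.arm.{name}"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() == "1"
```

For example, `parse_cpuinfo_features("Features\t: fp asimd aes atomics\n")` gives the four flags as a set, and `macos_has_feature("FEAT_LSE")` would be the macOS-side query. A probe that really executes the instruction and traps SIGILL needs native code, so it is left out here.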

      • name99
        Senior Member
        • Mar 2013
        • 288

        #23
        Originally posted by coder
        What I find shocking is that the A16 is still only ARMv8-A.

        I guess there's no reason Apple has to move to ARMv9-A until it's good & ready, but they're probably going to start missing out on some SVE2 optimizations, if they drag their feet too much longer. As much as Apple controls its own software ecosystem, app developers surely use a fair amount of open source libs that'll begin to gain SVE2 codepaths, as ARMv9 becomes the default for new Android devices and Windows-ARM laptops.

        In the server market, Graviton 3 is probably the last big server CPU not to use ARMv9, but even it has SVE1.
        Or maybe the plan is to provide an alternative to SVE…
        SVE is better than the hash Intel has made of AVX but it’s far from perfect in various ways.
        Look up the Macroscalar architecture…

        • name99
          Senior Member
          • Mar 2013
          • 288

          #24
          Originally posted by coder
          I'm guessing you didn't actually look at the patch before writing this, because it has no scheduling or tuning details.

          And even if Apple weren't the ones submitting this patch, presumably to be followed by tuning details, it shouldn't be too hard to devise a set of tests (if they don't already exist) to determine decent values experimentally.


          That was just because the ISA features didn't neatly align with ARMv8 feature levels. It would be trivial for anyone with access to these devices to write a little test program to actually test which instructions are supported. Of course, because it's Apple, they already knew.
          Actually it does. Look at the feature list, e.g. the fuse options. You can track these through LLVM to see the exact patterns that are fused.
          Note that sometimes the fuse options are added a generation before they appear in cores (presumably so that when the new core is introduced, pre-existing code is already close to optimal). So the logical-arithmetic fusions do not appear to be present in A14/M1, but may well be in A15/M2.
          The new fusions provided are basically common sense, as you’d expect. The main ones still missing are div+rem, and wide mul hi+lo.

          • coder
            Senior Member
            • Nov 2014
            • 8843

            #25
            Originally posted by name99
            Or maybe the plan is to provide an alternative to SVE…
            SVE is better than the hash Intel has made of AVX but it’s far from perfect in various ways.
            Look up the Macroscalar architecture…
            Thanks for the tip, and I will check it out, but my point still stands about them missing out on SVE-optimized software. So, I think they'll eventually need to add it.

            • coder
              Senior Member
              • Nov 2014
              • 8843

              #26
              Originally posted by name99
              Actually it does. Look at the feature list, e.g. the fuse options. You can track these through LLVM to see the exact patterns that are fused.
              Okay, thanks for pointing that out. What I meant was the scheduling model. I was expecting to see a custom scheduler model for the new cores, but I now see that Apple is always just using Cyclone. I'm also noticing they didn't bother to tune the prefetch parameters since A7.

              Do you think they maintain a different scheduler model, on their internal fork? I guess a way to find out would be to compile the same code with the same version of public LLVM that Apple's tools seem sync'd with.

              • name99
                Senior Member
                • Mar 2013
                • 288

                #27
                Originally posted by coder
                Okay, thanks for pointing that out. What I meant was the scheduling model. I was expecting to see a custom scheduler model for the new cores, but I now see that Apple is always just using Cyclone. I'm also noticing they didn't bother to tune the prefetch parameters since A7.

                Do you think they maintain a different scheduler model, on their internal fork? I guess a way to find out would be to compile the same code with the same version of public LLVM that Apple's tools seem sync'd with.
                You don’t need a scheduling model when you’re as OoO as Apple, you really don’t! All you need is hints to ensure that fused pairs are always placed adjacent in the instruction stream.
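To make the "keep fused pairs adjacent" hint concrete, here is a toy sketch. It is not LLVM's implementation, just an illustration under stated assumptions: a register-only model with no memory or flag dependencies, and a hypothetical two-entry table of fusible opcode pairs. `Insn`, `FUSIBLE`, `can_hoist` and `cluster_fusible` are names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical (producer, consumer) opcode pairs treated as fusible.
FUSIBLE = {("aese", "aesmc"), ("adrp", "add")}


@dataclass(frozen=True)
class Insn:
    op: str
    dst: str           # register written
    srcs: tuple = ()   # registers read


def can_hoist(cand, between):
    """True if cand can move above every instruction in `between`."""
    for other in between:
        if other.dst and (other.dst in cand.srcs or other.dst == cand.dst):
            return False  # would break a read-after-write or write-after-write
        if cand.dst and cand.dst in other.srcs:
            return False  # would break a write-after-read
    return True


def cluster_fusible(insns):
    """Move each fusible consumer directly behind its producer when safe."""
    out = list(insns)
    i = 0
    while i < len(out) - 1:
        first = out[i]
        for j in range(i + 1, len(out)):
            cand = out[j]
            if (first.op, cand.op) in FUSIBLE and first.dst in cand.srcs:
                if can_hoist(cand, out[i + 1:j]):
                    out.insert(i + 1, out.pop(j))
                    i += 1  # pair is adjacent now; step past its second half
                break
        i += 1
    return out
```

Given `aese v0`, an unrelated `add x2`, then `aesmc v0`, the function moves `aesmc` up next to `aese`. If an instruction in between reads `v0`, the order is left alone, since hoisting would change what that instruction sees.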

                • name99
                  Senior Member
                  • Mar 2013
                  • 288

                  #28
                  Originally posted by coder
                  Thanks for the tip, and I will check it out, but my point still stands about them missing out on SVE-optimized software. So, I think they'll eventually need to add it.
                  And ARM is missing out on AMX-optimized software. These things happen and life goes on.
                  Apple’s bet is that little specifically SVE-optimized code will be written (as opposed to auto-vectorized code). They are probably correct.
                  It’s no longer the 1990s, not even the 2010s.

                  Losing 3x from not having a SIMD ISA is a big deal. Losing 10% by having autovectorization go down one path rather than another is no big deal.

                  • coder
                    Senior Member
                    • Nov 2014
                    • 8843

                    #29
                    Originally posted by name99
                    You don’t need a scheduling model when you’re as OoO as Apple, you really don’t!
                    I considered that, but I still think things like the ratio of different execution ports can have a measurable effect. Maybe in just a few compute-heavy corner cases, but I'm not convinced it's irrelevant.

                    Originally posted by name99
                    All you need is hints to ensure that fused pairs are always placed adjacent in the instruction stream.
                    Interesting. I'd have expected their OoO would handle that, too. I guess, if you can just patch the compiler, then why bother doing it in hardware?

                    • coder
                      Senior Member
                      • Nov 2014
                      • 8843

                      #30
                      Originally posted by name99
                      Losing 3x from not having a SIMD ISA is a big deal. Losing 10% by having autovectorization go down one path rather than another is no big deal.
                      Not sure where you got the 10% figure, but it's not consistent with what ARM reported (as I quoted in comment 13).

                      In any case, I just think it's interesting. I don't have a dog in this fight -- just a bemused observer.
