Apple M4 Support Added To The LLVM Compiler, Confirming Its ISA Capabilities

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Apple M4 Support Added To The LLVM Compiler, Confirming Its ISA Capabilities

    Phoronix: Apple M4 Support Added To The LLVM Compiler, Confirming Its ISA Capabilities

    Apple compiler engineers have contributed Apple M4 CPU support to the upstream LLVM/Clang compiler via the new -mcpu=apple-m4 target. Interestingly, the Apple M4 is exposed as an ARMv8.7-derived design...

  • #2
    It's a bit surprising that Apple didn't include support for SIMD instructions such as these. There would be plenty of Mac-specific workloads that could benefit from such instructions, right?

    Maybe the ARM architecture license that Apple owns does not cover the SVE instructions, and they don't want to shell out the licensing fees for those?

    Or maybe Apple believes that the GPUs integrated in their Apple Silicon SoCs would handle all the vector stuff better anyway?


    • #3
      Originally posted by SteamPunker:
      Or maybe Apple believes that the GPUs integrated in their Apple Silicon SoCs would handle all the vector stuff better anyway?
      Probably this. That, and dedicated accelerators for video encode/decode, etc. as well. I would expect it eventually. They have their Metal API, which I recall will use any of the CPU, GPU and NPU as needed.


      • #4
        Does anyone know what the correct -march / -mcpu options are for the Snapdragon 8 Gen 2? I can't find the details anywhere.


        • #5
          Originally posted by SteamPunker:
          It's a bit surprising that Apple didn't include support for SIMD instructions such as these. There would be plenty of Mac-specific workloads that could benefit from such instructions, right?

          Maybe the ARM architecture license that Apple owns does not cover the SVE instructions, and they don't want to shell out the licensing fees for those?

          Or maybe Apple believes that the GPUs integrated in their Apple Silicon SoCs would handle all the vector stuff better anyway?
          AI workloads will use SME2 and 512-bit SSVE. NEON is enough for legacy workloads, because big.LITTLE requires a consistent SVE2 vector length; in fact only 128-bit is reasonable, which is the same as NEON.


          • #6
            Originally posted by FireBurn:
            Does anyone know what the correct -march / -mcpu options are for the Snapdragon 8 Gen 2? I can't find the details anywhere.
            Just cortex-x3
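
To see what a given -mcpu value actually enables, one option is to ask the compiler to dump its predefined macros and look at the ACLE `__ARM_FEATURE_*` names. The following is a minimal sketch, not something posted in the thread: the helper names `macro_query_cmd` and `arm_feature_macros` are hypothetical, the `aarch64-linux-gnu` target triple is an assumption, and it assumes a clang recent enough to know the `cortex-x3` and `apple-m4` CPU names. No output is shown because it depends on the compiler version.

```python
import shutil
import subprocess


def macro_query_cmd(cpu, target="aarch64-linux-gnu", compiler="clang"):
    """Build a command that makes the compiler print its predefined macros."""
    # -dM -E dumps the predefined macros; -x c /dev/null supplies an empty C input.
    return [compiler, f"--target={target}", f"-mcpu={cpu}",
            "-dM", "-E", "-x", "c", "/dev/null"]


def arm_feature_macros(dump):
    """Extract the __ARM_FEATURE_* names from a '#define NAME VALUE' dump."""
    names = set()
    for line in dump.splitlines():
        parts = line.split()
        if (len(parts) >= 2 and parts[0] == "#define"
                and parts[1].startswith("__ARM_FEATURE_")):
            names.add(parts[1])
    return sorted(names)


if __name__ == "__main__":
    for cpu in ("cortex-x3", "apple-m4"):
        cmd = macro_query_cmd(cpu)
        if shutil.which("clang") is None:
            # Nothing to run; show the command so it can be tried by hand.
            print("clang not found; command would be:", " ".join(cmd))
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Older clang releases may not recognise this CPU name.
            print(cpu, "rejected:", result.stderr.strip())
        else:
            print(cpu, arm_feature_macros(result.stdout))
```

Comparing the two lists is a quick way to check claims such as whether SVE2 or SME shows up for a particular core, without relying on second-hand summaries.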


            • #7
              Originally posted by SteamPunker:
              It's a bit surprising that Apple didn't include support for SIMD instructions such as these. There would be plenty of Mac-specific workloads that could benefit from such instructions, right?
              Agreed.

              Originally posted by SteamPunker:
              Maybe the ARM architecture license that Apple owns does not cover the SVE instructions, and they don't want to shell out the licensing fees for those?
              I heard some speculation around Apple not upgrading to ARMv9-A, due to things like licensing costs or terms. However, I think ARM is pushing for SVE2 support to be more ubiquitous, so it would surprise me if they created any disincentive for architecture licensees to implement it. In fact, I thought SVE2 was now a mandatory part of ARMv9-A.

              Originally posted by SteamPunker:
              Or maybe Apple believes that the GPUs integrated in their Apple Silicon SoCs would handle all the vector stuff better anyway?
              I wonder if Apple might still be hurting from the loss of talent it suffered when Nuvia was formed, 5 years ago. That could help explain the stagnation in their CPU cores between M1 and M3. Even in the M4, perhaps there was just too much work to be done on ARMv9-A compliance for them to take on SVE2 as well.

              Originally posted by SteamPunker:
              maybe Apple believes that the GPUs integrated in their Apple Silicon SoCs would handle all the vector stuff better anyway?
              Depends. For heavy-weight stuff, sure. However, there tend to be opportunities to do small loop vectorizations, string processing optimizations, and other things with vector instructions.
              Last edited by coder; 15 June 2024, 12:17 PM.


              • #8
                Originally posted by edxposed:
                big.LITTLE requires a consistent SVE2 vector length; in fact only 128-bit is reasonable, which is the same as NEON.
                You can implement SVE2 at 128-bit, if you want. This is what the V2 cores in both Amazon's Graviton 4 and Nvidia's Grace do.

                Arm made a forceful case that SVE2 has better IPC efficiency than Neon. I think they went into more detail, in the introduction of SVE, but here's what I could find for SVE2:


                Edit: here's a technical presentation they made on SVE2 and the Transactional Memory Extensions (TME) present in ARMv9-A:
                Last edited by coder; 15 June 2024, 12:28 PM.
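
The point in #8 that SVE2 can be implemented at 128-bit rests on SVE code being vector-length agnostic: the loop asks the hardware how many lanes it has and masks off the tail with a predicate. The following is a toy Python model of that idea, not real SVE code and not taken from the thread; the function names `lanes` and `vla_add` are hypothetical and 32-bit elements are an assumption.

```python
def lanes(vector_bits, element_bits=32):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def vla_add(a, b, vector_bits):
    """Add two equal-length lists the way a predicated, length-agnostic loop would.

    Returns the result list and the number of loop iterations taken.
    """
    n = len(a)
    step = lanes(vector_bits)
    out = [0.0] * n
    iterations = 0
    i = 0
    while i < n:
        # Predicate: lane k is active only while i + k is still inside the
        # array (this plays the role of SVE's WHILELT instruction).
        active = [i + k < n for k in range(step)]
        for k, on in enumerate(active):
            if on:
                out[i + k] = a[i + k] + b[i + k]
        i += step
        iterations += 1
    return out, iterations


a = [float(x) for x in range(10)]
b = [1.0] * 10
result_128, iters_128 = vla_add(a, b, 128)   # 4 lanes per iteration
result_512, iters_512 = vla_add(a, b, 512)   # 16 lanes per iteration
print(result_128 == result_512, iters_128, iters_512)  # True 3 1
```

The same loop produces the same result at either width; only the iteration count changes. That is why a 128-bit SVE2 implementation runs the same binaries as a wider one, and also why every core that a thread can migrate between needs to report the same vector length.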


                • #9
                  There were NO rumors of SVE/2 support. This was tested pretty much the first day hardware was available.

                  There IS SSVE support but the performance is abysmal. (For obvious reasons, if you understand how the AMX/SME hardware works vs how ARM defined SSVE.)

                  The politically interesting point is that SSVE support is not listed in the LLVM list of functionality...
                  My guess would be that this is part of an ongoing negotiation/dispute between Apple and ARM. Apple was willing to change a few details of how it handles vectors on the AMX unit (e.g. instruction encoding) but is not willing to accept the specific details (where results are placed) that make SSVE so non-performant. Apple could presumably have used chicken bits to make this functionality invisible. But maybe it wanted a public demonstration, a message to ARM along the lines of "we can implement your instruction set, but performance is crappy, just like we told you it would be. Now can we PLEASE be adults and you make the changes we suggested?"

                  You can now engage in the usual Phoronix anti-Apple rants if you like, but I'd point out that QC (Qualcomm) is probably negotiating the exact same thing for the exact same tech reasons. QC is also refusing to implement SVE, and may well be holding their SME/SSVE implementation back until they can get ARM to see some sense.
                  There's a lot to like in SVE, but there are also some truly crazy decisions (like the now-walked-back "any multiple of 128 bits" vector-length nonsense). So there's clearly a problem with the ISA design people not communicating well with the hardware people; very unlike the v8 design, which was such a perfect match between optimal hardware and software choices :-(


                  • #10
                    Originally posted by SteamPunker:
                    It's a bit surprising that Apple didn't include support for SIMD instructions such as these. There would be plenty of Mac-specific workloads that could benefit from such instructions, right?

                    Maybe the ARM architecture license that Apple owns does not cover the SVE instructions, and they don't want to shell out the licensing fees for those?

                    Or maybe Apple believes that the GPUs integrated in their Apple Silicon SoCs would handle all the vector stuff better anyway?
                    You don't need to speculate on any of this. We know the full technical story; it's described in the AMX section of my M1 PDFs: https://github.com/name99-org/AArch6...3%20SoC.nb.pdf, along with the addendum to volume 1 that covers all changes from M1 to M4 - https://github.com/name99-org/AArch6...plainer.nb.pdf

                    I certainly don't have time to explain everything here. But you can speculate and listen to other equally uninformed speculation – or you can go read something that actually tells you ALL the tech details.
