Apple M1 Ultra With 20 CPU Cores, 64 Core GPU, 32 Core Neural Engine, Up To 128GB Memory


  • #71
    Originally posted by drakonas777

    My personal impression is that people tend to believe the M1s are some magical chips light-years ahead of x86. They are not. The reality is that they use a superior node, rely heavily on special-purpose accelerators, and have tight integration with Apple's SW stack. Not to mention the M1s are not meant to be a stand-alone product, so Apple can use silicon area more generously than, say, Intel or AMD, who optimize hard for minimal die area for direct economic reasons.
    Actually, M1s are quite ahead of x86, but it's not due to magic. You are missing what's probably making the biggest impact: the fact that the M1 is a system on chip. The RAM on the M1 is ridiculously fast, and this is only possible (within a given budget) by putting the memory and CPU in the same package. One of the main reasons the M1 beats x86 chips in equivalent benchmarks such as compiling Chrome (no accelerators used there) is that the memory/L1/L2/L3 hierarchy is ridiculously fast.
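
    (Illustration, mine rather than anything from the thread: the standard way to see how fast a memory hierarchy actually is, is a dependent-load "pointer chase", where each load must finish before the next can start. A minimal sketch in C, assuming POSIX clock_gettime; Sattolo's algorithm builds one big random cycle so the prefetchers can't hide the latency:)

        /* Pointer-chasing latency microbenchmark (illustrative sketch). */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define N (1u << 24)  /* 16M entries (~128 MiB): far larger than any cache */

        int main(void) {
            size_t *next = malloc((size_t)N * sizeof *next);
            for (size_t i = 0; i < N; i++) next[i] = i;
            /* Sattolo's algorithm: always swap with a strictly earlier slot,
             * which yields a permutation that is a single big cycle. */
            for (size_t i = N - 1; i > 0; i--) {
                size_t j = (size_t)rand() % i;   /* rand() is crude but fine for a sketch */
                size_t t = next[i]; next[i] = next[j]; next[j] = t;
            }
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            size_t p = 0;
            for (size_t i = 0; i < N; i++) p = next[p];  /* loads are fully serialized */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
            printf("~%.1f ns per dependent load (sink: %zu)\n", ns / N, p);
            return 0;
        }

    (Shrink N until the array fits in L1/L2/L3 and the same loop reports each cache level's latency instead of DRAM's.)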

    People need to stop spreading the myth that Apple's M1s are fast only because of their accelerators.

    Also, in regard to ISAs, there are definite advantages to ARM's design which Apple exploits to squeeze out more performance. One obvious example is that every ARM (AArch64) instruction is the same width, 4 bytes, which makes wide parallel decoding trivial to implement, whereas on x86 decoding is genuinely complex because an instruction's length (anywhere from 1 to 15 bytes) isn't known until the bytes before it have been examined. There are other things as well: ARM by design has very little legacy cruft, which means more die area is available for other things. x86, on the other hand, has instructions going back to the 80s which still need to be supported one way or another.
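
    (A toy model, mine, purely to illustrate the decode point; neither encoding here is real: with a fixed-width ISA the byte offset of the i-th instruction is a closed-form expression, so several decoders can each grab their own instruction at once, while with a variable-length ISA you must walk every earlier instruction to find where the next one starts:)

        #include <stdint.h>
        #include <stdio.h>

        /* Fixed width (AArch64-style): offset is known without looking at the bytes. */
        static size_t fixed_offset(size_t i) { return 4 * i; }

        /* Variable length (x86-style toy): the low nibble of the first byte
         * stands in for the prefix/opcode/ModRM parsing that determines length. */
        static size_t variable_offset(const uint8_t *code, size_t i) {
            size_t off = 0;
            while (i--)
                off += (code[off] & 0x0F) + 1;  /* must decode all earlier instructions */
            return off;
        }

        int main(void) {
            uint8_t code[64] = {0x02, 0, 0, 0x01, 0, 0x03, 0, 0, 0};  /* toy byte stream */
            printf("fixed:    insn 3 starts at byte %zu\n", fixed_offset(3));
            printf("variable: insn 3 starts at byte %zu\n", variable_offset(code, 3));
            return 0;
        }
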
    Last edited by mdedetrich; 09 March 2022, 04:53 PM.



    • #72
      Originally posted by Slartifartblast
      All the credit goes to ARM for their core processor design and TSMC for their excellent 5 nm process node. Lastly, an honourable mention to Apple for adding a bit of tinsel on top.
      ARM did not design this core at all; it was designed entirely by Apple, under an ARM architecture license. ARM supplies the instruction set, not the microarchitecture.



      • #73
        Originally posted by mdedetrich
        Actually, M1s are quite ahead of x86, but it's not due to magic. You are missing what's probably making the biggest impact: the fact that the M1 is a system on chip. The RAM on the M1 is ridiculously fast
        It's not only one thing. It's a more sophisticated microarchitecture that's partly enabled by a better ISA. It's made on a better process node. It has bigger L1 & L2 caches, possibly enabled by Apple's vertical integration. And it has in-package memory stacks. All of this works together to result in the performance & efficiency gap that you see.

        However, if you go back and look at their A13 and A14 SoCs, using regular LPDDR4X, they had a distinct IPC and perf/W advantage over even the best x86 cores.

        [graph omitted: energy use vs. performance for the A13 against contemporary x86 CPUs; see the source link below]

        That's right. A phone is literally holding its own against AMD's penultimate desktop CPU, made on virtually the same TSMC 7 nm manufacturing node and with no fancy in-package DRAM! This is not clock-for-clock or perf/W; this is absolute (mostly single-thread) performance!

        Note: the graph is labelled Energy Efficiency, but it shows this by plotting energy usage on the left side vs. performance on the right side. The x86 CPUs have no bar on the left side because they'd be completely off the chart!

        Source: https://www.anandtech.com/show/14892...d-max-review/4
        Last edited by coder; 09 March 2022, 05:34 PM.



        • #74
          Originally posted by tildearrow
          It may be an incredibly powerful processor, but too bad it's only in the least Linux-friendly machines ever.
          At least the Asahi Linux project has been making some great progress!



          • #75
            Originally posted by ezst036

            I see, that makes logical sense. So, given enough time and man-hours, once the commands are known, a fully functional reverse-engineered GPU driver can ultimately be developed over the years.

            So in a few years the M1 could have better open-source support than NVIDIA, and in theory as good as AMD's or Intel's, if the motivation is there to accomplish it.
            In the timespan it'll take Apple's chip to get good open source GPU drivers, the rest of the industry will have already produced a higher-performing chip with perfect open-source graphics support from day one. So there's not much point in waiting, and there's not much point using it in the meantime.



            • #76
              Originally posted by ⲣⲂaggins

              In the timespan it'll take Apple's chip to get good open source GPU drivers, the rest of the industry will have already produced a higher-performing chip with perfect open-source graphics support from day one. So there's not much point in waiting, and there's not much point using it in the meantime.
              Don't hold your breath. When a competitor creates similarly performant hardware (by which time Apple will be generations ahead), that's just the starting point for the open-source community to begin work on a driver (no vendor will ship an open-source driver, or at least not soon after the hardware's release). Meanwhile, the open-source GPU driver for the M1 is already shaping up nicely.



              • #77
                Hopefully future hardware iterations will be similar enough to the first to make driver development time much shorter.



                • #78
                  Originally posted by ⲣⲂaggins
                  Hopefully future hardware iterations will be similar enough to the first to make driver development time much shorter.
                  I'm sure the open-source Vulkan driver from Imagination will be of some use as well. Though I haven't followed the details of the M1's GPU, prior Apple SoCs seemed to carry forward a strong heritage from Imagination's designs, even though Apple now designs the GPU in-house. Apple eventually agreed to retain an IP license from Imagination, although I think the speculation is that it mainly just gives them access to Imagination's patent portfolio.

                  I was disappointed not to see Apple simply buy Imagination, as I was worried Imagination couldn't survive without Apple's business. However, we see its IP is now powering Chinese dGPUs and probably SoCs. They're also trying to be the RISC-V counterpart of ARM. With both nominally UK-based, it should be an interesting rivalry.



                  • #79
                    Originally posted by coder
                    If these numbers are remotely accurate, it's a much bigger gap than just due to manufacturing. It's the kind of efficiency gap you only get between a microarchitecture that was designed for efficient performance from day 1 vs. Intel's desperate scramble to retake the x86 performance crown.
                    I agree that microarchitecture is instrumental for good power efficiency, but lithography is directly related to uarch: extra density allows you to implement extra smart things in the chip. The M1, for instance, contains 16 billion transistors, while a chip like the 5800X has about 4.15 billion in the CCD plus 2.09 billion in the IOD (roughly 6.24 billion total); that's about 2.5x more transistor budget for the M1 to "play with".

                    Originally posted by coder
                    We'll have to get some independent testing, but recent Apple cores have always delivered far better perf/W and perf/clock on the full spectrum of benchmarks, including generic ones like SPEC 2017. No "special accelerators" or "tight integration" in sight.
                    I'd say it depends on the outlet and the methodology used, but mostly on the workload. When I saw the Hardware Unboxed (HU) tests, for example, I was not very impressed: the M1 is better overall, of course, but the advantage varies a lot, and sometimes the efficiency is basically comparable to Alder Lake/Zen 3(+):

                    [embedded video omitted]

                    Originally posted by coder
                    This hasn't been true before and I see no reason to believe it will be true in future. You can go back and look at perf/W of Apple SoCs on TSMC 7 nm and it's still much better than AMD CPUs on the same node.
                    Comparing mobile SoCs with desktop CPUs is not exactly an apples-to-apples comparison. First, performance (as shown by Alder Lake vs Zen 3+) does not necessarily scale linearly with power, so you really can't assume how well those Ax chips would perform at, say, 35/45/65/95 W. Another thing is synthetic benchmarks. Personally, I do not care how much of an "industry standard" they are; I never take them seriously. They are like comparing cars only by their 0-60 mph time. I only look at specific practical workloads. Chips are too different these days to draw generalized conclusions, IMHO. I guess let's wait and see.
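
                    (A first-order sketch of why that's so; this is textbook CMOS reasoning added by me, not taken from either post. Dynamic power is roughly

                        P_{\mathrm{dyn}} \approx C \, V_{dd}^{2} \, f

                    and sustaining a higher clock f generally requires a higher supply voltage, so near the top of the voltage/frequency curve V_{dd} \propto f and hence P_{\mathrm{dyn}} \propto f^{3}. The last ~20% of frequency can then cost on the order of 70% more power, which is why a part tuned for 15 W tells you little about the same core at 95 W.)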

                    Originally posted by coder
                    x86 is at an intrinsic disadvantage. The decoder is inefficient and a bottleneck, plus having ~half the GP registers limits ILP and forces spills that wouldn't happen on ARM.
                    Sure, but that "intrinsic disadvantage" is overrated AF in the IT community. I sincerely believe that x86 is nowhere near its "hard limits" regarding perf scaling. Intel Nova Lake should demonstrate that around 2025.

                    Originally posted by mdedetrich

                    Actually, M1s are quite ahead of x86, but it's not due to magic. You are missing what's probably making the biggest impact: the fact that the M1 is a system on chip. The RAM on the M1 is ridiculously fast, and this is only possible (within a given budget) by putting the memory and CPU in the same package. One of the main reasons the M1 beats x86 chips in equivalent benchmarks such as compiling Chrome (no accelerators used there) is that the memory/L1/L2/L3 hierarchy is ridiculously fast.

                    People need to stop spreading the myth that Apple's M1s are fast only because of their accelerators.
                    Yes, I forgot to mention RAM, and also unified memory/cache coherency. I never said that M1s are fast only because of accelerators, though. I meant that they are very important in the Apple ecosystem, and any benchmarking should be done carefully with regard to possible offloads, especially when testing 3D rendering, encoding, etc. In other words, there is a big difference between comparing cores vs. cores and SoC vs. SoC.
                    Last edited by drakonas777; 10 March 2022, 03:24 PM.



                    • #80
                      Originally posted by drakonas777
                      I agree that microarchitecture is instrumental for good power efficiency, but lithography is directly related to uarch.
                      Again, even on the same process node, Apple's cores have always been wider than anyone else's. Consider my example of Zen 2 vs. Lightning, which were both made on virtually the same TSMC 7 nm process. Or, with a little work, you can compare vs. Zen 3, which I think actually uses the same TSMC process.

                      Originally posted by drakonas777
                      extra density allows you to implement extra smart things in the chip. The M1, for instance, contains 16 billion transistors,
                      Look at a floor plan of the M1 and see how little of it is devoted to the actual CPU cores. Sure, extra transistors let you build a wider core, but process node is only part of the story.

                      Originally posted by drakonas777
                      while a chip like the 5800X has about 4.15 billion in the CCD plus 2.09 billion in the IOD; that's about 2.5x more transistor budget for the M1 to "play with"
                      You'd do much better to at least compare against one of AMD's APUs, as the 5800X doesn't even have a GPU.

                      But Apple burns transistors on other things, like the AMX unit, the ISP, and a dedicated neural compute unit. These can help certain applications, but they play no role in benchmarks like the ones I quoted above.

                      Originally posted by drakonas777
                      When I saw the Hardware Unboxed (HU) tests, for example, I was not very impressed
                      No YouTube, please. Link to a web page, or I'm not looking at it.

                      Originally posted by drakonas777
                      Comparing mobile SoCs with desktop CPUs is not exactly an apples-to-apples comparison.
                      M1 Ultra is definitely not for mobile. They're only selling it as a desktop solution.

                      Originally posted by drakonas777
                      First, performance (as shown by Alder Lake vs Zen 3+) does not necessarily scale linearly with power, so you really can't assume how well those Ax chips would perform at, say, 35/45/65/95 W.
                      That's what graphs like this are supposed to capture:

                      [graph omitted]

                      Originally posted by drakonas777
                      Another thing is synthetic benchmarks. Personally, I do not care how much of an "industry standard" they are; I never take them seriously.
                      They provide a point of comparison. And if you drill down into the individual SPEC benchmark tests, they're composed of industry-standard apps! It's a long way from LAPACK benchmarks or Dhrystone/Whetstone MIPS from the days of yore.

                      The idea is that you pick the set of benchmark tests most relevant to your purpose and weight those accordingly.

                      Sometimes a highly synthetic benchmark exists not so much to inform you about your application performance, but to shed insight into the architecture's strengths and weaknesses, possibly helping to explain other benchmark results. Some good examples of this are CoreMark and the OpenMP-based STREAM Triad test.
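
                      (For reference, the Triad kernel really is tiny. This is a minimal sketch of it, not the official STREAM benchmark, which adds timing, validation, and the Copy/Scale/Add kernels:)

                          /* STREAM Triad sketch: three streams of memory traffic and one
                           * multiply-add per element, so DRAM bandwidth, not the core,
                           * is the limit -- exactly what the test is meant to expose. */
                          #include <stdio.h>
                          #include <stdlib.h>

                          #define N 20000000L  /* must dwarf the last-level cache */

                          int main(void) {
                              double *a = malloc(N * sizeof *a);
                              double *b = malloc(N * sizeof *b);
                              double *c = malloc(N * sizeof *c);
                              const double scalar = 3.0;
                              for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

                              #pragma omp parallel for  /* the OpenMP part: ignored without -fopenmp */
                              for (long i = 0; i < N; i++)
                                  a[i] = b[i] + scalar * c[i];

                              printf("a[0] = %f\n", a[0]);  /* keep the loop from being elided */
                              free(a); free(b); free(c);
                              return 0;
                          }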

                      Originally posted by drakonas777
                      They are like comparing cars only by their 0-60 mph time.
                      Not really. That'd be true if the benchmark consisted of only a single test. SPEC 2017 contains a fairly diverse set of applications. So it's like having a drag race + a lap time on an oval + a road course + some rallycross + maybe a few laps on a frozen lake + an endurance race.

                      Originally posted by drakonas777
                      I sincerely believe that x86 is nowhere near its "hard limits" regarding perf scaling. Intel Nova Lake should demonstrate that around 2025.
                      I sincerely believe that Intel (and AMD) will sidestep the liabilities of x86 by building chips that are increasingly non-x86. I think the front-end will become increasingly decoupled from the actual work they do.

                      In the end, there's only so much lipstick you can put on a pig. And ISA matters less to customers now than ever before. I predict Intel and AMD will have both publicly started transitioning away from x86 by the end of the decade, if not well before. x86 lasted far longer than almost anyone predicted, but we're finally seeing the sun set on it.

                      Originally posted by drakonas777
                      I meant that they are very important in the Apple ecosystem, and any benchmarking should be done carefully with regard to possible offloads, especially when testing 3D rendering, encoding, etc. In other words, there is a big difference between comparing cores vs. cores and SoC vs. SoC.
                      Tests like SPEC 2017 and nearly all of the Phoronix Test Suite don't use accelerators. They are built from source, and the sources don't even have code for those accelerators.

