Apple M1 Ultra With 20 CPU Cores, 64 Core GPU, 32 Core Neural Engine, Up To 128GB Memory


  • piotrj3
    replied
    Originally posted by coder View Post
    I think you've got it backwards. Intel bins its chips so that the highest-spec SKUs have the lowest leakage of the dies they make. So, if you took a lower-spec model, the corresponding perf/W graph would probably look worse over its entire range.


    Are you saying Intel would win vs. the M1 Ultra, or you're just saying that Intel CPU beats the AMD one?

    Anyway, the main thing you seem to be missing is how their graph shows significantly better efficiency, even at the top of their performance envelope, which also happens to be far above the top of Intel's performance envelope. Granted, it's comparing a 20-core/20-thread CPU to a 16-core/24-thread one.


    It all depends on what benchmarks you use, and whether the M1 is running them in Rosetta. Got a link to these benchmarks?


    Good question. However, Apple has a license for Imagination Tech's IP. Imagination demonstrated HW-accelerated ray tracing several years before Nvidia launched RTX. Granted, I'm sure it was much less capable, but it was also running in mobile power envelopes.


    That 32-core Neural unit is probably good for something.


    Well, there's Metal and MoltenVK (i.e. Vulkan on top of Metal), for graphics.

    Good question about GPU-compute, though. I do wonder how many do GPU-compute on macs, regardless.


    Yeah, but Apple has yet to launch their new Mac Pro. So we don't yet have their ultimate solution.

    Anyway, I think Apple isn't really competing head-to-head with Intel, AMD, or Nvidia. Most people buying Macs wouldn't even consider anything else, as long as Apple had a product that could meet their needs. And most people not buying Macs wouldn't buy one because it's too expensive for what you get. So, if they don't need a Mac, they're not going to buy one.

    Basically, Apple just needs to keep people in its garden ...and keep making them feel special, for being there.
    No, you got it backwards. About power leakage this is not how it works at all. Der8auer describes it very well. He bought high binned CPU (for very high cost) and effect is that this CPU comparing to his previous 12900k consumes 30W more at literally same frequency, voltage etc. And high power leakage is actually very good for overclocking and reaching extreme clocks.

    https://youtu.be/v1mgN9e6fFs

    Look at 15:30. I bet AMD, with their PBO strategies like preferred cores, is even more susceptible to this. I don't know the exact reason, but I trust Der8auer here. I suspect it's because a high-leakage chip must have passed quality checks despite that leakage, meaning that even with high leakage it can still tell a 1 from a 0, which could imply they're not afraid of running it at high voltage/frequency.

    I also found an interesting article here: https://books.google.pl/books?id=cM8...ocking&f=false
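    For intuition, here's a minimal sketch of why a leakier die draws more power at identical voltage and frequency: total CMOS power is dynamic switching power plus static leakage, and at the same V and f only the leakage term differs between two dies. All numbers below are invented for illustration; none are measured values for any real chip.

```python
# Minimal sketch of CMOS power: total = dynamic switching + static leakage.
# All constants are made up for illustration only.

def total_power_w(c_eff_farads, v_volts, f_hz, i_leak_amps):
    """Dynamic switching power plus static leakage power."""
    p_dynamic = c_eff_farads * v_volts ** 2 * f_hz  # alpha*C*V^2*f (alpha folded into C)
    p_leakage = v_volts * i_leak_amps               # static draw; higher on leaky silicon
    return p_dynamic + p_leakage

V, F = 1.3, 5.0e9   # same voltage and clock for both dies
C_EFF = 2.0e-8      # effective switched capacitance (arbitrary)

low_leak = total_power_w(C_EFF, V, F, i_leak_amps=8.0)    # "quiet" die
high_leak = total_power_w(C_EFF, V, F, i_leak_amps=31.0)  # leakier, faster-transistor die

# At identical V and f the dynamic term cancels, so the whole difference
# is the leakage term: 1.3 V * (31 A - 8 A) is roughly 30 W
delta = high_leak - low_leak
```

    The ~30W delta Der8auer measured at identical settings is consistent with this picture: the binned die simply leaks more current, which (per the video) is also what correlates with reaching extreme clocks.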



  • drakonas777
    replied
    Regarding efficiency, you should keep in mind that in the 16C maximum-performance case, for example, they were using the 12900K at PL2, which is a really low efficiency bar to clear in this particular use case. Don't take all of Apple's benchmarks at face value; most of their "Xs" and "percents" are quite inflated. Wait for independent benchmarks, especially ones based on real-life workloads.

    Also, TSMC N5 is about one and a half nodes ahead of Intel 7, so for an HPC ARM-based core on this node, not being able to deliver better efficiency than existing x86 would be an absolute failure. So of course the M1's ARM cores are more efficient - this is expected. The question is by how much. The answer, as always, depends on the workload and testing methodology, but in some instances it's not that much compared to Alder Lake.

    My personal impression is that people tend to believe the M1s are some magical chips light-years ahead of x86. They are not. The reality is they use a superior node, rely heavily on special-purpose accelerators, and have tight integration with Apple's SW stack. Not to mention the M1s are not meant to be a stand-alone product, so Apple can use die area more generously than, say, Intel or AMD, who optimize aggressively for minimal area because of direct economics.

    The ARM ISA is not inherently vastly superior to x86 - you can google Jim Keller's talks on this matter. Upcoming TSMC N5 / Intel 5 x86 cores will be at a very similar efficiency level, but by that time Apple will most likely have N3 SoCs, so we'll have the same story.



  • sykobee
    replied
    Originally posted by piotrj3 View Post
    2nd. For Intel, they picked an overclockable CPU that is designed to be overclocked, and to make a good overclockable chip you need relatively high leakage wattage. This is why the 12900K is less energy-efficient than top-of-the-line AMD CPUs, but if you compare milder Intel designs like the 12400 (non-K) vs. the AMD 5600, Intel wins in efficiency easily. It is very funny to read this graph when you know Intel recently posted a similar type of graph for laptop CPUs (which are not designed to be overclocked) and successfully compared it to Apple's laptop M1s.
    Intel's ADL laptop chips are pretty poor (relatively, not absolutely) in terms of efficiency at low TDPs.

    AMD's 6000 series APUs for mobile are more power efficient, especially under 30W. Intel only gets more efficient at higher TDPs, simply because they can clock their cores higher with that power.

    I can easily believe that Apple's chips on 5nm are even more power efficient than AMD's 6nm (which is basically 7nm++) APUs.

    Still, it appears it's the M1 for 2 years, which might be expected for a major architecture switch, but it also doesn't match Apple's usual one-year cadence on CPU/SoC designs.



  • andre30correia
    replied
    Still, we'd need to use their garbage OS with all the spyware and slow crap. No thanks.




  • coder
    replied
    Originally posted by Slithery View Post
    What on earth gives you that idea? It's obviously going to be ProRes...
    Sorry, I'm not familiar with the Apple jargon. I didn't know ProRes was a codec.

    If you look at Nvidia's specs, it's not too big of a stretch to think it could be H.264. The 3-year-old Tesla T4 (75W RTX 2070-equivalent) can decode 4k H.264 @ 300 fps (aggregate).
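    As a sanity check on those figures, here's a back-of-the-envelope comparison of Apple's claim against the T4's decode rate. The 30 fps per-stream rate and the pixel-count scaling are my assumptions, not anything Apple or Nvidia state:

```python
# Back-of-the-envelope on "18 streams of 8K" vs. the T4's 300 fps
# aggregate 4K H.264 decode. Assumptions (not vendor numbers):
# 30 fps per stream, decode cost scaling with pixel count.

PIXELS_4K = 3840 * 2160
PIXELS_8K = 7680 * 4320   # 4x the pixels of 4K

streams, fps = 18, 30
ultra_8k_fps = streams * fps                                # 540 fps of 8K
ultra_4k_equiv_fps = ultra_8k_fps * PIXELS_8K // PIXELS_4K  # 2160 fps 4K-equivalent

t4_4k_fps = 300  # Tesla T4 aggregate 4K H.264 decode, per the post above

ratio = ultra_4k_equiv_fps / t4_4k_fps
```

    By this crude measure, Apple's claim works out to roughly 7x the T4's aggregate throughput, which is aggressive but not implausible for a much newer chip with dedicated media engines (and even less surprising if the claim is for ProRes rather than H.264).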



  • Slithery
    replied
    Originally posted by coder View Post
    • fancy media engines - they claim the Ultra can decode 18 streams of 8K video (unsure of the codec, so probably H.264)
    What on earth gives you that idea? It's obviously going to be ProRes...



  • coder
    replied
    Originally posted by name99 View Post
    Apple ROB is over 2000 entries. What is 630 entries is the size of the History File.
    The numbers you have quoted are so far from the values that actually matter that it’s laughable.
    Corrections welcome! I know you're the authority on this stuff.

    Originally posted by name99 View Post
    you insist on listing only the “x86-equivalent” specs!
    I was trying to make a point with the data readily available. And that point was meant specifically to try and compare the level of sophistication of the microarchitectures, since I was replying to someone ascribing the M1's superiority merely to its cache/memory subsystem. I think the point was made, but I'm open to anything you'd like to add.

    If you ever felt like writing a blog post that crystallized the key points about how Apple's Firestorm compares with Golden Cove or Zen 3, I'd gladly cite it. I know about your 350-page book, but few people are going to read that much (myself included).
    Last edited by coder; 09 March 2022, 03:29 AM.



  • name99
    replied
    Originally posted by coder View Post
    Okay, let's look at Apple's Firestorm core (A14, M1) that launched more than a year before Intel's Golden Cove (Alder Lake):

    Parameter                    Apple Firestorm    Intel Golden Cove
    Decoder Ports*               8                  6
    Reorder Buffer               630                512
    Max uOps Dispatched/cycle    17?                12

    * x86 decoders tend to be restricted. For instance, Sunny Cove had 1 complex + 4 simple decoders. GC's decode block certainly has similar restrictions.

    Sources:
    Now, according to Anandtech, Alder Lake has higher single- & multi- thread performance than the M1 Max (although SPECfp2017 MT is basically equal!). However, it does so at considerably higher power consumption. So, Apple's wider & more-sophisticated microarchitecture manages to deliver much better perf/W, and by a margin you can't reasonably attribute to the delta between TSMC 5nm vs. Intel 7.
    Apple ROB is over 2000 entries. What is 630 entries is the size of the History File.
    The numbers you have quoted are so far from the values that actually matter that it’s laughable. Of course a machine will not look good if it’s designed in a way completely different from an x86, but you insist on listing only the “x86-equivalent” specs!



  • name99
    replied
    Originally posted by piotrj3 View Post
    I think Apple went very heavy on marketing here. The RTX 3090 has 940GB/s of memory bandwidth all to itself (its RAM is a separate thing from the GPU), without a NUMA-like topology between multiple chips, and the RTX 3090 is actually often handicapped by memory speed even in games (and in productivity workloads far more).
    Win some. Lose some.
    Yes, the GPU gets (very slightly) less bandwidth. But it has access to 128GB, rather than the max of 48GB on a Quadro.
    Professionals tell me they care about that vastly more than about slightly lower bandwidth…

    And, how often does this need to be stated: this is still not the end of the line! This is Apple operating at the iMac Pro level (the machine it replaces), not at the Mac Pro level. That will come with CXL and a whole new vision of how to modularize and design a professional machine.
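    To make the capacity-vs-bandwidth tradeoff concrete, here's a toy streaming model using the figures from the thread (940 GB/s and a 48GB cap on the discrete side) plus Apple's published 800 GB/s unified-memory bandwidth; the ~32 GB/s PCIe 4.0 x16 host link, the dataset size, and the model itself are illustrative assumptions only:

```python
# Toy model: time for one pass over a dataset when part of it overflows
# VRAM and must come over the host link. All inputs in GB and GB/s.

def one_pass_seconds(dataset_gb, vram_gb, vram_bw, host_link_bw):
    """Resident data streams at VRAM speed; overflow at host-link speed."""
    resident = min(dataset_gb, vram_gb)
    overflow = max(dataset_gb - vram_gb, 0)
    return resident / vram_bw + overflow / host_link_bw

DATASET_GB = 100.0  # a working set bigger than 48GB but under 128GB

discrete = one_pass_seconds(DATASET_GB, vram_gb=48, vram_bw=940, host_link_bw=32)
unified = one_pass_seconds(DATASET_GB, vram_gb=128, vram_bw=800, host_link_bw=32)

# discrete: 48/940 + 52/32 is about 1.68 s; unified: 100/800 = 0.125 s
```

    The point is the knee in the curve: once the working set exceeds VRAM, the discrete card's effective throughput is dominated by the host link, which is why the professionals mentioned above value capacity over a ~15% peak-bandwidth edge.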

