Apple M1 Ultra With 20 CPU Cores, 64 Core GPU, 32 Core Neural Engine, Up To 128GB Memory

drakonas777 replied

17 March 2022, 03:27 AM
Perceived "horrible power efficiency" of x86 is partially caused by idiotic factory settings for CPU/platform. I've put my 3700X into 45W ECO mode, and it lost ~15% performance while consuming about half the power. You should remember that Intel/AMD strive for maximum performance per square millimetre of silicon, especially in the consumer market, for mere silicon economy and THE BENCHMARKS, of course.
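
As a rough illustration of that arithmetic, here is a minimal Python sketch of what the ECO-mode numbers imply for performance per watt. The function name is made up for this example, and the fractions (0.85 of stock performance, 0.5 of stock power) are assumptions taken from the approximate figures in the post, not measurements.

```python
def perf_per_watt_gain(perf_fraction: float, power_fraction: float) -> float:
    """Efficiency (performance per watt) relative to stock settings.

    Both arguments are fractions of the stock value, e.g. 0.85 means
    85% of stock performance and 0.5 means half the stock power draw.
    """
    if power_fraction <= 0:
        raise ValueError("power_fraction must be positive")
    return perf_fraction / power_fraction


# Approximate figures from the post: ~15% performance lost, ~half the power.
gain = perf_per_watt_gain(perf_fraction=0.85, power_fraction=0.5)
print(f"perf/W vs stock: {gain:.2f}x")
```

With those assumed inputs the chip does roughly 1.7 times as much work per watt as at factory settings, which is the point about stock tuning chasing the last few percent of performance.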
BillBroadley replied

16 March 2022, 07:31 PM
Originally posted by jacob

For those who want an open system (not making a judgment here, simply looking at the PoV of those for whom openness is a more important criterion than compatibility, price or convenience), does this have any potential advantage over, say, a Threadripper? Of course Threadripper is far from being exactly open, but it's still less egregious than Apple in that regard.

*warning long rant*

Well PCs have a pretty awkward design and are constrained in a bunch of small ways for legacy reasons. Some examples:
GPUs need big fans/fancy heat sinks because in most cases they are mostly enclosed on 5 sides (of 6) without any significant engineered airflow from the case. It would make much more sense to put the hot bits of a GPU on the opposite side of a PCI-e card where the power supply and case exhaust fan could help. Generally the traditional desktop ATX case stinks. Air isn't forced where it needs to be, and the worst cooling (in front of a GPU) is where the most heat is. GPU sizes are ridiculous (because they can't count on cooling) and are a serious design limitation on anything smaller than a dorm fridge.

GPUs are also connected on a PCI-e bus that's high latency, low bandwidth, and not memory coherent. So even something like sending 1MB to a GPU becomes kind of painful, and is done manually, unlike say sending/receiving data from main memory where it's automatic. For similar reasons running out of GPU memory is a killer, not a slow decrease in performance.

PCs are often backward compatible not just to 32 bit, but 16 bit, and even 8. Yes, largely you could run DOS on a new PC. Backwards compatibility with previous ISAs increases complexity, which decreases security. It also increases the cost for testing, and makes every future iteration harder to design.

The x86-64 ISA is complex, variable length, and difficult to decode (even ignoring 8/16/32 bit flavors). This adds overhead to decoding, reduces the number of instructions that can be decoded in parallel, and makes it harder (more expensive in transistors and heat) to do wide designs with more instructions per cycle. Alder Lake is similar per core to the M1 core (shared across all designs) but uses much more power. There is a follow-on with a similarly aggressive design to the M1 (deep buffers, wider decode, wider issue, etc), but it's expected to be even more power hungry.

Full size PCs and SFFs often use CPU sockets and DIMM sockets that have significant cost in size, power, voltages, and latency, and that trigger big price increases for extra memory channels. Look at the cost and power used for Ryzen, Threadripper, and Pro/Epyc even when keeping the core counts as close as you can.

One dirty secret in the x86-64 world is poor scaling:
x86-64 chips run hot; in many common scenarios they throttle because of heat or lack of power (voltage drops). Doubly so in anything resembling a laptop. My 16" MBP i9 runs the fans loudly at the drop of a hat, even for just backups or patching.

AMD/Intel will let a single core run at a higher clock, but reduce the clock speed as 2, 4, and then all cores get busy. So you rarely will get anywhere near 8x the performance with 8 cores vs 1

The higher core count chips run with lower clock to begin with

Some Intel AVX512 chips actually lower the clock below the base when heavily using the vector unit, even with aggressive water cooling.

Desktops (Ryzen 7 or i7) have 2 memory channels, so only 2 cores can have a cache miss before the system is busy. All other cores have to wait for the first 2 to finish. Threadripper increases this to 4, and Epyc or Threadripper Pro increases this to 8, but they also add more cores, so the extra channels are needed. The lowest end M1 has the same number of channels as the highest end server chips from AMD and Intel. Generally it's pretty easy for memory bandwidth or memory channels to become a bottleneck. So often extra cores aren't contributing much performance.

x86-64 chips use a strong memory model and generally get less usable/observed memory bandwidth out of the same memory system than an ARM chip (with a weak memory model) will.
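
To put a rough number on the clock-scaling point above, here is a small Python sketch. The boost table is entirely hypothetical (it is not the spec of any real AMD or Intel part), and the model assumes perfectly parallel work whose performance is proportional to clock speed, so it ignores the memory-channel and thermal limits also mentioned in the list.

```python
# Hypothetical boost clocks (GHz) keyed by the number of busy cores.
# Illustrative values only, not measurements of any real CPU.
BOOST_GHZ = {1: 4.4, 2: 4.3, 4: 4.2, 8: 4.0}


def best_case_speedup(active_cores: int) -> float:
    """Upper bound on speedup versus a single core.

    Assumes perfectly parallel work and performance proportional to
    clock speed, so the only loss modelled is the lower multi-core clock.
    """
    return active_cores * BOOST_GHZ[active_cores] / BOOST_GHZ[1]


for n in sorted(BOOST_GHZ):
    print(f"{n} cores: at most {best_case_speedup(n):.2f}x")
```

Even in this best case, 8 busy cores land below 8x the single-core result, before any bandwidth or throttling effects are counted.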

So the Apple M1 was cool and efficient, but most didn't fully appreciate it because the overall result wasn't an x86-64 killer, just a decent chip with impressively low power use. However:
It's competitive with Intel and AMD laptop chips, but at much less power; some designs don't even have fans

It's a surprisingly aggressive design, decodes more instructions at once, has a larger re-order buffer, and can issue more instructions per cycle than any X86-64 design

Has a respectable GPU compared to AMD and Intel integrated GPUs. Not killing the competition, but competitive

It's somewhat perplexing to some that Apple managed such an aggressive design (which is usually more power intensive), yet it is still miserly on power.

Generally neutral compared to the best AMD/Intel. The low power AMD/Intel laptops were almost as fast, and the more power intensive laptops were clearly faster.

Does not implement any 32 bit or older instruction sets.

So that sets the stage for the M1 Pro and M1 Max:
M1 Pro = 2x the memory bandwidth/GPU, M1 Max = 4x the memory bandwidth/GPU

Both still easily within the thermal envelope of thin/light 14" and 16" laptops

Delivers 2x and 4x the GPU performance (at least for some GPU loads, including some games)

Crazy that you can now get 16 DIMMs (like on a dual socket Epyc or Xeon) of bandwidth in a laptop with impressive battery life.

32 or more memory channels allow even aggressive GPU loads with tons of random memory accesses for multiple resolutions of textures to perform well. Radically better than any other iGPU.

None of the ugly limits of 6-16GB of vram common on other laptops in the same segment

Delivers better scaling with more CPU and iGPU cores than anything else by delivering more channels and memory bandwidth than any laptop.... or even desktops. More than a Threadripper Pro even.

Generally not thermally limited, unlike competing solutions. Often can run well on battery and compete well with other laptops on wall power. Even heavy workloads do not trigger terrible battery life.

So the M1 Ultra Studio is in a class of its own:
Under 4 liters, similar to the smallest of PC cases, ones that often can't even fit a GPU and have a MUCH slower iGPU.

Similar in size/power to mini-ITX systems with fewer cores, and 1/12th the memory bandwidth

Doubles the cores and GPU from the already impressive M1 Max

Has similar bandwidth to a 24-DIMM DDR5 solution that doesn't exist, but might in a late 2022 dual socket Epyc (based on the unreleased Zen4) system that burns at minimum 600 watts, costs $10k or more, and takes up 14 liters or more.

Looks to have exceptional airflow, entering from the bottom, passing the M1 Ultra (CPU and GPU) and exiting at the rear. I'm expecting it to be near silent under even the heaviest loads. The fans/heat sink are a substantial fraction of the small case.

Can drive four 5k monitors and an HDMI TV, which would often require more than one GPU.

Generally has unknown overall performance, but looks promising. Geekbench 5 shows similar performance to 24-52 core Intel and AMD systems (Xeon, Threadripper, and Epyc), not bad for a relatively low power 20 core Apple. Granted Geekbench 5 isn't a great benchmark, but it's what we have for now.
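
For anyone who wants to sanity-check the bandwidth comparisons in the list above, here is a back-of-the-envelope Python sketch. It assumes standard 64-bit (8 byte) DDR channels, treats each DIMM in the "24 DIMM" case as its own channel, and picks DDR4-3200 and DDR5-4800 as example speeds; those choices are assumptions for illustration, not figures from the post. The 800 GB/s value is Apple's quoted memory bandwidth for the M1 Ultra.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Assumed configurations (illustrative, not taken from the post).
dual_channel_ddr4 = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=3200)
ddr5_24_dimm = peak_bandwidth_gbs(channels=24, mega_transfers_per_s=4800)

# Apple's quoted memory bandwidth for the M1 Ultra, in GB/s.
M1_ULTRA_GBS = 800

print(f"dual-channel DDR4-3200: {dual_channel_ddr4:.1f} GB/s")
print(f"24 x DDR5-4800:         {ddr5_24_dimm:.1f} GB/s")
print(f"M1 Ultra vs dual-channel DDR4-3200: {M1_ULTRA_GBS / dual_channel_ddr4:.1f}x")
```

These are theoretical peaks, so real sustained bandwidth will be lower on every platform, and the exact ratio against a small desktop depends on which memory speed you assume.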

With all of the above said, will the Apple Studio run popular games better than a top of the line Windows PC ... unlikely. Will the most exciting new games include a native port to M1 ... unlikely. Will gamers flock to the system ... unlikely.

However I do think that over the next 5 years Apple might well become more gamer friendly. 3D performance of the iPhone, iPad, and desktops is coming along nicely. Apple could easily ship a substantially more powerful Apple TV for gaming. Bummer that Apple's doubling down on Metal, which no other gaming platform supports. At least if the Linux port works, standard 3D APIs would work.

Do you spend hours editing multiple streams of high res video and actually make a few $k a month producing video? If so, I'd seriously consider the Apple Studio.

Do you love/eat/breathe Linux on a daily basis? If so, I'd watch the Asahi Linux port or even better help with the port or contribute to their Patreon.

I'd love to justify one myself, I do need a new desktop, but I'm not giving up Linux. Even with a nice Linux port I think I'd keep waiting for an Apple Mini with the rumored M1 Pro or M1 Max upgrade. Presumably without battery, keyboard, and screen it would be significantly cheaper than the 14" or 16" MBPs.

Personally I do have occasional use for encoding/decoding/transcoding 4k UHD streams, but I'm not CPU limited on a 2015 4c/8t desktop. I do plan to build a home security system to identify objects in multiple camera streams. But even a Raspberry Pi with an accelerator like the Coral.ai doesn't do a bad job. Sure I might upgrade to a faster CPU or GPU, but can't justify a $5k Studio. It would be exciting to need one though, it's an amazingly cool machine with some unique capabilities unmatched anywhere.
tildearrow replied

16 March 2022, 04:41 PM
Originally posted by BillBroadley

That doesn't seem fair. It doesn't have a locked bootloader, uses a standard instruction set (AARCH64), has a Linux port, and doesn't require any hacking/exploit to install your own OS on it. Sure, Apple doesn't support Linux, but they aren't hindering it either. Marcan does have a Linux port (Asahi Linux) that generally works on the Mini M1 (booting, network, SMP support, etc). It's "usable", but the big missing piece is the GPU acceleration, which is being worked on. But sure, if you want to help, join in or pitch in a few $ a month on Patreon to help pay for the work; I am.

But it doesn't have a standard boot method and uses several proprietary interfaces.

The work done on Asahi Linux is impressive considering how closed the platform is.
BillBroadley replied

16 March 2022, 04:24 PM
Originally posted by tildearrow

May be an incredibly powerful processor, but too bad it's only on the least Linux-friendly machines ever.

That doesn't seem fair. It doesn't have a locked bootloader, uses a standard instruction set (AARCH64), has a Linux port, and doesn't require any hacking/exploit to install your own OS on it. Sure, Apple doesn't support Linux, but they aren't hindering it either. Marcan does have a Linux port (Asahi Linux) that generally works on the Mini M1 (booting, network, SMP support, etc). It's "usable", but the big missing piece is the GPU acceleration, which is being worked on. But sure, if you want to help, join in or pitch in a few $ a month on Patreon to help pay for the work; I am.
Developer12 replied

15 March 2022, 11:20 AM
Originally posted by piotrj3

Thing is, if you have Vulkan, you can leverage Zink, DXVK, ANGLE and pretty much use all other APIs with pretty good performance.

You don't need Zink if you have native OpenGL. That's enough for most GPU accelerated tasks (e.g. DEs).
coder replied

13 March 2022, 10:52 PM
Originally posted by Ladis

Also how cross platform is Vulkan? It's not on Mac, iOS, Windows UWP (Microsoft Store), Windows on ARM, Xbox, PS. On Switch it's only a second class API added later.

Counting only native implementations is somewhat self-serving. ;)

Originally posted by Ladis

On Android only newer devices have it; 5 years ago 93% of devices didn't have it.

Android API changes are probably enough to block most new games from such old devices, not to mention their wildly out-dated performance characteristics and feature levels. Probably most of those devices, at the time, didn't even support OpenGL ES 3.1.
coder replied

13 March 2022, 10:46 PM
Originally posted by Ladis

Second fun fact, even DX12 was available before Vulkan, so Microsoft is in the same position as Apple - why switch to Vulkan now?

I heard an informed opinion relevant to this, directly from an industry insider I sometimes talk with. According to him, the reason Microsoft jumped into bed with Nvidia, on DXR (DirectX Raytracing), is because they were starting to see signs the industry was shifting to Vulkan, even on Windows.

That suggests that, even though DX12 might've retained dominance, Microsoft's position is tenuous. Perhaps if Sony would embrace Vulkan, it could be enough to meaningfully shift the balance towards Vulkan. Of the gaming platforms out there, I think Google is so far really the main Vulkan backer.

I think it'll be interesting to see what Chinese companies do, in coming years. As they stand up alternatives to Windows, I'd expect they'll be pushing mainly on the Vulkan front, at least in the short/medium-term.
Ladis replied

13 March 2022, 10:40 PM
Originally posted by coder

As mdedetrich said, you're confusing "de facto" standard with various degrees of formal standards.

What makes Khronos an industry standards body is that it exists purely as a voluntary collaboration of industry players. This is in contrast to government-based standards bodies, like ANSI, ISO, etc.

Anyway, something can be a de facto standard, even without any formal specification released by anyone. Or, you can have an industry standard or an ISO standard that's simply irrelevant. The matter of a standard's utilization is largely independent from whether and how it was formalized. The only exception to this is that you can't really have a de facto standard that's not used by anyone, because a de facto standard owes its status simply to the fact that people are using it.

So many words, but do you know any program using Vulkan? I just know about a few games.

Originally posted by coder

Not the same, at all. Apple won't allow AMD to add OpenGL or Vulkan to their Mac OS drivers.

Microsoft doesn't try to prevent 3rd party hardware from offering whatever APIs it wants. It remains neutral on the issue of OpenGL and Vulkan, even as they stand in competition to its own DirectX API.

Learn the history, man. Vista didn't allow real OpenGL in the composited mode, until gamers, companies and GPU makers protested (fun fact: the same problem was in the first composite desktop's implementation in Linux):

Call to Action: Ensure that OpenGL remains a first class API under Windows Vista

https://community.khronos.org/t/call-to-action-ensure-that-opengl-remains-a-first-class-api-under-windows-vista/53355

Microsoft’s current plan for OpenGL on Windows Vista is to layer OpenGL over Direct3D in order to use OpenGL with a composited desktop to obtain the Aeroglass experience. If an OpenGL ICD is run - the desktop compositor will switch off - significantly degrading the user experience. In practice this means for OpenGL under Aeroglass:
OpenGL performance will be significantly reduced - perhaps as much as 50%
OpenGL on Windows will be fixed at a vanilla version of OpenGL 1.4
...
coder replied

13 March 2022, 10:38 PM
Originally posted by mdedetrich

And if coder is correct about Apple "supporting" MoltenVK then I wouldn't be surprised if Apple is actually working on it behind the scenes.

I didn't go that far. MoltenVK is a Khronos project, meaning all industry contributors to it should be Khronos members and likely also contributors to the Vulkan spec. In the Vulkan 1.3 link I posted above, I see Apple not credited even once, in relation to any version of the standard.

This isn't proof, but it's pretty good evidence that MoltenVK has Apple's tacit approval, at best. Probably, they like just being able to point at it, whenever anyone whines about their lack of native Vulkan support.

Since so many Vulkan features are optional, I think it'd be interesting to see a feature comparison of MoltenVK and native implementations, on other major platforms. The mere fact that MoltenVK exists doesn't mean that it will be an adequate solution for all Vulkan apps, particularly if it's missing some significant features.
coder replied

13 March 2022, 10:22 PM
Originally posted by Ladis

For something to become an industry standard, it has to exist for a long time and be used by many companies

As mdedetrich said, you're confusing "de facto" standard with various degrees of formal standards.

What makes Khronos an industry standards body is that it exists purely as a voluntary collaboration of industry players. This is in contrast to government-based standards bodies, like ANSI, ISO, etc.

Anyway, something can be a de facto standard, even without any formal specification released by anyone. Or, you can have an industry standard or an ISO standard that's simply irrelevant. The matter of a standard's utilization is largely independent from whether and how it was formalized. The only exception to this is that you can't really have a de facto standard that's not used by anyone, because a de facto standard owes its status simply to the fact that people are using it.

Originally posted by Ladis

Microsoft Windows is the same as macOS about 3rd party APIs. They also support libraries converting other APIs, e.g. OpenGL, to their API, e.g. Direct3D. Microsoft is also not responsible for OpenGL and Vulkan on Windows, that's on GPU makers.

Not the same, at all. Apple won't allow AMD to add OpenGL or Vulkan to their Mac OS drivers.

Microsoft doesn't try to prevent 3rd party hardware from offering whatever APIs it wants. It remains neutral on the issue of OpenGL and Vulkan, even as they stand in competition to its own DirectX API.

Last edited by coder; 13 March 2022, 10:30 PM.