Apple M1 ARM Performance With A 2020 Mac Mini


  • Closed-Systems-Rock
    replied
    Most tests are Rosetta 2 translations, so where are the M1-native benchmarks in this piece? Surely the ‘power of open source’ would allow most tests to be recompiled (if not fully optimised, especially the ray/path tracing) almost immediately.
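
    For what it's worth, a quick way to confirm whether a given test binary is actually running natively on the M1 or through Rosetta 2 translation is Apple's documented sysctl.proc_translated key; a minimal C sketch:

    /*
     * Minimal sketch: report whether the current process is running natively
     * or under Rosetta 2 translation, via the documented sysctl.proc_translated
     * key. 1 = translated, 0 = native, -1 = unknown.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/sysctl.h>

    static int process_is_translated(void) {
        int ret = 0;
        size_t size = sizeof(ret);
        if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1)
            return (errno == ENOENT) ? 0 : -1;  /* key absent: native (older macOS) */
        return ret;
    }

    int main(void) {
        printf("running under Rosetta 2: %d\n", process_is_translated());
        return 0;
    }

    A benchmark harness could run this check before recording results, so native and translated runs don't get mixed up.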



  • dave_boo
    replied
    Originally posted by Jumbotron View Post
    Not only that, but once Windows for ARM was running on the ARM M1 Mac, he was able to run x86 Windows programs as well, and he says the performance is pretty zippy. Here's a snippet from the article...


    Developer Alexander Graf, however, took to Twitter today to share his achievement: successfully being able to virtualize ARM Windows on Apple Silicon.
    Who said Windows wouldn't run well on #AppleSilicon? It's pretty snappy here 😁. #QEMU patches for reference: https://t.co/qLQpZgBIqI pic.twitter.com/G1Usx4TcvL

    — Alexander Graf (@_AlexGraf) November 26, 2020

    Note that he was able to virtualize the ARM version of Windows, not the x86 version. Virtualizing an x86 version of Windows might have been much more difficult than the ARM version, since Apple’s M1 chip has a 64-bit ARM architecture.

    That said, Graf also mentions in one of his tweets that “Windows ARM64 can run x86 applications really well. It’s not as fast as Rosetta 2, but close.”

    He achieved this by running the Windows ARM64 Insider Preview and virtualizing it through the Hypervisor.framework. This framework allows users to interact with virtualization technologies in user space without having to write kernel extensions (KEXTs), according to Apple.

    Moreover, this wouldn’t have been possible without applying a custom patch to the QEMU virtualizer. QEMU is an open-source machine emulator and virtualizer, known for “achieving near-native performance” by executing the guest code directly on the host CPU. So it goes without saying that only ARM guests can be virtualized this way on an ARM machine like the M1-based Macs.


    Below are links to the full article on The 8-Bit, along with his patches to QEMU and his Twitter thread with further details about his accomplishment.


    https://the8-bit.com/developer-succe...on-m1-macbook/

    https://lists.gnu.org/archive/html/q.../msg06499.html

    https://twitter.com/_AlexGraf/status...81983879569415
    Why is it not surprising (although worthy of a chuckle) that Apple can emulate Windows faster than Microsoft can emulate Windows?



  • piotrj3
    replied
    Originally posted by rabcor View Post

    Even if this is the case, the fact of the matter is that they are getting this level of performance from a 15 W processor. It is so efficient that laptops using it could have battery life similar to phones, its TDP is so low it can be passively cooled in a laptop, and it sometimes outperforms native x86 processors on emulation software. The performance per watt is something we've never seen before, and that alone means all signs indicate that processor technology is about to undergo a revolution, most likely with ARM replacing x86. The pros of moving to ARM clearly outweigh the cons, and no matter how they achieved it, Apple just proved this.

    It just kinda sucks that it was Apple that did it, I cannot think of a worse company for this to come from...
    It is not "15W", more like the 20-24 W range. What's more, it draws that power with just 4 performance cores that cannot even do HT/SMT. The thing is, if you compare it to a 4500U/4700U, those CPUs also look great, and they are rather "outdated" at this point. The only revolutionary thing in Apple's case is how they optimized the memory layout and how effective the memory controller is - that is where Apple did something AMD/Intel can't - but the cores themselves, while good, aren't that great.



  • Weasel
    replied
    Originally posted by rabcor View Post
    and it sometimes outperforms native x86 processors on emulation software
    Oh look, another moron. Why don't you compare it with processors from 2004 too; that would prove your point even better.

    Imagine comparing a 5 nm CPU with a 10 nm CPU and still being slower. That's two fab generations ahead: roughly 4 times the effective transistor budget.



  • rabcor
    replied
    Originally posted by piotrj3 View Post
    A lot of people are surely surprised by the results, but one thing people are not aware of is...
    the 128-bit memory bus.

    The M1 literally runs 8x 16-bit channels of LPDDR4X-4266-class memory. A lot of the impressive results you see do not come from the superiority of ARM silicon, but from the fact that it has more memory channels than normal PCs (outside of HEDT), with very fast LPDDR4X RAM by default, placed close to the chip itself. On top of that, a single core can utilize the whole memory bus, not just some group of cores. A lot of this chip's speed comes from the fact that one core can access as much data as a Threadripper 1950X, a 16-core/32-thread CPU (at the level of ~60 GB/s).

    This produces impressive results in some benchmarks like ZSTD compression or SQL workloads, but it doesn't show the superiority of the ARM architecture at all; rather, it shows that normal desktop CPUs should start moving to quad-channel memory instead of dual-channel.
    Even if this is the case, the fact of the matter is that they are getting this level of performance from a 15 W processor. It is so efficient that laptops using it could have battery life similar to phones, its TDP is so low it can be passively cooled in a laptop, and it sometimes outperforms native x86 processors on emulation software. The performance per watt is something we've never seen before, and that alone means all signs indicate that processor technology is about to undergo a revolution, most likely with ARM replacing x86. The pros of moving to ARM clearly outweigh the cons, and no matter how they achieved it, Apple just proved this.

    It just kinda sucks that it was Apple that did it, I cannot think of a worse company for this to come from...



  • Jumbotron
    replied
    Not only that, but once Windows for ARM was running on the ARM M1 Mac, he was able to run x86 Windows programs as well, and he says the performance is pretty zippy. Here's a snippet from the article...


    Developer Alexander Graf, however, took to Twitter today to share his achievement: successfully being able to virtualize ARM Windows on Apple Silicon.
    Who said Windows wouldn't run well on #AppleSilicon? It's pretty snappy here 😁. #QEMU patches for reference: https://t.co/qLQpZgBIqI pic.twitter.com/G1Usx4TcvL

    — Alexander Graf (@_AlexGraf) November 26, 2020

    Note that he was able to virtualize the ARM version of Windows, not the x86 version. Virtualizing an x86 version of Windows might have been much more difficult than the ARM version, since Apple’s M1 chip has a 64-bit ARM architecture.

    That said, Graf also mentions in one of his tweets that “Windows ARM64 can run x86 applications really well. It’s not as fast as Rosetta 2, but close.”

    He achieved this by running the Windows ARM64 Insider Preview and virtualizing it through the Hypervisor.framework. This framework allows users to interact with virtualization technologies in user space without having to write kernel extensions (KEXTs), according to Apple.

    Moreover, this wouldn’t have been possible without applying a custom patch to the QEMU virtualizer. QEMU is an open-source machine emulator and virtualizer, known for “achieving near-native performance” by executing the guest code directly on the host CPU. So it goes without saying that only ARM guests can be virtualized this way on an ARM machine like the M1-based Macs.


    Below are links to the full article on The 8-Bit, along with his patches to QEMU and his Twitter thread with further details about his accomplishment.


    https://the8-bit.com/developer-succe...on-m1-macbook/

    https://lists.gnu.org/archive/html/q.../msg06499.html

    https://twitter.com/_AlexGraf/status...81983879569415
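
    As an illustrative aside, the Hypervisor.framework entry points Graf builds on are plain C. Below is a minimal sketch (not his patch, just a toy; it assumes the binary has been granted the com.apple.security.hypervisor entitlement and the file name is made up) that merely probes whether the framework will hand you a VM on a given Mac:

    /*
     * Minimal sketch: create and destroy an empty Hypervisor.framework VM to
     * check that user-space virtualization is available on this Mac.
     * Hypothetical build line: clang hvf_probe.c -framework Hypervisor -o hvf_probe
     * (the binary also needs the com.apple.security.hypervisor entitlement).
     */
    #include <stdio.h>
    #include <Hypervisor/Hypervisor.h>

    int main(void) {
        /* NULL asks for the default VM configuration on Apple Silicon. */
        hv_return_t ret = hv_vm_create(NULL);
        if (ret != HV_SUCCESS) {
            fprintf(stderr, "hv_vm_create failed: 0x%x\n", (unsigned)ret);
            return 1;
        }
        puts("Hypervisor.framework VM created - ARM guests can be hosted here.");
        hv_vm_destroy();
        return 0;
    }

    A real VMM such as the patched QEMU then maps guest RAM, creates vCPUs, and services their exits on top of this.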



  • piotrj3
    replied
    Originally posted by bridgman View Post

    At the risk of asking a dumb question, doesn't a typical PC CPU also have 128-bit memory (2 channels x 64 bits/channel)?

    Agree that having more / smaller channels opens the door for more efficient memory usage (we do the same with GPUs) but I think they are both 128-bit, at least if you ignore OEMs who configure a dual-channel CPU with single-channel RAM and no expansion capability.
    It is a good question. Yes, it is true that normal CPUs theoretically have 2x64-bit channels, but performance with fewer, wider channels is clearly worse than with more, smaller channels; it also means that every time you read a value smaller than 64 bits you theoretically waste part of the memory bus (which, considering the most typical data type is an integer, is quite wasteful).

    I mean, the theoretical max performance of a 128-bit memory bus with 4266 MT/s memory would be around 68 GB/s. In practice, in benchmarks not even a Ryzen 5950X gets beyond ~36 GB/s (the Intel 10900K is even worse here). Meanwhile, the AnandTech review of the M1 had this quote:

    One aspect we’ve never really had the opportunity to test is exactly how good Apple’s cores are in terms of memory bandwidth. Inside of the M1, the results are ground-breaking: A single Firestorm achieves memory reads up to around 58GB/s, with memory writes coming in at 33-36GB/s. Most importantly, memory copies land in at 60 to 62GB/s depending if you’re using scalar or vector instructions. The fact that a single Firestorm core can almost saturate the memory controllers is astounding and something we’ve never seen in a design before.
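
    For context, a rough way to reproduce that kind of single-core copy number is a STREAM-style memcpy loop; a minimal C sketch (buffer size and iteration count are arbitrary choices, and results vary a lot with compiler flags and clock behaviour):

    /*
     * Rough single-core memory-copy bandwidth probe (STREAM-style "copy").
     * Buffer size and iteration count are arbitrary; compile with -O2.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void) {
        const size_t bytes = 512UL << 20;        /* 512 MiB, well past any cache */
        const int iters = 8;
        char *src = malloc(bytes), *dst = malloc(bytes);
        if (!src || !dst) return 1;
        memset(src, 1, bytes);                   /* touch pages before timing */
        memset(dst, 0, bytes);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++)
            memcpy(dst, src, bytes);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        /* Count bytes read plus bytes written, as bandwidth figures usually do. */
        double gbps = 2.0 * (double)bytes * iters / secs / 1e9;
        printf("single-core copy bandwidth: %.1f GB/s (check byte %d)\n",
               gbps, dst[123]);
        free(src);
        free(dst);
        return 0;
    }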
    Last edited by piotrj3; 27 November 2020, 07:33 AM.



  • bridgman
    replied
    Originally posted by piotrj3 View Post
    A lot of people are surely surprised by the results, but one thing people are not aware of is...
    the 128-bit memory bus.

    The M1 literally runs 8x 16-bit channels of LPDDR4X-4266-class memory. A lot of the impressive results you see do not come from the superiority of ARM silicon, but from the fact that it has more memory channels than normal PCs (outside of HEDT), with very fast LPDDR4X RAM by default, placed close to the chip itself.
    At the risk of asking a dumb question, doesn't a typical PC CPU also have 128-bit memory (2 channels x 64 bits/channel)?

    Agree that having more / smaller channels opens the door for more efficient memory usage (we do the same with GPUs) but I think they are both 128-bit, at least if you ignore OEMs who configure a dual-channel CPU with single-channel RAM and no expansion capability.



  • piotrj3
    replied
    A lot of people are surely surprised by the results, but one thing people are not aware of is...
    the 128-bit memory bus.

    The M1 literally runs 8x 16-bit channels of LPDDR4X-4266-class memory. A lot of the impressive results you see do not come from the superiority of ARM silicon, but from the fact that it has more memory channels than normal PCs (outside of HEDT), with very fast LPDDR4X RAM by default, placed close to the chip itself. On top of that, a single core can utilize the whole memory bus, not just some group of cores. A lot of this chip's speed comes from the fact that one core can access as much data as a Threadripper 1950X, a 16-core/32-thread CPU (at the level of ~60 GB/s). The back-of-the-envelope sketch below shows where that ceiling comes from.

    This produces impressive results in some benchmarks like ZSTD compression or SQL workloads, but it doesn't show the superiority of the ARM architecture at all; rather, it shows that normal desktop CPUs should start moving to quad-channel memory instead of dual-channel.
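
    To put a number on that ceiling, the theoretical peak is simply the total interface width times the data rate, regardless of whether the 128 bits are sliced into 2x64-bit or 8x16-bit channels; a quick back-of-the-envelope in C:

    /*
     * Back-of-the-envelope peak bandwidth for a 128-bit LPDDR4X-4266 interface;
     * the channel split (2x64 vs 8x16) does not change this ceiling.
     */
    #include <stdio.h>

    int main(void) {
        const double bus_bits = 128.0;             /* total interface width */
        const double transfers_per_sec = 4266e6;   /* LPDDR4X-4266 data rate */
        double peak_gb_s = (bus_bits / 8.0) * transfers_per_sec / 1e9;
        printf("theoretical peak: %.1f GB/s\n", peak_gb_s);  /* ~68 GB/s */
        return 0;
    }

    The narrower channels don't raise that ceiling; they just make it easier for mixed, small accesses to get close to it.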



  • WorBlux
    replied
    Originally posted by ldesnogu View Post
    I was answering the claim that you can still run DOS programs on a Windows x86 machine. You're facing the same issue as on an M1 machine: you have to run DOSBox on Win 10 64-bit.


    Why would a user program need accurate hardware simulation? I mean, beyond old programs, no one directly accesses hardware anymore, I hope. And for cross graphics API translation I already posted a link which shows Wine working on an M1.

    Anyway, for Apple, emulation is just a stopgap until most applications are ported, and it seems to be doing a very good job of that.
    32-bit Win7 will run on modern hardware and supports DOS plus a huge swath of the x86 Windows back catalogue. Yes, DOS is largely a solved problem, but there is a large catalogue of applications between the DOS era and the modern app store.

    And porting only helps if you, or someone who still cares about the application, has the source code. There are still functional programs out there with unique features whose source is lost to time. And yes, some of these touch hardware more directly for whatever reason: maybe they control an external device, are quite sensitive to timing, or are self-modifying in some way. Experience has shown there are always corner cases, and the more corner the case, the harder it is to fix.

    I also looked more into the CrossOver-on-Rosetta claim. Yes, some applications work well, but support is spotty at best. You are always going to be translating through two layers of API (DirectX or GL -> Vulkan -> Metal), and nobody in the FOSS world is really that interested in doing DX -> Metal. Parallels looks like it has basic support for a paravirtualized DX11 driver on top of Metal, but there are still reports of compatibility problems there, and how you'd best leverage Rosetta in that situation is an unanswered question.

