Apple M4 Mac Mini With macOS vs. Intel / AMD With Ubuntu Linux Performance

  • Dukenukemx
    Senior Member
    • Nov 2010
    • 1396

    #51
    Originally posted by gnattu View Post

    Because it is not losing to everything, and no, the HX 370 does not do equally well against the M4. In the benchmarks where the HX 370 has close or higher efficiency, the M4 is running through the Rosetta x86 emulation layer. Running on par with a latest-generation competing processor in the same class through an emulation layer is a really, really good result. You can hate Apple, but you cannot deny their CPU is good.
    Which tests are running on Rosetta? Aren't these tests mostly open source software that can be natively compiled on the M4, with some exceptions like IndigoBench, which claims to have a native port to macOS? With the exception of FLAC, the M4 tested dead last or damn near it in every test. While the M4 did often win in power efficiency, this is against desktop-class CPUs. The exception was the Ryzen AI 9 HX 370, which was often nearly as efficient and in some rare cases outperformed it per watt. Kvazaar is open source, and I would assume it was running natively on macOS?

    Originally posted by mos87 View Post
    Pretty pointless comparisons outside of theoretical interest IMO. From the consumer POV especially. When you need a workstation you're looking to do a particular job on it and couldn't care less about the other stuff. OTOH if one's in the market for a general purpose desktop, there are other things that come into play - mostly preference (mac people), price, availability, software support etc.
    And hardly anybody is interested in power efficiency of a desktop. Which is anyway negligible (even if the raw % looks sumptuous), not like Apple doesn't draw power at all.
    It's a lot better than the countless amount of Geekbench and Cinebench scores floating around the internet. Like anybody cares what the scores from those benchmarks even mean.
    Heck, you can't even compare the prices bc it's a CPU (x86) vs the whole system (Apple). Also extensibility is different, but I already mentioned that.
    Some other measurement technique has to be devised to make such comparisons really interesting.
    Unless you want to add more RAM and storage; then be ready to pay twice as much as a single Mac Mini M4, and even then you get only 32GB of RAM and 512GB of storage. Who even has 256GB of storage in a PC in 2024?

    There's an old adage that ARM are power efficient when you don't ask it to do anything.
    Also there's an article floating about that claims that architectural differences between platforms are irrelevant today; instead, what matters is what the chip is optimized to do. Software has to play a major role here too, I guess.
    ​Good point. Can we get tests done with CachyOS instead of Ubuntu?

    Comment

    • neured
      Junior Member
      • Apr 2023
      • 1

      #52
      Originally posted by Dukenukemx View Post
      With the exception of FLAC, the M4 tested dead last or damn near it with every test.
      That's not quite what the results show.

      FFmpeg: Middle
      LLVM: Middle
      7-Zip x2: Last
      Zstd x2: 2nd to last
      C-Ray x2: Middle
      Appleseed x3: Last (this is an x86_64 Rosetta emulation)
      Chaos: Last (this is an x86_64 Rosetta emulation)
      IndigoBench x2: Last
      QuantLib: Last
      Apache HTTP: Last
      DuckDB: 2nd to last
      PyBench: 2nd to last
      x265 x2: Last
      Kvazaar x4: Last
      FLAC: First
      libavif x2: Last & Middle
      JPEG-XL x3: Last (Tied in two)

      A total of 30 tests: Last in 18 of them (but 4 of those were x86_64 emulation). But also almost always first in performance per watt.
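The tally above can be reproduced from the list with a short sketch. One assumption is needed to arrive at 18: the two tied JPEG-XL results are not counted as outright last places (the post does not say how ties were counted; counting them would give 20). The `tied-last` label and the variable names are made up for this illustration.

```python
# M4 placement in each benchmark run, transcribed from the list above.
# Tuple layout: (benchmark, placement of each run, ran under Rosetta?)
# "tied-last" is a label invented for this sketch; the post only says
# "Tied in two" for JPEG-XL.
placements = [
    ("FFmpeg", ["middle"], False),
    ("LLVM", ["middle"], False),
    ("7-Zip", ["last"] * 2, False),
    ("Zstd", ["second-to-last"] * 2, False),
    ("C-Ray", ["middle"] * 2, False),
    ("Appleseed", ["last"] * 3, True),
    ("Chaos", ["last"], True),
    ("IndigoBench", ["last"] * 2, False),
    ("QuantLib", ["last"], False),
    ("Apache HTTP", ["last"], False),
    ("DuckDB", ["second-to-last"], False),
    ("PyBench", ["second-to-last"], False),
    ("x265", ["last"] * 2, False),
    ("Kvazaar", ["last"] * 4, False),
    ("FLAC", ["first"], False),
    ("libavif", ["last", "middle"], False),
    ("JPEG-XL", ["last", "tied-last", "tied-last"], False),
]

# Total number of individual test runs across all benchmarks.
total_runs = sum(len(runs) for _, runs, _ in placements)

# Runs where the M4 was last on its own (ties excluded by assumption).
outright_last = sum(runs.count("last") for _, runs, _ in placements)

# Outright-last runs that went through Rosetta x86_64 emulation.
last_under_rosetta = sum(
    runs.count("last") for _, runs, rosetta in placements if rosetta
)

# Outright-last runs that were not emulated.
native_outright_last = outright_last - last_under_rosetta

print(f"total runs:            {total_runs}")
print(f"outright last:         {outright_last}")
print(f"  of which Rosetta:    {last_under_rosetta}")
print(f"  of which not Rosetta: {native_outright_last}")
```

Under that reading, the last-place count drops to 14 of 30 once the four emulated runs are set aside.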

      Overall, it was quite similar to the 9800X3D, which has 8 "performance" cores versus the M4 with 4 performance cores (yes, 10 cores overall, but 6 are efficiency). So similar performance from an entire computer that costs $600 (or $500 with education discount) relative to a CPU that costs about $480 by itself. And while using a lot less electricity. It almost always was top in performance per watt (AI 9 HX 370 was the only CPU close). Comparisons are also not quite fair because of differences in CPU core counts (including performance/efficiency ones). These also are not entirely fair comparisons (Linux vs macOS) in either direction (meaning when the M4 'wins' or when it 'loses'). It's also not quite fair because in real world situations someone might use software that's not necessarily cross-platform but will instead use something optimized for the OS. How many of these tools have the same or more optimization for macOS as Linux?

      Comment

      • gnattu
        Phoronix Member
        • Jul 2023
        • 107

        #53
        Originally posted by Dukenukemx View Post
        Which tests are running on Rosetta? Aren't these tests mostly open source software that can be natively compiled on the M4, with some exceptions like IndigoBench, which claims to have a native port to macOS? With the exception of FLAC, the M4 tested dead last or damn near it in every test. While the M4 did often win in power efficiency, this is against desktop-class CPUs. The exception was the Ryzen AI 9 HX 370, which was often nearly as efficient and in some rare cases outperformed it per watt. Kvazaar is open source, and I would assume it was running natively on macOS?
        Let's go over those one by one. "With the exception of FLAC" is really not the case. Kvazaar has no ARM NEON code path at all, but it does have x86 AVX code paths, so you are comparing SIMD-optimized code against compiler-generated C code here.

        As for your claim that the "Ryzen AI 9 HX 370 which was often nearly as efficient and in some rare cases outperforming it per watt", let's look at the energy used per run:

        - Timed FFmpeg compilation: M4 used 473 Joules and HX 370 used 1657 Joules; the HX 370 used 3.5x as much energy while being 40% slower
        - Timed LLVM compilation: M4 used 5707 Joules and HX 370 used 16043 Joules; the HX 370 used 2.8x as much energy while being 38% slower
        - 7-Zip compression: M4 used 208 Joules and HX 370 used 1053 Joules; the HX 370 used 5x as much energy while being 12% faster
        - Zstd compression: M4 used 453 Joules and HX 370 used 1053 Joules; the HX 370 used 2.3x as much energy while being 10% slower
        - C-Ray: M4 used 2513 Joules and HX 370 used 7880 Joules; the HX 370 used 3.13x as much energy while being 48% slower
        - Appleseed: Michael did not post total energy used for this one, but the M4 is running on x86 emulation and is "only" 26% slower while using 80% of the average power in watts
        - Chaos Group V-Ray: M4 used 808 Joules and HX 370 used 1739 Joules; the HX 370 used 2.1x as much energy while being 22% faster
        - IndigoBench: M4 used 797 Joules and HX 370 used 1405 Joules; the HX 370 used 1.7x as much energy while being 30% faster
        - QuantLib: not run on the HX 370, so no comparison here
        - Apache: M4 used 573 Joules and HX 370 used 1687 Joules; the HX 370 used 2.9x as much energy while being 56% faster
        - DuckDB: M4 used 1359 Joules and HX 370 used 2821 Joules; the HX 370 used 2x as much energy while being 22% slower
        - PyBench: M4 used 68 Joules and HX 370 used 152 Joules; the HX 370 used 2.2x as much energy while being 30% slower
        - x265: M4 used 148 Joules and HX 370 used 269 Joules; the HX 370 used 1.8x as much energy while being 15% faster
        - Kvazaar: M4 used 479 Joules and HX 370 used 443 Joules; the HX 370 used 0.9x as much energy while being 2.2x faster
        - FLAC: M4 used 64 Joules and HX 370 used 137 Joules; the HX 370 used 2.1x as much energy while being 48% slower
        - libavif: M4 used 2086 Joules and HX 370 used 3063 Joules; the HX 370 used 1.4x as much energy while being 31% faster
        - libjxl: M4 used 590 Joules and HX 370 used 1082 Joules; the HX 370 used 1.8x as much energy while being 20% faster

        Just by looking at those energy numbers, can you say "FLAC is the exception"? More likely, the benchmarks that are not ARM-native or that lack ARM SIMD paths are the exceptions here. In almost every benchmark the total energy used by the M4 is lower, the HX 370 is not always faster, and even where it is faster, the speedup is not in proportion to the extra energy it uses.
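For anyone who wants to check the ratios, here is a minimal sketch that recomputes them from the Joule figures quoted in the list above. The dictionary keys and variable names are made up for the illustration, and the numbers are only as good as the transcription in this post.

```python
# Energy per run in Joules as quoted in the list above: (M4, HX 370).
# Appleseed and QuantLib are left out because no pair of totals was given.
energy_joules = {
    "ffmpeg compile": (473, 1657),
    "llvm compile": (5707, 16043),
    "7zip": (208, 1053),
    "zstd": (453, 1053),
    "c-ray": (2513, 7880),
    "v-ray": (808, 1739),
    "indigobench": (797, 1405),
    "apache": (573, 1687),
    "duckdb": (1359, 2821),
    "pybench": (68, 152),
    "x265": (148, 269),
    "kvazaar": (479, 443),
    "flac": (64, 137),
    "libavif": (2086, 3063),
    "libjxl": (590, 1082),
}

# Ratio > 1 means the HX 370 used more energy than the M4 for that run.
ratios = {name: hx / m4 for name, (m4, hx) in energy_joules.items()}

# Benchmarks where the M4 finished the run on less total energy.
m4_lower = [name for name, ratio in ratios.items() if ratio > 1]

# Print from the largest gap to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} HX 370 used {ratio:.2f}x the energy of the M4")

print(f"M4 used less energy in {len(m4_lower)} of {len(ratios)} benchmarks")
```

Kvazaar is the only entry where the ratio falls below 1, which matches the point that it is the benchmark with no ARM SIMD path.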

        The HX 370 gets closer when the task is SIMD heavy, so that it can take advantage of more mature x86 SIMD code paths, but for heavy scalar code like the compilation tests it lost hard in both speed and efficiency.

        It's a lot better than the countless amount of Geekbench and Cinebench scores floating around the internet. Like anybody cares what the scores from those benchmarks even mean.
        It outperformed the HX 370 even in SPECint 2017, and you need at least a 5.7GHz Zen 5 to be on par with it. The Zen 5 + Zen 5c cores in the HX 370 have no chance against it in single-core SPECint 2017 either, because they cannot be clocked that high.
        Last edited by gnattu; 14 November 2024, 10:50 AM.

        Comment

        • ehansin
          Senior Member
          • Oct 2016
          • 699

          #54
          Ignoring the processor discussion here (I have learned a lot from reading the comments), I like the form factor. I am using a 2018 Mac Mini (Intel Core i7 + 64GB + 2TB) that I have access to and set up to dual-boot Arch (which is what I mostly boot into). For average consumers that just need a computer to do average consumer computing stuff, this thing is really cool.

          These days most people don't need a CD/DVD drive anymore, etc. Having a smaller box that has a good enough selection of ports and well designed for thermals, quietness, etc. is great. Using this thing got me thinking and looking forward to more of this, especially one that allows greater access to internals (thinking of the recent RISC-V Framework laptop article). One may not like Apple for good reason, but this is a nice form factor for a lot of people.

          Comment

          • coder
            Senior Member
            • Nov 2014
            • 8959

            #55
            Originally posted by gnattu View Post
            There is no silver bullet though; this shared library approach also makes it hard to deploy software with complex dependencies, because every distro and every version of those distros disagrees on everything, and it is almost impossible for software distributors to compile that many versions,
            So, if you're talking about proprietary software, I've seen examples where they just support two or three of the main enterprise distros: RedHat, Ubuntu, and SuSE. I think they figure most business users are deploying one of those. In my experience, stuff made for SuSE Enterprise works fine on Leap.

            Comment

            • coder
              Senior Member
              • Nov 2014
              • 8959

              #56
              Originally posted by mos87 View Post
              hardly anybody is interested in power efficiency of a desktop.
              A lot of people are, myself included.

              Originally posted by mos87 View Post
              Heck, you can't even compare the prices bc it's a CPU (x86) vs the whole system (Apple).
              Of course you can, but you just need to spec out a barebones PC around the CPUs that are being compared against the Mac Mini.

              Originally posted by mos87 View Post
              There's an old adage that ARM are power efficient when you don't ask it to do anything.
              That's not supported by a lot of prior data I've seen. I think we don't have good explanations of the performance gap seen here. In some of these cases, SIMD optimizations for x86 will be tipping the scales in that direction.

              Originally posted by mos87 View Post
              Also there's an article floating about that claims that architectural differences between platforms are irrelevant today,
              If you mean the Chips & Cheese article, that was poorly researched. The author wasn't even aware of APX.

              Comment

              • anarki2
                Senior Member
                • Mar 2010
                • 860

                #57
                Now all we need is a RISC-V Apple chip and our lives are complete!

                Comment

                • coder
                  Senior Member
                  • Nov 2014
                  • 8959

                  #58
                  Originally posted by mdedetrich View Post
                  This is half true. While you are correct that most Windows applications bundle all of their dependencies, Windows' library loading mechanism for .dll files means you can override a dependency for a specific application just by placing the newer DLL in the same folder as the executable, and by default the C shared library loader will load that DLL instead.
                  Do you realize what a shit solution that is? First, it requires the user to know which packages use a vulnerable library. Then, it requires them to know the compatibility matrix. Finally, it requires them to specifically "upgrade" each package that uses each vulnerable library.

                  It's a hack, at best. Not a real solution, at all.

                  Comment

                  • coder
                    Senior Member
                    • Nov 2014
                    • 8959

                    #59
                    Originally posted by spykes View Post
                    The better efficiency is literally what they get from having access to the best TSMC's fab node.
                    As I said before, compare Lunar Lake vs. Apple M3. These are on the same exact fab node, and Intel eats shit. If you compare the die area of their CPU cores, Intel's are bigger. If you compare the total amount of cache, Intel is using more.

                    So, no. It's not as simple as "Apple using newer fab nodes" or "Apple using more cache".

                    Originally posted by spykes View Post
                    ​Those charts show very well that on a comparable fab node X86 is far from dead.
                    How do you think Michael decides which benchmarks to include in each article?

                    Comment

                    • spykes
                      Senior Member
                      • Dec 2008
                      • 241

                      #60
                      Originally posted by coder View Post
                      So, no. It's not as simple as "Apple using newer fab nodes" or "Apple using more cache".
                      It's definitely not as simple as "Apple is using the ARM ISA" either, like a lot of people want us to believe.
                      Intel still has some work to do indeed, but when I look at the Ryzen AI power efficiency results, they are not far behind the M4 despite using a less advanced node (and they are still beating Apple in numerous CPU benchmarks).

                      Comment
