Apple M1 ARM Performance With A 2020 Mac Mini


  • Originally posted by PerformanceExpert

    On modern CPUs the vast majority of instructions are a single micro-op. Only complex instructions are split into multiple micro-ops. It varies, but e.g. for the Cortex-A72: "On average, Filippo said, each ARMv8 instruction translates into 1.08 micro-ops."

    So micro-ops are just a different encoding of the original ISA.
    ARM is considered RISC, so it shows that even traditional RISC went for a hybrid approach.
    I would be interested in the ratio for x86, which was traditionally CISC. In the Zen 3 architecture discussion between Ian Cutress from AnandTech and Wendell from Level1Techs, it was mentioned that some x87 instructions were sped up a lot. So there is still multi-micro-op stuff even for something that old.

    Originally posted by PerformanceExpert
    You have RISC and CISC swapped here. Initial RISCs didn't have any complex instructions, and every instruction was directly executed in a single cycle. CISCs used a micro-code engine to execute every instruction, which took many cycles and was extremely slow. Those days are gone now. RISC ISAs became more complex, while CISCs stopped using the most complex instructions and sped up the commonly used operations by using more transistors.
    You are right. Gonna edit it so it is not confusing people. Thanks.


    • Originally posted by ldesnogu
      You can't run DOS programs on Win 10 64-bit; you have to rely on an emulator (which could be run on an ARM machine).

      And as far as Wine is concerned: https://www.macrumors.com/2020/11/18...oftware-on-m1/

      Anyway, as far as long-term HW support goes, nothing beats a self-assembled machine with carefully chosen components.

      But that's an OS issue, not a hardware one. The hardware still supports 16-bit execution just fine. Unlike ARM, which can't even run the last major revision of the ISA.

      Hence why Apple has been asking for the LLVM IR for quite some time, and developers would be smart to stash a copy of the IR somewhere if they are distributing outside the Apple App Store. However, that doesn't help existing legacy applications. Rosetta 2 looks good, but because it's not a full hardware emulation you're going to be finding corner cases for years to come, and the fact that everything GPU has to be translated to Metal, a bespoke custom API, doesn't help matters any. Comparing it to DOSBOX, which does cycle-accurate emulation of a whole system, is certainly on the hopeful side.


      • Originally posted by pixo

        ARM is considered RISC, so it shows that even traditional RISC went for a hybrid approach.
        I would be interested in the ratio for x86, which was traditionally CISC. In the Zen 3 architecture discussion between Ian Cutress from AnandTech and Wendell from Level1Techs, it was mentioned that some x87 instructions were sped up a lot. So there is still multi-micro-op stuff even for something that old.


        You are right. Gonna edit it so it is not confusing people. Thanks.
        ARM also added an instruction just to accelerate JavaScript. Maybe not so RISC anymore. The reason you do micro-ops is to reduce the complexity of the execution side in favor of some complexity at the front end. What exactly is cut and what isn't is highly design-specific. And then there's also instruction fusion: you take common idioms in instruction streams and mash them together into one fused-op (say, add 1 and compare). The idea is that you not only save a cycle, you also avoid the power overhead of moving the data around so much.

        There is indeed a lot going on in the micro-arch that can add to the strengths or mitigate the weaknesses of the top-level ISA.
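
To make the fusion idea concrete, here is a toy sketch in Python. It is not how any real decoder works: the `Instr` type, the `FUSIBLE_PAIRS` set, the register names, and the rule that the compare must read the register the add just wrote are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One decoded instruction: an opcode name plus its operand names."""
    op: str
    operands: tuple = ()


# Hypothetical set of adjacent opcode pairs this toy front end may fuse.
FUSIBLE_PAIRS = {("add", "cmp")}


def fuse(stream):
    """Greedily merge adjacent fusible pairs into a single fused op."""
    out = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        # Fuse only when the compare reads the register the add just wrote
        # (by assumption here, the first operand is the destination).
        if (
            nxt is not None
            and (cur.op, nxt.op) in FUSIBLE_PAIRS
            and cur.operands
            and nxt.operands
            and cur.operands[0] == nxt.operands[0]
        ):
            out.append(Instr(f"{cur.op}+{nxt.op}", cur.operands + nxt.operands))
            i += 2  # both instructions were consumed by the fused op
        else:
            out.append(cur)
            i += 1
    return out


program = [
    Instr("add", ("r1", "r1", "1")),
    Instr("cmp", ("r1", "r2")),
    Instr("load", ("r3", "r4")),
]
print([ins.op for ins in fuse(program)])
```

The add and the compare travel down the back end as one entry, which is where the saved cycle and the reduced data movement would come from in a real design.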


        • Originally posted by WorBlux

          ARM also added an instruction just to accelerate JavaScript. Maybe not so RISC anymore. The reason you do micro-ops is to reduce the complexity of the execution side in favor of some complexity at the front end. What exactly is cut and what isn't is highly design-specific. And then there's also instruction fusion: you take common idioms in instruction streams and mash them together into one fused-op (say, add 1 and compare). The idea is that you not only save a cycle, you also avoid the power overhead of moving the data around so much.

          There is indeed a lot going on in the micro-arch that can add to the strengths or mitigate the weaknesses of the top-level ISA.
          That is true, but for that you don't need micro-ops. Those are needed to split an instruction into a sequence of micro-ops and execute them over multiple cycles. These sequences are defined in microcode.
          Pure RISC CPUs did not have micro-ops because all instructions were implemented in HW (hardwired) and there was no need to split them into sequences of micro-ops. That's why each instruction took one cycle to execute. Fusing instructions can still be done, but it comes down to terminology whether you call them fused-ops or fused instructions. There is still no need to split the instruction into a sequence of micro-ops. And in OoO execution you reorder and execute several instructions, if possible, instead of micro-ops like in CISC.

          In the end both ARM and x86 are hybrids. Some ARM instructions are done via micro-ops to save space and some x86 instructions are hardwired for speed.
          How much is done via micro-ops, and by how many of them, and how much is hardwired is one aspect of micro-arch.
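
A rough sketch of what "split an instruction into a sequence of micro-ops" means, again in Python and purely illustrative. The `MICROCODE` table, the instruction names, and the micro-op names are invented; no real core uses this table. It also shows how a micro-ops-per-instruction ratio like the 1.08 figure quoted earlier in the thread would be computed for a given instruction mix.

```python
# Hypothetical microcode table: instruction name -> sequence of micro-ops.
# Single-entry lists stand in for "hardwired" one-micro-op instructions.
MICROCODE = {
    "add": ["alu_add"],
    "load": ["mem_read"],
    "load_pair": ["mem_read_lo", "mem_read_hi"],
    "push": ["sp_decrement", "mem_write"],
}


def crack(program):
    """Expand each instruction into its micro-op sequence, in program order."""
    uops = []
    for name in program:
        uops.extend(MICROCODE[name])
    return uops


def uops_per_instruction(program):
    """Average number of micro-ops issued per instruction in this program."""
    return len(crack(program)) / len(program)


program = ["add", "load", "add", "load_pair"]
print(crack(program))
print(uops_per_instruction(program))
```

With mostly single-micro-op instructions and the occasional two-micro-op one, the average lands only slightly above 1, which is the shape of the hybrid described above.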


          • Originally posted by pixo

            That is true, but for that you don't need micro-ops. Those are needed to split an instruction into a sequence of micro-ops and execute them over multiple cycles. These sequences are defined in microcode.
            Pure RISC CPUs did not have micro-ops because all instructions were implemented in HW (hardwired) and there was no need to split them into sequences of micro-ops. That's why each instruction took one cycle to execute. Fusing instructions can still be done, but it comes down to terminology whether you call them fused-ops or fused instructions. There is still no need to split the instruction into a sequence of micro-ops. And in OoO execution you reorder and execute several instructions, if possible, instead of micro-ops like in CISC.

            In the end both ARM and x86 are hybrids. Some ARM instructions are done via micro-ops to save space and some x86 instructions are hardwired for speed.
            How much is done via micro-ops, and by how many of them, and how much is hardwired is one aspect of micro-arch.
            Yes fused-ops is probably the better term.

            Yes, the first RISCs were doing it that way to simplify pipelining. Most instructions were one cycle, but not all; multiply and load come to mind, but those are conceptually simple tasks and you'd raise a stall until it was safe to proceed with issue again.

            Yet OoO and superscalar designs have changed the constraints and bottlenecks since then, and the u-arch has somewhat converged. For kicks and giggles, compare the A78 to Skylake, and the A76 to Tremont.


            • Originally posted by WorBlux
              But that's an OS issue, not a hardware one. The hardware still supports 16-bit execution just fine. Unlike ARM, which can't even run the last major revision of the ISA.
              I was replying to the claim that you can still run DOS programs on a Windows x86 machine. You're facing the same issue as on an M1 machine: you have to run DOSBOX on Win 10 64-bit.

              Hence why Apple has been asking for the LLVM IR for quite some time, and developers would be smart to stash a copy of the IR somewhere if they are distributing outside the Apple App Store. However, that doesn't help existing legacy applications. Rosetta 2 looks good, but because it's not a full hardware emulation you're going to be finding corner cases for years to come, and the fact that everything GPU has to be translated to Metal, a bespoke custom API, doesn't help matters any. Comparing it to DOSBOX, which does cycle-accurate emulation of a whole system, is certainly on the hopeful side.
              Why would a user program need accurate hardware simulation? I mean beyond old programs, no one directly accesses HW anymore I hope. And for cross graphics API translation I already posted a link which shows Wine working on an M1.

              Anyway for Apple, emulation is just a stop gap until most applications are ported. And it's doing a very good job at that it seems.


              • Originally posted by ldesnogu
                I was replying to the claim that you can still run DOS programs on a Windows x86 machine. You're facing the same issue as on an M1 machine: you have to run DOSBOX on Win 10 64-bit.


                Why would a user program need accurate hardware simulation? I mean beyond old programs, no one directly accesses HW anymore I hope. And for cross graphics API translation I already posted a link which shows Wine working on an M1.

                Anyway for Apple, emulation is just a stop gap until most applications are ported. And it's doing a very good job at that it seems.
                Win7_32 will run on modern hardware, and supports DOS and a huge swath of the x86 Windows back catalogue. Yes, DOS is largely a solved problem, but there is a large catalogue of applications between the DOS era and the modern app store.

                And porting only helps if you or someone who still cares about the application has the source code. There are still functional programs out there that have unique features, but for which the source is lost to time. And yes, some of these touch hardware more directly for whatever reason. Maybe they control an external device, or are quite sensitive to timing, or are self-modifying in some way. Experience has shown there are always corner cases, and the more of a corner case it is, the harder it is to fix.

                And I did look more into the CrossOver-on-Rosetta claim. Yes, some applications work well, but support is spotty at best. You are always going to be translating through two layers of API: DirectX or GL -> Vulkan -> Metal. And nobody in the FOSS world is really that interested in doing DX -> Metal. Parallels looks like it has basic support for a paravirtualized DX11 driver on top of Metal, but there are still reports of compatibility problems there, and how you'd best leverage Rosetta in that situation is an unanswered question.
