Amazon Graviton3 vs. Intel Xeon vs. AMD EPYC Performance


  • #61
    Originally posted by mdedetrich View Post
    5% is actually quite a bit. Let me put it a different way: that figure is large enough that it was the primary reason for excluding SMT on Silvermont - that 5% of die size is significant. See https://www.anandtech.com/show/6936/...about-mobile/3
    I don't think the Silvermont decision re: SMT is applicable to this discussion - previous Atom cores were in-order with SMT, but the purpose of SMT there was to help deal with memory latency. Silvermont is only 2-wide, i.e. it doesn't have enough execution resources to get a real performance gain from having a second thread use resources that the first thread could not keep busy - you really need a wide core to get a benefit from SMT.

    Silvermont was about going from an in-order core to an out-of-order core and using out-of-order execution to help mask memory latency rather than SMT. Keeping SMT on a 2-wide execution back end would probably have delivered results similar to P4 (which has comparable width), where enabling HT was a hit-and-miss thing in terms of performance.

    My recollection is that a typical OOO core usually does a better job of dealing with memory latency than SMT does, but I don't think that is always the case. I don't remember any good studies of latency tolerance for SMT/in-order vs no-SMT/OOO off the top of my head but I'm sure they exist.
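    For anyone who wants to poke at this directly, here's a minimal sketch (mine, not from any of the linked articles) of the kind of latency-bound loop where a reorder buffer runs out of room and a second SMT thread can still find work to do. The CPU/sibling numbering in the comment is an assumption about the machine's topology.

```cpp
// Hypothetical pointer-chase micro-benchmark: every load depends on the
// previous one, so throughput is bound by memory latency, not execution width.
// Run one copy pinned to a single logical CPU, then two copies pinned to the
// two SMT siblings of the same physical core (e.g. on Linux:
//   taskset -c 0 ./chase      vs.   taskset -c 0 ./chase & taskset -c 4 ./chase
// assuming CPUs 0 and 4 are siblings). If combined throughput nearly doubles,
// SMT is hiding latency that out-of-order execution alone could not.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
    const std::size_t n = std::size_t{1} << 24;      // 16M entries = 128 MiB, far bigger than any cache
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});

    // Sattolo's algorithm: turn the identity mapping into one big random cycle,
    // so the chase touches every slot in an order hardware prefetchers can't predict.
    std::mt19937_64 rng{42};
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }

    std::size_t p = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        p = next[p];                                 // serial dependency: one miss outstanding at a time
    const auto t1 = std::chrono::steady_clock::now();

    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("%.1f ns per dependent load (checksum %zu)\n", ns / n, p);
    return 0;
}
```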

    coder there you go
    Last edited by bridgman; 02 June 2022, 02:59 PM.



    • #62
      Originally posted by mdedetrich View Post
      I didn't say they need to, nor did I say that it's always the case
      That response feels a bit insincere, given that it's been a recurring theme of your posts.

      Originally posted by mdedetrich View Post
      This is true until Apple released the M1 cores for laptops and now their desktops (or mini PCs). I mean, the M1 Pro goes up to 3.2 GHz, which for a laptop is fairly on par.
      5+ years ago, 3.2 GHz peak clocks might've been "on par" for a premium laptop, but no more. And you're disregarding that Apple is clocking lower on a newer process node than either AMD or Intel is using, which makes it even more of an outlier.

      So, yes 3.2 GHz is low, and it's low for a reason. It's low because Apple can afford the extra die space on a wider core, which is (by no coincidence) the way to maximize perf/W.

      Originally posted by mdedetrich View Post
      then get into the fallacy of directly comparing clock speed between different architectures,
      It's a fallacy only if it's used as a proxy for performance. When trying to understand the design decisions made in the CPUs, it's a relevant consideration.

      Originally posted by mdedetrich View Post
      In any case people need to stop pushing the sentiment that the M1 is a "mobile phone CPU that has lower clocks" because it's not.
      That's a mis-characterization of what I said. Fact: it uses the same Firestorm cores as their A14 phone SoC.

      Originally posted by mdedetrich View Post
      Its clock speed is already well past the "mobile" range.
      Clearly, you haven't been keeping up on phone SoCs. Competing chips, made on a similar process node, run at similar peak clocks.

      Vendor     SoC              Mfg. Process   Peak Clock Speed (GHz)
      Apple      A14              TSMC N5        2.998
      Samsung    Exynos 2100      Samsung 5LPE   2.91
      Mediatek   Dimensity 1200   TSMC N6        3.0

      Originally posted by mdedetrich View Post
      The M1 chips were deliberately designed for high-power products (i.e. pro laptops and faster) and are loosely based on their A series architecture; it's not like Apple just shoved a mobile SKU into their laptops.
      You're confusing the core IP with the SoC. The Firestorm cores, used in their M1 products, were taken from their A14 phone SoC.

      It's sounding like you know a lot less about the M1 than you seem to think.

      Originally posted by mdedetrich View Post
      This is the basic flaw in your argumentation: if SMT was as beneficial as you implied (relative to die tax and other factors), Apple would have done it.
      This is a basic flaw in your argumentation. SMT is a design feature that has benefits and drawbacks. Apple had certain goals for their Firestorm cores, first among which seemed to be maximizing perf/W, because its primary applications were phones, tablets, and laptops. While the M1 Ultra isn't a laptop CPU, it's also not a high-volume part compared with the rest of the lot. Furthermore, we don't know if it was a factor in the design of the Firestorm, or if Apple's decision to make the Ultra came only after they had enough experience to believe building such a product made sense.

      So, if SMT had little benefit in low core-count applications and didn't pull its weight on the perf/W front, that would be reasonable grounds for them not to use it.

      Unfortunately, we can only speculate. We don't know the real reason(s) they haven't used it. Furthermore, you can't divorce the decision from the context. And the context here is a phone-oriented core - yes, one which they had bigger ambitions for, but not at the expense of its initial/primary application.

      Originally posted by mdedetrich View Post
      I said this before and I'll say it again, Apple's M1 target is desktop class, not mobile.
      That flies in the face of their sales volume, which is disproportionately biased towards laptops. The Firestorm and the M1 CPUs could not afford to pursue performance at the expense of power efficiency.

      Apple surely knows that delivering a knock-out desktop computer while launching an inaugural ARM-based laptop that overheats, thermally-throttles, and chews through battery charge is entirely counterproductive. They needed to deliver a successful laptop, while continuing their success in phones. Their desktop ambitions, if anything, were a stretch goal.

      Originally posted by mdedetrich View Post
      5% is actually quite a bit.
      If the performance benefit is significantly larger than that (and you aren't prioritizing perf/W above all else), then it's still an easy decision.

      Originally posted by mdedetrich View Post
      Let me put it a different way: that figure is large enough that it was the primary reason for excluding SMT on Silvermont - that 5% of die size is significant. See https://www.anandtech.com/show/6936/...about-mobile/3
      Silvermont is such a simple core that the relative overhead of HT would've surely been larger. However, you're mis-quoting the article, which actually confirms what I've been saying about SMT having a net perf/W penalty. Silvermont needed to be efficient, because its applications included phones, with a few actually seeing the light of day before Intel pulled the plug on their phone ambitions. It even got into the 1st gen MS Hololens.

      What they said about area is that HT had a footprint similar to that of Silvermont's reorder buffer, although they don't say how many entries it had.

      Originally posted by mdedetrich View Post
      The amount of die space taken up by SMT is proportional to how many cores you have, so more cores means more die space taken up if you implement SMT. That can add up
      Except that die area for an entire CPU or SoC is much larger than just the cores. So, while we might be talking 5% per core, the figure is likely down to 2-3% for a server SoC or much lower for mobile (where the CPU cores occupy a shrinking minority of the entire die).
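      To make the arithmetic explicit (the area fractions below are my own illustrative assumptions, not die-shot measurements):

```cpp
// Toy arithmetic only: a fixed per-core SMT overhead shrinks when expressed as
// a fraction of the whole SoC. The core-area fractions are assumptions chosen
// purely to illustrate the scaling, not measurements of any real die.
#include <cstdio>

int main() {
    const double smt_overhead_per_core = 0.05;  // the ~5% per-core figure under discussion
    const double core_fraction_server  = 0.50;  // assume cores are roughly half of a server SoC
    const double core_fraction_mobile  = 0.15;  // assume cores are a small slice of a phone SoC

    std::printf("server SoC: %.2f%% of total die\n", 100.0 * smt_overhead_per_core * core_fraction_server);
    std::printf("mobile SoC: %.2f%% of total die\n", 100.0 * smt_overhead_per_core * core_fraction_mobile);
    return 0;
}
```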

      Originally posted by mdedetrich View Post
      , so as I said before it's much wiser for Apple to use that die space for something else.
      If you think a couple % larger caches are going to deliver performance benefits on par with SMT, the data we've already seen would seem to contradict that.

      And, as I said before, Apple doesn't seem to be very concerned about minimizing die area. Much less than any of its competitors, for obvious reasons I shouldn't need to repeat.

      Originally posted by mdedetrich View Post
      When it comes to CPUs, anything is on the table if it brings good enough performance. As said before, if SMT was as good as you are implying, Apple would have put it in.
      Not if it hurts perf/W, as mentioned in that Silvermont article.

      Originally posted by mdedetrich View Post
      When you are a CPU company, you never rule anything out unless you have a very good reason, and in Apple's case, with how many resources they have, if SMT was beneficial they would have done it.
      Apple is not a CPU company. They're a products company. Their CPUs are tied to their product ambitions, which currently are centered around phones and laptops.

      Originally posted by mdedetrich View Post
      I don't know what point you are making here,
      The point I was making is that maybe the reason they moved the LPDDR5 in-package is precisely because the latency of keeping it external was more than Firestorm cores could cope with. We don't know which decision came first. That's why it's not very informative and why you can't transplant their design decisions to another context.
      Last edited by coder; 02 June 2022, 11:16 PM.



      • #63
        Originally posted by bridgman View Post
        I don't think the Silvermont decision re: SMT is applicable to this discussion
        It is, to the extent that it's a mobile x86 core. The theme being that mobile cores are the ones lacking SMT.

        Moreover, when Intel reused Silvermont for the second-gen Xeon Phi (KNL), they re-added SMT-4! That move is in line with the notion that SMT pulls more weight in higher core-count applications, which isn't what Apple's cores are optimized for.

        Originally posted by bridgman View Post
        Silvermont was about going from an in-order core to an out-of-order core and using out-of-order execution to help mask memory latency rather than SMT.
        Uhh... they don't say how many entries its reorder buffer had, but I'm sure it wasn't big enough to cover more than an L1 miss. However, don't forget that modern CPUs have hardware prefetchers, which can help reduce the frequency of L2 misses. This probably factored into their decision to drop SMT.

        I think Silvermont's OoO move wasn't only about solving a single problem. It should've been doing double-duty, covering L1 misses as well as getting better utilization of what limited ALU resources it did have.
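        To put rough numbers on why a small reorder buffer can ride out an L2 hit but not a DRAM miss (the latencies and width below are generic assumptions, not Silvermont's unpublished figures):

```cpp
// Toy calculation: in-flight instructions needed to keep issuing across a stall.
//   entries needed ~= miss latency (cycles) * sustained issue rate (instr/cycle)
// The latency and width numbers are illustrative assumptions, not vendor data.
#include <cstdio>

int main() {
    const double width = 2.0;  // a 2-wide core, like Silvermont
    const struct { const char* event; double latency_cycles; } stalls[] = {
        {"L1 miss that hits in L2", 12.0},
        {"L2 miss all the way to DRAM", 200.0},
    };
    for (const auto& s : stalls)
        std::printf("%-28s -> ~%.0f ROB entries to cover it\n", s.event, width * s.latency_cycles);
    return 0;
}
```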

        Originally posted by bridgman View Post
        P4 (which has comparable width), where enabling HT was a hit-and-miss thing in terms of performance.
        Ah, the Pentium 4. Among its problems with HT was the lack of any mechanism to keep the threads from thrashing each other's cache. When Intel brought back HT (after leaving it out of the Pentium M and Core/Core 2 products), this is one of the aspects they addressed.

        Originally posted by bridgman View Post
        My recollection is that a typical OOO core usually does a better job of dealing with memory latency than SMT does,
        Depends on how much, right? The cool thing about SMT is that if one thread has sufficiently high locality, it can cover a virtually infinite amount of latency seen by the other(s).

        Again, we need look only at GPUs to see how effective SMT can be at hiding latency. They certainly have the widest SMT implementations of any hardware today, and virtually no reason for it other than latency-hiding. Intel's Gen9 iGPUs had 7-way, GCN supported 64 wavefronts per CU, and Nvidia has supported between 32 and 64 warps per SM. IDK how many Xe supports.
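        The sizing is basically Little's law. As a sketch with made-up numbers (neither figure is from a real GPU spec):

```cpp
// Toy Little's-law estimate of how many resident hardware threads (warps /
// wavefronts) a compute unit needs so that someone always has ready work
// while everyone else waits on memory. Both inputs are assumptions.
#include <cstdio>

int main() {
    const double memory_latency_cycles = 400.0;  // assumed DRAM round trip as seen by the core
    const double work_cycles_per_load  = 8.0;    // assumed ALU work each thread has between loads

    std::printf("~%.0f resident threads needed to hide the latency\n",
                memory_latency_cycles / work_cycles_per_load);
    return 0;
}
```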



        • #64
          Originally posted by coder View Post
          This is a basic flaw in your argumentation. SMT is a design feature that has benefits and drawbacks. Apple had certain goals for their Firestorm cores, first among which seemed to be maximizing perf/W, because its primary applications were phones, tablets, and laptops. While the M1 Ultra isn't a laptop CPU, it's also not a high-volume part compared with the rest of the lot. Furthermore, we don't know if it was a factor in the design of the Firestorm, or if Apple's decision to make the Ultra came only after they had enough experience to believe building such a product made sense.

          So, if SMT had little benefit in low core-count applications and didn't pull its weight on the perf/W front, that would be reasonable grounds for them not to use it.

          Unfortunately, we can only speculate. We don't know the real reason(s) they haven't used it. Furthermore, you can't divorce the decision from the context. And the context here is a phone-oriented core - yes, one which they had bigger ambitions for, but not at the expense of its initial/primary application.
          Except it's not speculation if you look at the facts on the ground and also understand what specific problem SMT is solving. SMT addresses the problem of not being able to (almost) fully utilize a single core by creating the concept of multiple (currently and typically 2) virtual cores that are multiplexed onto a single real core.

          If you look at the single-core benchmarks for the current Apple M1 cores (and since some time has passed, they are quite comprehensive), the Apple M1 completely blows the competition out of the water; it even competes against desktop SKUs (which is kind of ridiculous). So what are the possible reasons why the single-core performance is so good?
          • They are clocked a lot higher. As you pointed out, this isn't the case; they are actually clocked lower than other CPUs (boost up to 3.2 GHz).
          • IPC for a single core is much higher than the competition's, which is the case.
          Since it's clear that their IPC is so high for a single core (i.e. no hardware concurrency/parallelism), Apple has reached this single-core IPC with a combination of the various techniques that have been mentioned before (branch prediction, OoO execution, caching); however, none of these techniques are specific to the M1. The only real things specific to Apple (compared to x86-64) are:
          • SoC layout (and hence the improved memory)
          • AArch64 ISA (which provides a lot of pipelining improvements that improve single-core IPC).
          With such high single-core IPC, SMT provides no benefit as a tool; saying different techniques have "pros and cons" is just handwaving and dismissing critical details. You mentioned before that mobile devices typically have lower performance because they are trying to be power efficient, but such devices do not have such high IPC (that's one of the reasons why they use less power). The Apple Firestorm cores are an exception here, since they have ridiculously high single-core IPC. This is why classing them as a "mobile SKU" is misleading: they are not like any other mobile SKU in the conventional sense. If you were talking about some random Qualcomm Android SoC SKU, you might have a point.

          So with all of this, assuming that Apple doesn't lower their single-core IPC in future architectures (which is generally the complete opposite of what CPU designers want to do, unless they somehow manage to increase their clock speeds to ridiculous levels; with current materials science and CPU design, good luck conventionally cooling a 6 GHz+ CPU), SMT does, by design, give them almost no benefit. If they want more concurrency (which is what SMT actually provides), they can just add more cores, since their single-core IPC is already so high. If Apple's M1 cores didn't have such high single-core IPC I wouldn't be saying this, but that's not the case.

          This is the last time I am going to respond to this thread because we are going around in circles, but if you want, you can bookmark this post, because I still stand by my statement: as long as Apple doesn't completely change their architecture or use a different ISA, I can fairly safely say that for the next 5-10 years (at least) they are not going to use SMT, even for their high-end desktop Mac Pro when it gets released with their CPUs rather than Intel's. And to rub it in even more, if you look at the HPC/supercomputer/server space that uses ARM, none of those systems, as far as I am aware, have SMT either, and in that space performance is the top priority. Since all of the programs running in these spaces are massively concurrent (either they run single programs that are multithreaded, or they run many programs at once, or both), they would definitely use SMT if it was beneficial.

          The only ARM SKU I am aware of that has SMT is the Cortex-A65AE, but I am not aware of anyone actually building and then using these SKUs (can someone fill me in here?)
          Last edited by mdedetrich; 03 June 2022, 05:00 AM.



          • #65
            Originally posted by mdedetrich View Post

            Except it's not speculation if you look at the facts on the ground and also understand what specific problem SMT is solving. […]
            According to Wikipedia https://en.m.wikipedia.org/wiki/Simu...multithreading :

            Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPU
            and

            However, in most current cases, SMT is about hiding memory latency, increasing efficiency, and increasing throughput of computations per amount of hardware used.
            Context: The "most current cases" here refers to Intel's and AMD's x86 CPUs, and "the other cases" refers to some research.

            I fully agree with what Wikipedia says.

            SMT is used to hide memory latency.

            If an instruction from one of the threads stalls, the CPU core can just fetch and execute one from another thread.
            This has nothing to do with IPC; it's just there to hide memory latency.

            I suspect that the reason Apple doesn't use SMT on the M1 is because they put the memory straight onto the SoC using a system-in-a-package design.
            By putting the CPU/GPU and RAM next to each other, they can significantly reduce latency on the interconnections between the RAM module and the CPU package.

            Plus, they also use LPDDR4X, which performs better than DDR4 while drawing less power.

            Also this:


            unusually large 192 KB of L1 instruction cache and 128 KB of L1 data cache
            With reduced memory latency, an unusually large L1 instruction cache, and an effective OoO engine, Apple might achieve a pretty good way of hiding memory latency and thus not need SMT in the M1.
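            A quick way to see how those factors compound is an average-memory-access-time (AMAT) estimate; the numbers below are assumptions for illustration, not measured M1 figures:

```cpp
// Toy AMAT comparison:  AMAT = L1 hit time + L1 miss rate * miss penalty
// "miss penalty" here lumps the lower cache levels and DRAM into one number
// for simplicity. All values are illustrative assumptions, not Apple data.
#include <cstdio>

int main() {
    struct Config { const char* name; double hit_cycles, miss_rate, penalty_cycles; };
    const Config configs[] = {
        {"small L1, external DRAM  ", 4.0, 0.05, 300.0},
        {"large L1, in-package DRAM", 4.0, 0.02, 250.0},
    };
    for (const auto& c : configs)
        std::printf("%s -> AMAT ~%.1f cycles\n", c.name, c.hit_cycles + c.miss_rate * c.penalty_cycles);
    return 0;
}
```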
            Last edited by NobodyXu; 03 June 2022, 10:28 AM.



            • #66
              Originally posted by mdedetrich View Post
              Except its not speculation if you look at the facts on the ground
              Each design is the summation of a multitude of decisions, many interacting with the others. When you see the end product, you can't simply point to one design decision, in isolation, as if it proves a broader point. More data is needed to draw such a conclusion. All we can say is that the M1 apparently isn't suffering for lack of SMT. Those are the "facts on the ground". Any broader points are indeed speculation.

              Another "fact on the ground" is that the M1 Max has hit the scaling limit, in the Ultra. There aren't more interconnects for it to scale beyond 2 dies, and there's no support for external DRAM, which is needed for it to be a proper replacement for the Mac Pro. So, we don't know what happens when you try to scale up further. Maybe the Firestorm cores will be at a serious deficit, when latencies further increase due to more cores contending for access to external DRAM.

              Originally posted by mdedetrich View Post
              and also understand what specific problem SMT is solving. SMT addresses the problem of not being able to (almost) fully utilize a single core by creating the concept of multiple (currently and typically 2) virtual cores that are multiplexed onto a single real core.
              That's reducing the broader set of reasons why you have under-utilization:
              • Code with poor ILP
              • Code with high branch-density
              • Code with erratic branch behavior, leading to many mis-predictions
              • Memory latency
              • Front end bottlenecks

              In-package memory and the AArch64 ISA only address the last two points. More sophisticated branch prediction and a larger reorder buffer can chip away at the first two, but there are ultimately limits to what you can achieve.
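
              For example, here's a toy kernel of my own (not from any benchmark in this thread) showing how a single dependency chain starves a wide core in a way that better branch prediction or a bigger reorder buffer can't fix - exactly the kind of idle issue slots a second SMT thread could fill:

```cpp
// Toy ILP demonstration. sum_serial() is one long dependency chain: each add
// waits for the previous result, so a wide core mostly idles. sum_unrolled()
// keeps four independent chains in flight and fills more issue slots.
// (On Linux, `perf stat -e cycles,instructions ./a.out` shows the IPC gap.)
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

double sum_serial(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;                 // loop-carried dependency on s
    return s;
}

double sum_unrolled(const std::vector<double>& v) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;     // four independent accumulators
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i]; s1 += v[i + 1]; s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}

int main() {
    const std::vector<double> v(std::size_t{1} << 22, 1.0);

    const auto t0 = std::chrono::steady_clock::now();
    const double a = sum_serial(v);
    const auto t1 = std::chrono::steady_clock::now();
    const double b = sum_unrolled(v);
    const auto t2 = std::chrono::steady_clock::now();

    std::printf("serial:   %.0f in %.2f ms\n", a,
                std::chrono::duration<double, std::milli>(t1 - t0).count());
    std::printf("unrolled: %.0f in %.2f ms\n", b,
                std::chrono::duration<double, std::milli>(t2 - t1).count());
    return 0;
}
```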

              At some point, you hit diminishing returns by merely scanning & tracking code for dependencies. The nice thing about SMT is that it scales well. This is exactly why & how GPUs use it. In order to scale up, you need to keep the compute units simple and small, and that wouldn't happen if you integrated big & complex out-of-order machinery.

              As an example, we can look to how poorly Xeon Phi fared against GPUs. Its Silvermont Atom cores were updated to use SMT-4 and AVX-512. I can't say the SMT was insufficient, but it was still far less than what its competitors used. Maybe if their cores weren't out-of-order, they could've put the same area into making them SMT-8 and gotten more mileage out of them. Or just added even more of them.

              Originally posted by mdedetrich View Post
              If you look at the single-core benchmarks for the current Apple M1 cores (and since some time has passed, they are quite comprehensive), the Apple M1 completely blows the competition out of the water,
              But it's also competing on an uneven playing field, if you're interested in a true comparison of the Firestorm's microarchitecture. For that, you'd want both CPUs to have similar memory configurations and fabrication on more similar process nodes.

              Also, you don't know how efficient its pipeline utilization is. Maybe there's untapped potential, in some of those benchmarks, that SMT could unlock. Without this knowledge, you can't say SMT wouldn't be a further asset to the core. If you're trying to reach some broader conclusions about SMT, you can't do it without that data.

              Final point here is that your data is obsolete. Alder Lake performs well against the M1 Max, even while being on an inferior process node and having external DRAM. So does Zen3, for that matter, but less so & not in floating point.



              Originally posted by mdedetrich View Post
              The only real things specific to Apple (compared to x86-64) are:
              • SoC layout (and hence the improved memory)
              • AArch64 ISA (which provides a lot of pipelining improvements that improve single-core IPC).
              That's not a remotely complete list of everything the M1's Firestorm cores do, nor does it include the relative sizes of structures like the reorder buffer.

              Originally posted by mdedetrich View Post
              You mentioned before that mobile devices typically have lower performance because they are trying to be power efficient but such devices do not have such high IPC
              Those tend to have worse IPC than desktop/server CPUs because they're also generally more cost-constrained than desktop cores. Apple is somewhat of an exception, because it can charge more for its phones than just about anyone else. Sure, there are some expensive phones made by others, but they don't sell in the volume that comparable iPhones do.

              Originally posted by mdedetrich View Post
              the Apple Firestorm cores are an exception here, since they have ridiculously high single-core IPC. This is why classing them as a "mobile SKU" is misleading
              It's not misleading because, like all other mobile-first CPU cores, they need to prioritize perf/W above all else. That's a simple fact. Apple can't afford to do anything with them that benefits performance at the expense of perf/W.

              Originally posted by mdedetrich View Post
              If they want more concurrency (which is what SMT actually provides) they can just add more cores since their single core IPC is already so high.
              Performance doesn't scale linearly with core count. The more cores you have, the higher your latencies become, and the ability of the reorder buffer to hide them will be exceeded. And unlike ARM's Neoverse cores, Apple's cores are pretty big. So, it can't just compensate by adding more cores than anyone else, such as we saw in Ampere's Altra CPUs.

              Originally posted by mdedetrich View Post
              This is the last time I am going to respond to this thread because we are going around in circles,
              I agree that it's pointless to continue, if you're resistant to taking onboard new information.

              Originally posted by mdedetrich View Post
              if you want, you can bookmark this post, because I still stand by my statement: as long as Apple doesn't completely change their architecture or use a different ISA, I can fairly safely say that for the next 5-10 years (at least) they are not going to use SMT
              It just makes me feel sad to see someone be so self-assured on the basis of so little information.

              If you'd at least run some performance analysis of the actual vs. theoretical throughput of Apple's cores, then you could actually say something about whether they indeed have untapped potential. Even then, it would still be a statement about that specific implementation, but it'd be better-informed than what you've so far used as evidence. As it stands, all you have is a system-level performance comparison which includes many other variables. It cannot be taken as an absolute statement about SMT, or even SMT as applied to AArch64 CPUs.

              Originally posted by mdedetrich View Post
              if you look at the HPC/supercomputer/server space that uses ARM, none of their systems as far as I am aware have SMT either
              Are there any examples besides the A64FX? Being a green-oriented, government project they appear to have focused on optimizing perf/W and just shoveled boatloads of money into scaling it up large enough to reach the top of the list.



              • #67
                Originally posted by NobodyXu View Post
                Context: The "most current cases" here refers to Intel's and AMD's x86 CPUs, and "the other cases" refers to some research.
                As pointed out earlier in the thread, there are other production implementations of SMT (several of them are even RISC!), some current:
                • IBM POWER 5+ (2-way)
                • IBM POWER 8+ (8-way)
                • Sun UltraSPARC T-series (4-way)
                • DEC Alpha EV8 (4-way)
                • Intel Itanium 9300+ (2-way)
                • MIPS I6400 & P6600 (2-way)
                • PEZY-SC2 (8-way) - there's another HPC example for you, mdedetrich - and this core was designed for HPC from the ground-up!
                • AFAIK, all current GPUs

                That covers basically all major ISA families for the past 2 decades, aside from RISC-V (which, probably not coincidentally, has also focused on low-power). Even ARM is included, if you include ARM-compatible server cores not designed by them.

                Originally posted by NobodyXu View Post
                SMT is used to hide memory latency.
                That's only one of the benefits it can provide.



                • #68
                  NobodyXu

                  In your quote from Wikipedia, you missed the second part of that sentence (bold for my emphasis):

                  However, in most current cases, SMT is about hiding memory latency, increasing efficiency, and increasing throughput of computations per amount of hardware used.
                  This refers exactly to my point, which is about making sure a CPU is constantly executing instructions rather than sitting and waiting.

                  And yes, hiding memory latency is another reason why SMT is used, but I also mentioned before why that is less of an issue with Apple's M1, and it's also something that Apple is likely not going to change.

