Ampere Altra Max Continues To Deliver Competitive Power Efficiency To AMD EPYC & Intel Xeon


  • #51
    Originally posted by mdedetrich View Post
    ARM, due to both its ISA and chip design, doesn't have this problem to the same degree, which is why SMT is an extreme exception on ARM designs. It's not worth the silicon space/power draw, and the drawbacks of not having SMT can be alleviated by other means (as shown by Apple with the A/M series chips).
    I wouldn't expect the upside for ARM or RISC-V to be zero, but indeed less than x86. SMT can help hide stalls due to L1/L2 cache misses, for instance. This is the main reason GPUs use it. It can also improve core utilization in branchy and low-ILP code.

    TBH, had the flurry of SMT-related side-channel exploits not been discovered over the past few years, I wouldn't be at all surprised if ARM would've embraced SMT, by now. I think the main reason ARM and Apple hadn't done it before then is because they were optimizing for energy-efficiency and it truly doesn't help much, in that department.
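To put rough numbers on the stall-hiding argument, here is a toy sketch in Python. It assumes each hardware thread is stalled independently with some fixed probability per cycle, which is a simplification and not how any real core behaves; `issue_probability` is a name made up for this example.

```python
def issue_probability(stall_prob: float, threads: int) -> float:
    """Chance that at least one of `threads` hardware threads can issue in a cycle.

    Toy assumption: each thread is stalled independently with probability
    `stall_prob`, so all of them are stalled at once with probability
    stall_prob ** threads.
    """
    return 1.0 - stall_prob ** threads


for p in (0.1, 0.3, 0.5):
    one = issue_probability(p, 1)
    two = issue_probability(p, 2)
    print(f"stall={p:.1f}  1 thread: {one:.2f}  2 threads: {two:.2f}  gain: {two / one:.2f}x")
```

Under this model the gain from a second thread grows with how often a single thread stalls, which is the sense in which a design that stalls less has less to win from SMT.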



    • #52
      Originally posted by qarium View Post
      It looks like the people who buy such systems buy them because they are automating boring, non-performance-demanding stuff.
      Not everything in the computer tech/computer science field is performance-demanding and time-critical.
      I see it as a fact that a big part of the market is this boring, non-performance-demanding stuff,
      and the market for the interesting stuff that is very performance-demanding is very small in comparison.

      These "boring stuff" people see it as a good deal if they get 2/3 of the performance for 1/3 of the power consumption.

      I had a similar talk about low-end Raspberry Pi boards: they just want a 64-bit SoC that is as slow and cheap as possible, and they only go with a 64-bit stack instead of a 32-bit stack because their software stack no longer runs at all on a 32-bit SoC. Everything else is boring, non-demanding stuff, which means that if someone sells an even slower 64-bit SoC that is even cheaper, these people will go and buy it.

      Plain and simple, because they do not need performance.

      You have to understand that these Cloud Native systems with 128 cores do not need performance. They need many cores, because then they can run many boring, non-demanding virtual machines on the system, and every virtual machine they can assign a true core to makes them money, not from performance but only because this VM instance runs some boring service for a customer who absolutely does not need performance.

      So please show some respect for these people, they are just doing their job ... trying to sell them chips that are faster but more expensive, or faster but more power hungry, makes no sense at all.
      Long-winded way of saying they can't catch up to x86 performance and need to find excuses to justify their inability to do it.
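As a quick sanity check on the one-third-power for two-thirds-performance trade-off quoted above, here is a minimal sketch in Python. The function names are made up for illustration, and the fractions are the poster's rough figures, not measurements.

```python
from fractions import Fraction


def perf_per_watt_gain(rel_perf, rel_power):
    """Performance per watt relative to the baseline chip."""
    return rel_perf / rel_power


def energy_per_job(rel_perf, rel_power):
    """Energy for a fixed amount of work, relative to the baseline.

    Energy = power * time, and time scales as 1 / performance.
    """
    return rel_power / rel_perf


rel_perf = Fraction(2, 3)   # 2/3 of the baseline performance
rel_power = Fraction(1, 3)  # 1/3 of the baseline power draw

print(perf_per_watt_gain(rel_perf, rel_power))  # 2
print(energy_per_job(rel_perf, rel_power))      # 1/2
```

So on those figures the slower chip does twice the work per watt and finishes a fixed job on half the energy, at the cost of taking 1.5x as long.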



      • #53
        Originally posted by Weasel View Post
        Long-winded way of saying they can't catch up to x86 performance and need to find excuses to justify their inability to do it.
        I can't see that. They don't even try to; they are already successful in the market.
        You can see this in AMD's 128-core Zen 4c CPU with hyperthreading disabled:
        they are so successful that companies like AMD need to copy their feature set.
        What you don't realize is that the market is very differentiated.
        This means you can be successful without being the best general-purpose solution.
        For that you just need to cover a niche.

        Phantom circuit Sequence Reducer Dyslexia



        • #54
          Originally posted by Weasel View Post
          Long-winded way of saying they can't catch up to x86 performance and need to find excuses to justify their inability to do it.
          I think it's interesting how quick you are to declare victory. I'd like to see a 72-core Nvidia Grace CPU compared against 64-core EPYC Genoa. Both use TSMC N5, for their compute dies, and should therefore be comparable. Nvidia has even announced it's contemplating selling generic servers with Grace CPUs, meaning we might get the chance!



          • #55
            Originally posted by coder View Post
            I think it's interesting how quick you are to declare victory. I'd like to see a 72-core Nvidia Grace CPU compared against 64-core EPYC Genoa. Both use TSMC N5, for their compute dies, and should therefore be comparable. Nvidia has even announced it's contemplating selling generic servers with Grace CPUs, meaning we might get the chance!
            Yeah it would be interesting. I'm legit sick and tired of ARM's excuse always being "but but we optimized for power efficiency...". Surely they could have at least one optimized for pure performance eh?

            Never said they all should focus on perf, but not even 1? Fishy af.



            • #56
              Originally posted by Weasel View Post
              Yeah it would be interesting. I'm legit sick and tired of ARM's excuse always being "but but we optimized for power efficiency...". Surely they could have at least one optimized for pure performance eh?
              The Cortex X series cores are supposed to be performance-optimized. The Neoverse V series cores are derived from the X cores. Both Nvidia's Grace and Amazon's Graviton 4 are based on Neoverse-V2. One point of interest is that it's an ARMv9-A core, meaning it has SVE2 (Scalable Vector Extensions 2).

              The Cortex-X2's 288-entry ROB is still markedly smaller than Golden Cove's 512-entry equivalent, however. That's just one metric, but does suggest they're not quite on par. Not only that, but benchmarks of phone SoCs featuring the X2 have shown it still hasn't caught up to Apple.

              Here's how it compares with Zen 4, in a few respects:

              Structure               | Cortex X2     | Zen 4
              Reorder Buffer          | 288           | 320
              Integer Register File   | ~213          | 224
              FP/Vector Register File | ~156x 128-bit | 192x 512-bit
              Flags Register File     | 70            | 108 documented, 238 measured
              Load Queue              | 174           | 88 documented, 136 measured
              Store Queue             | 72            | 64
              Branch Order Buffer     | 68            | 118

              Operation                     | Cortex X2                       | Zen 4
              FP32 Add                      | 2.53 per cycle, 2 cycle latency | 2 per cycle, 3 cycle latency
              FP fused multiply-add         | 2.53 per cycle, 4 cycle latency | 2 per cycle, 4 cycle latency
              128-bit vector INT32 add      | 2.53 per cycle, 2 cycle latency | 4 per cycle, 1 cycle latency
              128-bit vector INT32 multiply | 1.26 per cycle, 4 cycle latency | 2 per cycle, 3 cycle latency
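For anyone who wants the relative sizes rather than the raw counts, here is a small Python sketch that turns the structure figures above into Cortex X2 / Zen 4 ratios. The numbers are copied from the table; where Zen 4 lists both documented and measured values, the documented one is used, which is a choice made for this example. `ratio_table` is a made-up helper name.

```python
# (Cortex X2, Zen 4) entry counts, copied from the table above.
# Zen 4 uses the "documented" figure where two values are listed.
structures = {
    "Reorder Buffer": (288, 320),
    "Integer Register File": (213, 224),  # X2 value is approximate (~213)
    "Flags Register File": (70, 108),
    "Load Queue": (174, 88),
    "Store Queue": (72, 64),
    "Branch Order Buffer": (68, 118),
}


def ratio_table(entries):
    """Return {structure name: X2 size / Zen 4 size}."""
    return {name: x2 / zen4 for name, (x2, zen4) in entries.items()}


# FP/vector register file capacity in bits: entries * bits per entry.
x2_vec_bits = 156 * 128    # ~156 entries of 128 bits
zen4_vec_bits = 192 * 512  # 192 entries of 512 bits

ratios = ratio_table(structures)
for name, r in ratios.items():
    print(f"{name:24s} X2/Zen 4 = {r:.2f}")
print(f"Vector RF capacity: X2 = {x2_vec_bits} bits, Zen 4 = {zen4_vec_bits} bits")
```

Entry counts alone don't say how wide each entry is, which is why the vector register file is compared in bits instead.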

              For expanded versions of those tables + more analysis, see here (including microbenchmarks):

              Arm has traditionally targeted the low end of the power and performance curve, but just as Intel has been looking to expand into the low power market, ARM is looking to expand into higher power and…


              They also discussed ARM's Hot Chips presentation of the Neoverse V2:

              Arm has a long history in making low power CPUs, but have been trying to expand their reach into higher power and higher performance segments. At Hot Chips 2023, Arm presented the Neoverse V2, the …



              • #57
                Originally posted by Weasel View Post
                Yeah it would be interesting. I'm legit sick and tired of ARM's excuse always being "but but we optimized for power efficiency...". Surely they could have at least one optimized for pure performance eh?
                Never said they all should focus on perf, but not even 1? Fishy af.
                If Apple and Qualcomm both make high-performance ARM cores, then why does ARM also need high-performance cores?

                Does it even matter who exactly makes the high-performance cores?




                • #58
                  Originally posted by coder View Post
                  I wouldn't expect the upside for ARM or RISC-V to be zero, but indeed less than x86. SMT can help hide stalls due to L1/L2 cache misses, for instance. This is the main reason GPUs use it. It can also improve core utilization in branchy and low-ILP code.

                  TBH, had the flurry of SMT-related side-channel exploits not been discovered over the past few years, I wouldn't be at all surprised if ARM would've embraced SMT, by now. I think the main reason ARM and Apple hadn't done it before then is because they were optimizing for energy-efficiency and it truly doesn't help much, in that department.
                  I read somewhere that due to the design of the ARM ISA, it has a lot more information about potential branching which means it doesn't need to rely on SMT as much as x86 needs to (hence the reduced amount of L1/L2 cache misses). I am personally not that familiar with the ARM ISA but there was an in depth article on this somewhere.

                  Obviously SMT would provide some benefit to ARM, but compared to its cost (especially given that ARM optimizes for power efficiency), as you pointed out, it's not worth it (even aside from the side-channel vulnerabilities). As far as I understand, the Apple A/M series of chips basically mitigated the downsides of not having SMT by brute-forcing L1/L2 cache size and latency, which if true is kind of ingenious in its simplicity.



                  • #59
                    Originally posted by Weasel View Post
                    Yeah it would be interesting. I'm legit sick and tired of ARM's excuse always being "but but we optimized for power efficiency...". Surely they could have at least one optimized for pure performance eh?

                    Never said they all should focus on perf, but not even 1? Fishy af.
                    They do. I mean, most ARM chips have a big.LITTLE-style design, e.g. the P cores on the Apple A/M series, and the single-core speeds of those P cores are insane. Now you could ask "well, why don't we have an entire system filled with P cores?" and the answer is that we have basically hit a point where in general you can't get away with ignoring power efficiency anymore (and Intel is learning this the hard way). Cores have become so dense that for the past few years we have actually been hitting issues regarding power/cooling (I mean, we are getting 300W consumer CPUs from Intel now; a decade ago getting above 150W would have been a task unless you were doing extreme OC).

                    And the thing with efficiency cores is that they are insanely efficient if you don't care about performance, and you can even extend this to server use cases. E.g. you have that massive batch job where you don't care that much about how long it takes (even if it's twice as long) but you don't want it chugging on your power/thermal budget? Well, that's a case for your E cores. Not going to go into laptops because the use case for E cores should frankly be obvious there.

                    But yes as mentioned elsewhere, I do expect systems to be released which will have just "P" cores, wouldn't be surprised if even Apple made such a workstation/server system in the next decade but given we already have competition here they may not even bother just because they won't be able to make that much money out of it.



                    • #60
                      Originally posted by mdedetrich View Post
                      But yes as mentioned elsewhere, I do expect systems to be released which will have just "P" cores, wouldn't be surprised if even Apple made such a workstation/server system in the next decade but given we already have competition here they may not even bother just because they won't be able to make that much money out of it.
                      But doesn't Qualcomm do exactly this with the Snapdragon X Elite?
                      Their newest ARM design is P cores only.

                      This SoC has 12 P cores and zero little cores.

