SMT Proves Worthwhile Option For 128-Core AMD EPYC "Bergamo" CPUs


    Phoronix: SMT Proves Worthwhile Option For 128-Core AMD EPYC "Bergamo" CPUs

    While the AMD EPYC 9754 "Bergamo" processor is impressive for having 128 physical Zen 4C cores, it also supports Simultaneous Multi-Threading (SMT) for 256 threads per socket. Meanwhile, Ampere Altra Max and AmpereOne have no SMT, and Intel's upcoming Sierra Forest will likely also lack SMT (Hyper Threading) given it is an E-core-only design. That led to my curiosity over the power and performance impact of SMT on the 128-core flagship EPYC 9754. Today's Bergamo benchmarking looks at SMT on and off for both 1P and 2P server configurations.


  • #2
    It would be nicer to get full scaling curves (i.e. for 1, 2, 3, ..., 255, 256 threads) -- some cases where HT is not helpful might be because the code is already saturating the available processing power (limited by memory throughput, lock contention, the scalar part, etc.) and cannot make use of additional threads. The other question is whether HT degrades gracefully, namely whether 128 threads work equally well with and without HT.
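The "degrades gracefully" question can be made concrete with the standard speedup and efficiency definitions; a minimal Python sketch, where the timings are hypothetical placeholders rather than actual Bergamo measurements:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_threads: int) -> float:
    """Parallel efficiency: speedup per thread, 1.0 == perfect scaling."""
    return speedup(t_serial, t_parallel) / n_threads

# Hypothetical timings: 1 thread takes 256 s, 128 threads take 2.5 s.
s = speedup(256.0, 2.5)          # 102.4x speedup
e = efficiency(256.0, 2.5, 128)  # 0.8 efficiency
```

Comparing the 128-thread efficiency with SMT on versus off would directly answer whether SMT costs anything when the extra threads go unused.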

    • #3
      Originally posted by mb_q View Post
      It would be nicer to get full scaling curves (i.e. for 1, 2, 3, ..., 255, 256 threads) -- some cases where HT is not helpful might be because the code is already saturating the available processing power (limited by memory throughput, lock contention, the scalar part, etc.) and cannot make use of additional threads. The other question is whether HT degrades gracefully, namely whether 128 threads work equally well with and without HT.
      We're talking a bare minimum of a 128-thread workload here. From what I recall, all tested applications except one seemed to yield a performance improvement, which suggests everything but that one application isn't saturated.
      Since most applications these days appear to scale almost indefinitely, you really only need coarse incremental tests. For example, tests at 4, 8, 16, 64, 128, 256, and 512 threads would be sufficient; you can interpolate the in-between performance from the measured points.
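The "interpolate the in-between performance" idea can be sketched as linear interpolation in log2(thread-count) space between the measured points; the scores below are made-up illustrative numbers, not results from the article:

```python
import math

def interp_log2(thread_counts, scores, n):
    """Estimate the score at thread count n by linear interpolation
    in log2(threads) space between the two nearest measured points."""
    if n <= thread_counts[0]:
        return scores[0]
    if n >= thread_counts[-1]:
        return scores[-1]
    for (t0, s0), (t1, s1) in zip(zip(thread_counts, scores),
                                  zip(thread_counts[1:], scores[1:])):
        if t0 <= n <= t1:
            frac = (math.log2(n) - math.log2(t0)) / (math.log2(t1) - math.log2(t0))
            return s0 + frac * (s1 - s0)

# Hypothetical scores measured at the suggested thread counts:
counts = [4, 8, 16, 64, 128, 256]
scores = [3.9, 7.6, 14.8, 52.0, 95.0, 128.0]
est_32 = interp_log2(counts, scores, 32)  # halfway between 16 and 64 in log2
```

Log-space interpolation matches the usual doubling pattern of such sweeps better than linear interpolation in raw thread count.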

      • #4
        Do the smt=off stats apply to scheduling on physical cores only?
        In that case, could it be a matter of application tuning?
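One way to approximate smt=off from userspace on Linux is to pin the process to one hardware thread per physical core, using the sibling lists the kernel exposes in sysfs. A rough sketch (the sysfs path is the standard topology interface; the helper names are my own):

```python
import glob
import os

def parse_siblings(s: str) -> list[int]:
    """Parse a thread_siblings_list string like '0,128' or '0-1' into CPU ids."""
    ids = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

def primary_threads() -> set[int]:
    """One hardware thread (the lowest id) per physical core."""
    primaries = set()
    for path in glob.glob("/sys/devices/system/cpu/cpu*/topology/thread_siblings_list"):
        with open(path) as f:
            primaries.add(min(parse_siblings(f.read())))
    return primaries

# Pin this process to physical cores only (roughly what smt=off gives you):
# os.sched_setaffinity(0, primary_threads())
```

Whether that matches a true smt=off boot is exactly the question: the scheduler still sees the sibling threads, they are just left idle.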

        • #5
          Typo:

          Originally posted by phoronix View Post
          showed uplift from having Bergamo with Simulanteous Multi-Threading enabled.

          • #6
            Originally posted by tildearrow View Post
            Typo:
            Thanks
            Michael Larabel
            https://www.michaellarabel.com/

            • #7
              Darn fascinating, Michael. It is interesting that in most cases SMT doesn't double performance like I had been led to believe; it is maybe 33% faster. I wish I could afford one of these. Gentoo or FreeBSD with all ports built with -march=native would be interesting, and compiling packages would be as easy as installing binary packages! It will be a cool day when we get processors of this caliber in the home, say in 15 years!

              • #8
                To me this looks bad for hyperthreading, though most people will not see it.
                Many benchmarks that run faster on CPUs with hyperthreading are becoming obsolete because GPUs outgun CPUs in those workloads, Blender being one example: creators now render Blender's raytraced scenes on GPUs.
                So what is the rationale for a CPU with hyperthreading if the GPU is faster anyway?

                Yes, many people will disagree, but here are more examples:

                Games are not tested in this article, but the latest benchmarks of 6-core vs 8-core 3D cache parts show that games make little use of extra cores or threads; if your CPU has 8 native cores, there is no need for hyperthreading.
                Games do show more FPS with hyperthreading, but that is fake performance: as soon as you factor in keyboard and mouse input, input latency improves by 5% when you disable hyperthreading. Any serious gamer who isn't completely stupid and doesn't just watch FPS numbers will disable hyperthreading for games.

                Then there is the cost of RAM and the memory wall: the big AI workloads all suffer a performance drop from the memory wall, and hyperthreading makes the problem bigger. If you need 2GB of RAM per thread, 256 threads need 512GB of RAM; disable hyperthreading and the same system only needs 256GB. That is a significant cost factor, and maybe the true reason cloud server farms buy ARM CPUs without hyperthreading or disable hyperthreading on AMD CPUs.

                Also, the relevant AI workloads like TensorFlow run the same or faster with hyperthreading disabled.

                And remember, the memory wall is a big problem in the AI world: even if you buy a GPU with 48GB of VRAM you will hit it, which is why the big CUDA clusters use GPUs with 80GB and more.

                This means an Apple M2 Ultra with 192GB, an APU where CPU and GPU share the same RAM, will not hit the memory wall; with 192GB you can run very, very big AI models, at a point where the 48GB AMD workstation GPUs and even Nvidia's big-iron 80GB CUDA cards are obsolete because of the memory wall.

                An Apple M2 Ultra will stay relevant longer, and of course that CPU has no hyperthreading, but nobody cares because they put the AI workload on the GPU anyway.

                On RAM cost in another sense: APU systems beat traditional systems on RAM cost because CPU and GPU share one pool of RAM, instead of needing the same amount of RAM for the CPU and then again for the GPU.

                For me that is the reason hyperthreading will die in the near future: people will put the relevant workloads on the GPU instead of the CPU, they will cut RAM cost by using APUs, and they will avoid RAM cost and the memory wall by simply not allowing a technology that wastes RAM the way hyperthreading does.

                I can easily predict this: the Apple M3 Ultra will have 256GB of RAM (no memory wall), it will hit 4GHz (up from the 3.7GHz M2 Ultra), RAM speed will go from 800GB/s to 1500GB/s (SODDR5 MRDIMMs at 17,600 MT/s), the CPU core count will go from 24 to 32 or more, the GPU will add AV1 decode and encode plus raytracing hardware acceleration, and the GPU shader core count will be doubled.
                All that with lower power consumption on TSMC 3nm...

                With specs like this it is pretty certain that anything from Intel and AMD will be obsolete, and of course all without hyperthreading.
                Last edited by qarium; 21 July 2023, 04:11 PM.
                Phantom circuit Sequence Reducer Dyslexia

                • #9
                  There wasn't a GPU thread today but I thought y'all might like some throwback anyway. I got a "new to me" D FA 100MM WR Macro today.

                  [photos]

                  Sorry for the harsh, cell phone LED lighting. I need to invest in a ring light next.

                  • #10
                    Originally posted by qarium View Post
                    [...] also the games do more FPS with hyperthreading but this is fake performance as soon as you put in keyboard and mouse input the latency of the inputs go down by 5% as soon as you disable hyperthreading this means for any serious gamer who is not complete stupid and only watch FPS numbers will disable hyperthreading for games. [...]
                    But did you factor in the use of a PS/2 mouse and PS/2 keyboard? Because that's how I roll, none of that slow USB.
