NVIDIA GH200 CPU Performance Benchmarks Against EPYC Zen 4 & Xeon Emerald Rapids

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Originally posted by coder
    Thank you for facilitating this test! It's super interesting to me!

    ...not that I can afford anything like this, but I've been following the evolution of ARM server performance and it's so much better to benchmark Neoverse V2 cores on bare hardware than in Amazon's cloud.
    You are welcome. Data is always good: the more, the better. Much more is expected to come. Huge thanks to Michael Larabel. He is doing all the benchmarking work.

    • #32
      Originally posted by coder
      Grace isn't really there to do the heavy lifting. It's mainly a support chip for Hopper. It supplies 480 GB of memory, to supplement the 96 GB that's directly attached to the H100.

      More importantly, Grace is designed to be used on SXM boards that can scale up to much larger cache-coherent configurations, thanks to NVLink. I'm not sure, but I think you could fit 16 in a single box, and Nvidia then lets you link those together into a cache-coherent cluster. So they scale way better than anything x86.

      Without testing Grace at scale, you're only seeing one aspect of what it can do.
      256 Nvidia superchips can be connected together coherently.
      Last edited by GPTshop.ai; 11 February 2024, 08:11 AM.

      • #33
        Originally posted by coder
        Grace isn't really there to do the heavy lifting. It's mainly a support chip for Hopper. It supplies 480 GB of memory, to supplement the 96 GB that's directly attached to the H100.

        More importantly, Grace is designed to be used on SXM boards that can scale up to much larger cache-coherent configurations, thanks to NVLink. I'm not sure, but I think you could fit 16 in a single box, and Nvidia then lets you link those together into a cache-coherent cluster. So they scale way better than anything x86.

        Without testing Grace at scale, you're only seeing one aspect of what it can do.
        Sure. I did not consider its role in supporting the H100 in my comment; I only considered which CPU would be better for a compute server, if I were the one deciding.

        • #34
          Not knowing what to expect, the thing that surprised me most is that the 72-core Grace is almost twice as fast as the 128-core Ampere Altra Max, even though they have the same TDP and the Ampere has almost twice as many cores. I did not expect that. Ampere should be worried.

          • #35
            Originally posted by sophisticles
            If you reread my post regarding the many E-cores statement, you will note that I said it was my feeling that such a processor would result in a smoother, i.e. more responsive experience for end users.
            No, it would not. You can easily verify this yourself by comparing a stock modern low-core-count CPU, such as the 7600X, with an older high-core-count CPU running at a drastically reduced frequency, such as a 2990WX clocked at 2 GHz or lower. That would give you empirical evidence of how ridiculous your assumption is.

            And give me a f***ing break regarding that cringe-worthy, Reddit-style PC geek flexing bullshit about "hundreds of open tabs", which is never the case in reality. Even if someone opened that many tabs, all of them except the active one would amount to sleeping threads and relatively rare, random state-update requests. The 7600X would still be more responsive.
            Last edited by drakonas777; 09 February 2024, 05:04 PM.

            • #36
              Originally posted by GPTshop.ai
              32 Nvidia superchips can be connected together coherently; this is called NVL32.
              Superchip = 2x Grace = 144 cores? In that case 4608 cores!
              🤯

              • #37
                Originally posted by GPTshop.ai
                Ampere should be worried.
                Yes, but probably more because of AmpereOne, their newly launched CPU with in-house-designed cores (Altra uses ARM's Neoverse N1 cores). The fact that they seem to be selling it on efficiency rather than absolute performance isn't a good sign about the latter.

                • #38
                  GPTshop.ai, can you please comment on the storage, which is listed as "960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9"?

                  A web search quickly indicates the first as a Samsung PM9A3, which is an enterprise-grade PCIe 4.0 NVMe U.2 drive.

                  I was not able to find details on the latter. The closest I came was the MZ1L21T9, which is an M.2 version of the PM9A3. Is that right? If so, the M.2 version is double-sided, and I hope it is adequately cooled with an appropriate heatsink.

                  I have one of the PM9A3 M.2 drives; they're good, but already a couple of generations old. Because they're 22110 (22x110 mm), it was extremely difficult to find a heatsink that adequately cooled the entirety of both sides. I ended up sourcing one directly from AliExpress, called "JEYI 22110 SSD Heatsink". It's not great, but with a generous amount of high-quality thermal compound, including where its top and bottom halves meet, it's adequate for my light-duty usage.

                  The reason I ask is that I wonder whether thermal throttling of a sub-optimally cooled M.2 drive could hurt performance on any of the benchmarks involving storage, such as the compilation and database tests.
                  Last edited by coder; 09 February 2024, 05:29 PM.

                  • #39
                    Originally posted by qarium
                    It's not a secret that these universities are a conspiracy against your and everyone else's best interest.
                    And you can only fix this problem, plain and simple, by never going to a university, because, as you say, it's mandatory.
                    So Oxford, MIT, Caltech and every other major university "conspired" to make CUDA programming classes mandatory?


                    And you know this how? Did they call you after their conspiracy meeting was over to inform you of their plans?

                    And scientists decided to go along with the conspiracy why?

                    • #40
                      Michael, after the Helsing test, you remark:

                      The 72 core Neoverse-V2 configuration paired with HBM memory proved quite capable for a wide array of CPU workloads...

                      The HBM is attached to the Hopper GPU, not the Grace CPU, so it is not going to be used by these tests. Grace is just using its directly connected LPDDR5.
