Intel 5th Gen Xeon Performance Benchmarks With DDR5-4800 vs. DDR5-5600

  • Intel 5th Gen Xeon Performance Benchmarks With DDR5-4800 vs. DDR5-5600

    Phoronix: Intel 5th Gen Xeon Performance Benchmarks With DDR5-4800 vs. DDR5-5600

    With Intel's just-launched 5th Gen Xeon "Emerald Rapids" processors, headlined by the 64-core Xeon Platinum 8592+, one of the key upgrades with these new server processors is support for DDR5-5600 memory, up from DDR5-4800 with Sapphire Rapids, which is also the memory frequency limit for AMD's EPYC Zen 4 processors. Here are some benchmarks of the flagship Xeon Platinum 8592+ tested with DDR5-4800 versus DDR5-5600 memory modules.

  • #2
    That's kind of what I expected before I looked at the results. The extra bandwidth of the 5600 might help integrated graphics, but they weren't tested here. Otherwise, if you have to relax the timings that much to hit the higher bandwidth, it's not worth it.

    • #3
      Results are in line with what I've seen from past memory changes that only raised the frequency/bandwidth. Programs that lean heavily on system RAM will benefit somewhat. Cache-optimized workloads, where the majority of the work is held in the on-CPU cache levels, and storage-I/O-bound processes won't benefit. No surprises. The expense is only worth it when the majority of your program's performance, measured over time, is limited by physical RAM operations, weighed against the up-front cost of the RAM modules. That's mostly going to be a large-deployment calculation rather than one for a single or few-seat deployment.
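
      To illustrate that calculation, here's a rough back-of-the-envelope sketch in Python; every number in it is a hypothetical placeholder except the roughly 2% overall gain cited later in the thread.

      Code:
      # Back-of-the-envelope sketch of the deployment trade-off described above.
      # All figures are hypothetical placeholders, not numbers from the article.
      nodes            = 200       # size of the deployment
      dimms_per_node   = 16        # 8 channels x 2 sockets, 1 DIMM per channel
      premium_per_dimm = 40.0      # assumed extra cost (USD) of DDR5-5600 over DDR5-4800
      speedup_fraction = 0.02      # ~2% overall gain, per the review results
      node_cost        = 15_000.0  # assumed all-in cost (USD) per server

      extra_capex = nodes * dimms_per_node * premium_per_dimm
      # If the fleet is throughput-bound, a 2% speedup is roughly 2% fewer
      # nodes needed for the same amount of work.
      value_of_speedup = nodes * speedup_fraction * node_cost

      print(f"Extra memory spend:   ${extra_capex:,.0f}")
      print(f"Value of ~2% speedup: ${value_of_speedup:,.0f}")

      With these made-up numbers the premium wouldn't pay for itself, which is exactly the kind of per-deployment math being described.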

      • #4
        Originally posted by DanL View Post
        if you have to relax the timings that much to hit the higher bandwidth, it's not worth it.
        WTF? Do you understand that a clock cycle at 5600 MT/s is shorter than one at 4800 MT/s? If you normalize their CL by their data rate, it works out to just 1.4% longer latency for the faster modules. That pales in comparison to the 16.7% bandwidth increase.
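
        For reference, a quick Python sketch of that normalization; the CL values are placeholders rather than the timings of the modules Michael tested, so the printed latency delta will only match the figure above if they happen to line up.

        Code:
        # Normalize CAS latency by data rate (DDR transfers twice per clock,
        # so one CL cycle lasts 2000 / MT_per_s nanoseconds).
        def first_word_latency_ns(cl: int, mts: int) -> float:
            return cl * 2000.0 / mts

        lat_4800 = first_word_latency_ns(40, 4800)   # placeholder CL40 DDR5-4800 RDIMM
        lat_5600 = first_word_latency_ns(46, 5600)   # placeholder CL46 DDR5-5600 RDIMM

        print(f"DDR5-4800: {lat_4800:.2f} ns   DDR5-5600: {lat_5600:.2f} ns")
        print(f"Latency delta:   {(lat_5600 / lat_4800 - 1) * 100:+.1f}%")
        print(f"Bandwidth delta: {(5600 / 4800 - 1) * 100:+.1f}%")   # ~ +16.7%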

        What people tend to miss about memory latency is that these are best case numbers. In a heavy, multi-core workload, the memory transaction queues are probably running pretty deep. At that point, more bandwidth helps you a lot more, because it lets you drain those queues faster, resulting in lower typical latency.
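
        As a toy model of that effect (the idle latencies, queue depths, and 8-channel per-socket bandwidth figures below are illustrative assumptions, not measurements from this review):

        Code:
        # Loaded latency ~= idle latency + time to drain the 64-byte requests
        # already queued ahead of you; more bandwidth drains the queue faster.
        LINE_BYTES = 64

        def loaded_latency_ns(idle_ns: float, bw_gb_s: float, queue_depth: int) -> float:
            return idle_ns + queue_depth * LINE_BYTES / bw_gb_s   # bytes / (GB/s) == ns

        bw_4800 = 8 * 8 * 4.8   # 8 channels x 8 bytes x 4800 MT/s ~= 307 GB/s per socket
        bw_5600 = 8 * 8 * 5.6   # ~= 358 GB/s per socket

        for depth in (0, 32, 128):
            a = loaded_latency_ns(100.0, bw_4800, depth)   # assume ~100 ns idle latency
            b = loaded_latency_ns(101.0, bw_5600, depth)   # assume ~1% worse idle latency
            print(f"queue depth {depth:3d}:  DDR5-4800 {a:6.1f} ns   DDR5-5600 {b:6.1f} ns")

        In this toy model the crossover sits around a few dozen outstanding requests, with the higher data rate pulling ahead as the queues deepen despite its slightly worse idle latency.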

        Granted, we do see a handful of regressions, especially in rendering, which tends not to be as bandwidth-limited as you might expect. Overall, it's a net improvement of 2%, which is significant, if not huge.

        BTW, I'd expect to see an even bigger speedup on a single-processor config, since dual-CPU workloads will sometimes be bottlenecked by the CPU-CPU interconnect.
        Last edited by coder; 15 December 2023, 03:12 PM.

        • #5
          Originally posted by coder View Post
          WTF? Do you understand that a clock cycle at 5600 MT/s is shorter than one at 4800 MT/s? If you normalize their CL by their data rate, it works out to just 1.4% longer latency for the faster modules. That pales in comparison to the 16.7% bandwidth increase.
          Yeah, but if nothing takes advantage of the bandwidth increase, you might as well just have lower latency, especially when you factor price in.

          • #6
            Originally posted by DanL View Post
            Yeah, but if nothing takes advantage of the bandwidth increase, you might as well just have lower latency, especially when you factor price in.
            AI eats bandwidth for breakfast. Too bad Michael didn't include any OpenVINO benchmarks, because we can clearly see from the Xeon Max benchmarks that they responded well to HBM (which is not much lower latency, if at all).

            • #7
              Originally posted by coder View Post
              AI eats bandwidth for breakfast. Too bad Michael didn't include any OpenVINO benchmarks, because we can clearly see from the Xeon Max benchmarks that they responded well to HBM (which is not much lower latency, if at all).
              But are Xeons used for AI? GPUs have much more bandwidth.

              • #8
                Originally posted by cj.wijtmans View Post
                But are Xeons used for AI? GPUs have much more bandwidth.
                Intel added AMX to Sapphire Rapids for it, and created a variant with HBM, called Xeon Max. See the link in my above post, for more info.

                So, I can't say how many customers actually use AMX or buy Xeon Max for AI workloads, but this is a prime use case that Intel is pushing. Therefore, I think it makes sense to test.

                Also, Michael frequently includes AI workloads in CPU benchmarks. So, they wouldn't be out of place, if a few were included here.

                • #9
                  Originally posted by coder View Post
                  Intel added AMX to Sapphire Rapids for it, and created a variant with HBM, called Xeon Max. See the link in my above post, for more info.

                  So, I can't say how many customers actually use AMX or buy Xeon Max for AI workloads, but this is a prime use case that Intel is pushing. Therefore, I think it makes sense to test.

                  Also, Michael frequently includes AI workloads in CPU benchmarks. So, they wouldn't be out of place, if a few were included here.
                  Increasing AI performance will need more than some specialized instruction sets. I'm just guessing that GPUs have better matrix-multiplication algorithms, caching, and problem partitioning, but that's just a guess. Seems a bit strange to implement this on Xeons. As far as I know, the big AI bot networks are all in on Nvidia GPUs. So I wonder what niche is going to use Xeons in this way; perhaps lighter AI workloads? But I don't know.

                  • #10
                    Originally posted by cj.wijtmans View Post
                    Increasing AI performance will need more than some specialized instruction sets.
                    Then tell Intel and AMD, who added VNNI (both) and AMX (Intel) to their CPUs!

                    Originally posted by cj.wijtmans View Post
                    I'm just guessing that GPUs have better matrix-multiplication algorithms, caching, and problem partitioning, but that's just a guess.
                    Why guess? You can read about Intel GPUs' XMX units, AMD's RDNA3 GPUs' WMMA instructions (and CDNA's Matrix Cores), and Nvidia's famous Tensor "cores". So yes - they all now have hardware support specifically for matrix-multiply acceleration -- something which only Intel has started adding to CPUs.
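
                    As a quick counting sketch of why matmul gets dedicated units (generic arithmetic, not tied to any particular part named above): its FLOPs-per-byte ratio grows with problem size, so a wide MAC array can stay busy without starving on memory bandwidth.

                    Code:
                    # Arithmetic intensity of an n x n GEMM: 2*n^3 FLOPs against
                    # roughly 3*n^2 elements moved (read A and B, write C).
                    def gemm_intensity(n: int, bytes_per_elem: int = 2) -> float:  # bf16/fp16
                        flops = 2 * n ** 3
                        bytes_moved = 3 * n ** 2 * bytes_per_elem
                        return flops / bytes_moved

                    for n in (64, 1024, 8192):
                        print(f"n={n:5d}: ~{gemm_intensity(n):7.1f} FLOP/byte")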

                    Originally posted by cj.wijtmans View Post
                    Seems a bit strange to implement this on Xeons. As far as I know, the big AI bot networks are all in on Nvidia GPUs. So I wonder what niche is going to use Xeons in this way; perhaps lighter AI workloads? But I don't know.
                    Intel claims CPU inferencing makes sense for small networks.

                    Anyway, Michael has been testing inferencing performance on CPUs for probably several years, at least. It seems odd that you'd only take exception to it now.

                    In my opinion, if Intel is marketing its CPUs for AI workloads, it's valid to test their claims. Show us the data, and then we can decide for ourselves whether/when it makes sense to do on CPUs vs. GPUs.
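
                    For anyone who wants to check that claim themselves, here's a minimal OpenVINO-on-CPU sketch; the model path and input shape are placeholders, and the CPU plugin should pick up VNNI/AMX on its own where the hardware and precision allow it.

                    Code:
                    import numpy as np
                    import openvino as ov                           # OpenVINO Python API

                    core = ov.Core()
                    model = core.read_model("model.xml")            # placeholder IR model path
                    compiled = core.compile_model(model, "CPU")     # CPU plugin

                    # Placeholder input; adjust shape/dtype to whatever the model expects.
                    dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
                    result = compiled([dummy])[compiled.output(0)]  # single inference request
                    print(result.shape)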
