DDR5 Memory Channel Scaling Performance With AMD EPYC 9004 Series


  • DDR5 Memory Channel Scaling Performance With AMD EPYC 9004 Series

    Phoronix: DDR5 Memory Channel Scaling Performance With AMD EPYC 9004 Series

    In addition to the big performance uplift from AVX-512, up to 96 cores per socket, and other Zen 4 architectural improvements, also empowering the EPYC 9004 "Genoa" processors is the support for up to 12 channels of DDR5-4800 memory. In this article is a wide assortment of benchmarks looking at the AMD EPYC 9654 performance across varying numbers of populated DDR5 memory channels.

    https://www.phoronix.com/review/ddr5-epyc-9004-genoa
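For context, the theoretical peak such a configuration offers is easy to sketch: a DDR5 channel moves 8 bytes (64 bits) per transfer, and DDR5-4800 performs 4800 million transfers per second. The 12-channel and DDR5-4800 figures are from the article; the rest is textbook arithmetic:

```python
# Theoretical peak memory bandwidth for one 12-channel DDR5-4800 socket.
# Each channel is 64 bits wide (8 bytes per transfer); DDR5-4800 runs
# at 4800 MT/s. Real sustained bandwidth will land well below this peak.
channels = 12
transfers_per_sec = 4800e6
bytes_per_transfer = 8  # 64-bit channel

peak_gbs = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"Peak bandwidth: {peak_gbs:.1f} GB/s")  # 460.8 GB/s
```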

  • #2
    Whoa. It's like feeding the monsters after midnight.
    (In technical terms: even with 12 channels, the memory subsystem has a hard time keeping those monsters fed.)
    Last edited by milkylainen; 06 January 2023, 09:03 AM.



    • #3
      I didn't find what I am looking for via the search feature so I'm asking here. Do you have any comparisons between single-rank and dual-rank DDR5?



      • #4
        Originally posted by dkokron View Post
        I didn't find what I am looking for via the search feature so I'm asking here. Do you have any comparisons between single-rank and dual-rank DDR5?
        No, unfortunately. The memory in this server is the only set of DDR5 server modules I have at the moment, and I rarely get RAM review samples on their own.
        Michael Larabel
        https://www.michaellarabel.com/



        • #5
          Michael, I think there's an error in the units/scale of the "OpenVINO 2022.3 Model: Vehicle Detection FP16-INT8 - Device: CPU" test. Performance scales in the wrong direction as memory channels are added. Perhaps it's similar to the "Model: Person Detection FP32 - Device: CPU" test, which has units of "ms, Fewer Is Better"?

          Thanks for the tests! Very interesting data!
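The mix-up is easy to make because OpenVINO results are reported in both throughput (FPS, higher is better) and latency (ms, lower is better), and converting between the two inverts the ordering. A minimal illustration with made-up numbers:

```python
# Converting per-inference latency (ms) to throughput (inferences/s).
# Lower latency means higher throughput, so the two scales rank the
# same results in opposite directions.
latencies_ms = [20.0, 10.0, 5.0]  # hypothetical results as channels are added

throughput = [1000.0 / ms for ms in latencies_ms]
print(throughput)  # [50.0, 100.0, 200.0]
```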



          • #6
            Overall, mostly what I expected. I'm a little surprised some of the deep learning benchmarks weren't more sensitive to memory bandwidth, but I guess those must've used models small enough to fit in L3 cache.

            As for the rendering benchmarks, I wasn't too surprised after having seen:



            For me, the biggest surprise was that compilation wasn't more bandwidth-intensive! Could it possibly have been I/O-bottlenecked? The drive appears to be an 800 GB Intel DC P3600 MLC SSD. Read-oriented, probably. ark.intel.com has already scrubbed it from their database, and I'm too lazy to search Solidigm's site.

            Here are the specs from Google's cache:
            • Sequential Bandwidth - 100% Read (up to) 2600 MB/s
            • Sequential Bandwidth - 100% Write (up to) 1000 MB/s
            • Random Read (100% Span) 430000 IOPS (4K Blocks)
            • Random Write (100% Span) 50000 IOPS (4K Blocks)

            Michael, I don't suppose this was an SSD that AMD sent you? Or was the platform shipped without storage?
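A rough sanity check on the I/O theory, using the rated sequential read figure from the specs above (the source-tree size is my own hypothetical worst case, not from the article): even a cold read of the whole tree takes well under a second at that rate, so reads alone shouldn't dominate a multi-minute build.

```python
# Back-of-the-envelope: time to read a source tree at the drive's
# rated sequential bandwidth. 2600 MB/s is from the specs above;
# the 2 GB tree size is a hypothetical worst case.
tree_gb = 2.0           # assumed source tree size, GB
seq_read_mbs = 2600.0   # rated sequential read, MB/s

read_seconds = tree_gb * 1000.0 / seq_read_mbs
print(f"{read_seconds:.2f} s")
```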



            • #7
              Originally posted by dkokron View Post
              I didn't find what I am looking for via the search feature so I'm asking here. Do you have any comparisons between single-rank and dual-rank DDR5?
              For unbuffered, you want dual-rank, but 1 DIMM per channel.



              However, I can't say if that applies to the registered memory used in these servers. For most, there's no choice. DDR5 is too new, so memory capacity requirements will likely drive decisions about which size, type, and number of DIMMs to buy.
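Since capacity tends to drive the choice, here's a quick sketch of per-socket capacity with all 12 channels populated at 1 DIMM per channel. The DIMM sizes are typical retail options, not figures from the article:

```python
# Per-socket capacity with all 12 channels populated at 1 DIMM per
# channel, for common DDR5 RDIMM sizes (GB). The sizes listed are
# typical retail options, not figures from the article.
channels = 12
dimm_sizes_gb = [16, 32, 64, 96, 128]

capacities = {size: channels * size for size in dimm_sizes_gb}
for size, total in capacities.items():
    print(f"{size:3d} GB DIMMs -> {total} GB per socket")
```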



              • #8
                Originally posted by coder View Post
                Overall, mostly what I expected. I'm a little surprised some of the deep learning benchmarks weren't more sensitive to memory bandwidth, but I guess those must've used models small enough to fit in L3 cache.

                As for the rendering benchmarks, I wasn't too surprised after having seen:



                For me, the biggest surprise was that compilation wasn't more bandwidth-intensive! Could it possibly have been I/O-bottlenecked? The drive appears to be an 800 GB Intel DC P3600 MLC SSD. Read-oriented, probably. ark.intel.com has already scrubbed it from their database, and I'm too lazy to search Solidigm's site.

                Here are the specs from Google's cache:
                • Sequential Bandwidth - 100% Read (up to) 2600 MB/s
                • Sequential Bandwidth - 100% Write (up to) 1000 MB/s
                • Random Read (100% Span) 430000 IOPS (4K Blocks)
                • Random Write (100% Span) 50000 IOPS (4K Blocks)

                Michael, I don't suppose this was an SSD that AMD sent you? Or was the platform shipped without storage?
                The drive used was a P5800X. (AMD didn't supply storage with Titanite besides some odd WD ~SN750 NVMe SSD for use as a boot drive....)
                Michael Larabel
                https://www.michaellarabel.com/



                • #9
                  Originally posted by Michael View Post
                  The drive used was a P5800X. (AMD didn't supply storage with Titanite besides some odd WD ~SN750 NVMe SSD for use as a boot drive....)
                  That's good to hear. The system description at top says: "800GB INTEL SSDPF21Q800GB", which is an 800 GB Intel DC P3600 MLC SSD.

                  Any idea how the error crept in? Was that data collected from a different run?



                  • #10
                    Originally posted by coder View Post
                    For unbuffered, you want dual-rank, but 1 DIMM per channel.



                    However, I can't say if that applies to the registered memory used in these servers. For most, there's no choice. DDR5 is too new, so memory capacity requirements will likely drive decisions about which size, type, and number of DIMMs to buy.
                    I read the AnandTech article. The benchmarks aren't the type that I'm interested in. I'm looking for HPC benchmarks like the type Michael runs.

