AMD Ryzen Threadripper PRO 5965WX Memory Scaling Benchmarks On Linux

  • #11
    Originally posted by edwaleni
    Can someone fill me in on why a result would regress going from 6 to 8 channels, but improve going from 4 to 6 channels?

    Why are some tests unable to scale? Does the test complete before all the channels are brought up for use?
    Some workloads don't need that much bandwidth unless you could also process the data faster. And having more DIMMs increases latency from switching between them. They may also stress the memory controller and therefore cut into the power budget that the CPU has.

    Often more DIMMs also lead the memory controller to reduce frequency, but that wasn't the case here, as Michael said.
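
For a rough sense of scale, theoretical peak bandwidth grows linearly with the number of populated channels. A minimal sketch, assuming DDR4-3200 (3200 MT/s, the speed discussed in this thread) and a 64-bit (8-byte) data bus per channel; the function name is made up for illustration, and real workloads will see less than these figures:

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s=3200, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes every channel runs at the same transfer rate and that each
    transfer moves bus_bytes bytes (8 bytes for a 64-bit DDR4 channel).
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Channel counts of the kind compared in this thread, plus dual channel.
for channels in (2, 4, 6, 8):
    print(f"{channels} channels: {peak_bandwidth_gbs(channels):.1f} GB/s")
```

A workload that cannot consume even the 4-channel figure has nothing to gain from 6 or 8 channels, so small latency or power effects can dominate its result.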



    • #12
      Originally posted by Anux
      Often more DIMMs also lead the memory controller to reduce frequency, but that wasn't the case here, as Michael said.
      Indeed, if you check the specs for the 5700X you will see that as soon as you populate more than two slots it drops the frequency by quite a bit. But it only has two memory channels, whereas the 5965WX has one for each DIMM slot, so it shouldn't have any problem running them all at 3200 MHz.

      Note that while the 5700X is officially specced to do 3200 MHz, in reality it will happily do 3600 MHz with 3600 sticks. I wonder if the 5965WX is the same, and if that would reduce or amplify the performance delta between 6 and 8 sticks in the cases where 6 beats 8. (No I do not expect Michael to go out and buy another 8 sticks of RAM.)

      ((DDR4 prices are coming down nicely though, too bad almost everything is infected with RGB nonsense.))



      • #13
        Originally posted by Gonk

        Indeed, if you check the specs for the 5700X you will see that as soon as you populate more than two slots it drops the frequency by quite a bit. But it only has two memory channels, whereas the 5965WX has one for each DIMM slot, so it shouldn't have any problem running them all at 3200 MHz.
        That again depends on the power budget or temperature limit of the memory controller. If you follow the overclocking scene, they exclusively use only 2 DIMMs on dual-channel platforms, as that gives the highest possible frequencies and lowest latency. The non-overclocking standards are very conservative and sometimes even have different limits for single- and dual-ranked modules. Marketing of course only tells you the fastest limit, which is dual channel, single rank. ^^

        Note that while the 5700X is officially specced to do 3200 MHz, in reality it will happily do 3600 MHz with 3600 sticks. I wonder if the 5965WX is the same, and if that would reduce or amplify the performance delta between 6 and 8 sticks in the cases where 6 beats 8. (No I do not expect Michael to go out and buy another 8 sticks of RAM.)
        I also don't know if Threadripper allows/denies memory overclocking or if it is something that only depends on the board/bios.



        • #14
          Originally posted by edwaleni
          Can someone fill me in on why a result would regress going from 6 to 8 channels, but improve going from 4 to 6 channels?

          Why are some tests unable to scale? Does the test complete before all the channels are brought up for use?

          Does this reflect a power management activity or is it simply a memory switching activity?
          XMP profiles only include primary timings, and sometimes a couple secondary timings. The rest of the secondary and tertiary timings are to be decided by the motherboard and the memory controller.
          It's in the motherboard's interest to choose conservative and loose timings so memory training has the best chance to succeed, as if memory training fails then you aren't booting.
          What could be seen here is the memory controller being pushed towards the limits and it isn't capable of running tighter timings.

          I know for my 5900X it has a frequency and timing wall with 128GB of RAM. I can't go above 3600MT/s, below a CAS of 14 cycles, and some timings need to be relaxed. If I remove two sticks and run 64GB I can run 4000MT/s.
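
To put frequency and CAS figures like these on a common scale, it can help to convert CAS cycles into nanoseconds. A minimal sketch, assuming DDR memory whose clock runs at half the transfer rate; the function name is hypothetical, and the CL16 value for 4000 MT/s is an assumed example, not something stated in this post:

```python
def cas_latency_ns(cas_cycles, mega_transfers_per_s):
    """First-word CAS latency in nanoseconds.

    DDR transfers twice per clock, so the memory clock in MHz is half the
    MT/s figure and one cycle lasts 2000 / MT/s nanoseconds.
    """
    return cas_cycles * 2000 / mega_transfers_per_s


# CL14 at 3600 MT/s is the limit described above for 128GB.
print(f"CL14 @ 3600 MT/s: {cas_latency_ns(14, 3600):.2f} ns")
# CL16 is an assumed value for the 4000 MT/s configuration.
print(f"CL16 @ 4000 MT/s: {cas_latency_ns(16, 4000):.2f} ns")
```

Under these assumptions a higher transfer rate does not automatically mean lower latency, since looser timings can offset the faster clock.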



          • #15
            Originally posted by Namelesswonder

            XMP profiles only include primary timings, and sometimes a couple secondary timings. The rest of the secondary and tertiary timings are to be decided by the motherboard and the memory controller.
            It's in the motherboard's interest to choose conservative and loose timings so memory training has the best chance to succeed, as if memory training fails then you aren't booting.
            What could be seen here is the memory controller being pushed towards the limits and it isn't capable of running tighter timings.

            I know for my 5900X it has a frequency and timing wall with 128GB of RAM. I can't go above 3600MT/s, below a CAS of 14 cycles, and some timings need to be relaxed. If I remove two sticks and run 64GB I can run 4000MT/s.
            Someone else told me it has something to do with the switching fabric for the memory controller. That would line up with what you just shared: the fabric has a limit that depends on the number of DIMMs involved. I wonder if it is by design (for data integrity), or something in the bin sorting that differentiates the behavior. It would be interesting to see if another stepping comes out later or a new AGESA is released to deal with it, depending on whether it's a feature or an unplanned behavior.



            • #16
              Originally posted by Anux
              And having more DIMMs increases latency from switching between them.
              That should only apply when you have multiple DIMMs per channel, if ever. Further, it might not apply to registered memory, like what this platform or servers use.

              Originally posted by Anux
              They may also stress the memory controller and therefore cut into the power budget that the CPU has.
              This typically doesn't apply to registered memory.

              Originally posted by Anux
              Often more DIMMs also lead the memory controller to reduce frequency, but that wasn't the case here, as Michael said.
              Again, you're thinking of unbuffered (i.e. non-registered) memory.



              • #17
                Originally posted by Gonk
                Indeed, if you check the specs for the 5700X you will see that as soon as you populate more than two slots it drops the frequency by quite a bit.
                Yup, because it supports only unbuffered memory. That's been the norm for unbuffered memory for a while, actually.

                In the case of Alder Lake, simply using a board with 4 DIMM slots reduces your maximum DDR5 frequency, even when the second slot on each channel is left empty! Unfortunately, most decent Alder Lake boards seem to have 4 DIMM slots. It's only like mini-ITX and a few extreme overclocking boards that have just 2 slots.

                Originally posted by Gonk
                I wonder if the 5965WX is the same,
                Servers and workstations typically don't let you do any form of overclocking. Intel has started to loosen the reins on this in its workstation line, so it's possible AMD allows some measure of OC with TR. The motherboard docs should state what's allowed.

                Originally posted by Gonk
                ((DDR4 prices are coming down nicely though, too bad almost everything is infected with RGB nonsense.))
                Not server memory.

                Kingston is one of my preferred memory suppliers and their ValueRAM line is no-frills.

                Looks to me like Crucial (my second supplier) also offers plenty of options without RGB.
                Last edited by coder; 17 August 2022, 10:09 PM.



                • #18
                  Originally posted by Anux
                  The non-overclocking standards are very conservative and sometimes even have different limits for single- and dual-ranked modules. Marketing of course only tells you the fastest limit, which is dual channel, single rank. ^^
                  That's no longer true. It turns out that dual-ranked DDR5 DIMMs offer better performance with Alder Lake. But it's not equivalent to having 2x single-ranked DIMMs per channel - that's still a losing proposition.



                  According to the reviewer:

                  "... in a dual rank DIMM, Rank Interleaving can be employed, which allows the second rank of memory chips to be ready for immediate access. While the differences are minimal even on a theoretical basis, as we have seen they are not zero: rank interleaving reduces response times in the pipeline refresh cycles, which can mean more performance in latency-sensitive applications, or when an application is going to be able to push DDR5 to its overall bandwidth limits."

                  Note: the same might not apply to DDR4. I've traditionally seen guidance to prefer lower-rank DIMMs.



                  • #19
                    Originally posted by coder
                    That should only apply when you have multiple DIMMs per channel, if ever. Further, it might not apply to registered memory, like what this or servers use.
                    Is there no command rate for RDIMMs?

                    This typically doesn't apply to registered memory.
                    Why wouldn't it? The more the memory controller has to do, the hotter it gets. What's so magical about registered DIMMs that it wouldn't do that?

                    Originally posted by coder
                    That's no longer true. It turns out that dual-ranked DDR5 DIMMs offer better performance with Alder Lake. But it's not equivalent to having 2x single-ranked DIMMs per channel - that's still a losing proposition.
                    That article doesn't compare overclocking capabilities, so I'm not sure what I should see there. It just shows that more DIMMs are slower and DR is faster than SR at the same clocks, which is also the case for DDR4, and I never said otherwise.

                    Note: the same might not apply to DDR4. I've traditionally seen guidance to prefer lower-rank DIMMs.
                    For overclocking? Yes, that was my point from the beginning when I said: "overclockers use 2 DIMMs single ranked". Single-ranked modules just have more clocking potential, independent of the memory controller.

                    That all might change with DDR5, but your linked article suggests otherwise.



                    • #20
                      Originally posted by Anux
                      Is there no command rate for RDIMMs?
                      At one DIMM per channel, the commands would all be on different channels.

                      Originally posted by Anux
                      Why wouldn't it? The more the memory controller has to do, the hotter it gets. What's so magical about registered DIMMs that it wouldn't do that?
                      The whole point of registered memory is to add an electrical buffer, to relieve the load on the memory controller. The downside is that this usually adds about a cycle's worth of latency and a tiny bit of cost per DIMM, which is why it's not done for consumer platforms. Therefore, the memory controller is doing much less work per DIMM than with unbuffered.
