AMD Ryzen 9 7900X Performance With ECC DDR5 Memory


  • #31
    Originally posted by Veto View Post
    I have often wondered whether I should start using ECC RAM for my NAS/server running 24/7. However, I have not really noticed any issues caused by RAM errors.

    Does anyone have any experience with running ECC RAM? Do you get errors/corrections reported in your logs regularly or at all? Is it really necessary in real life?
    I do use ECC RAM in my server (build blog here: https://www.solsys.org/dyntbl.php?mo...3&op=View&id=9 ) and haven't had any errors show up yet.
    My old NAS had to be retired because it started behaving erratically, with occasional data loss. Those problems have not come back since I switched to this setup.
    Linuxer since the early beginnings...



    • #32
      Originally posted by peterdk View Post
      Are there any consumer motherboards that support ECC on AMD 7000 series?
      Originally posted by wertigon View Post
      Asus got support across pretty much the whole range, do double check but seems like most Prime and ROG boards support it, and possibly more.
      Originally posted by gary7 View Post
      Most ASRock AM4 motherboards support ECC. A quick check of their AM5 motherboards shows that ECC support is 50/50 or less with few of the low end and even some of the high(est) end motherboards not supporting it.
      Just a quick protip: you can typically find out if a given motherboard supports ECC by just downloading the PDF manual and doing a search for the likes of "error" or "ecc" or "e.c.c" (yes, without the final period to make sure they didn't derp up)

      Originally posted by coder View Post
      So, why the higher latency spec? That's only because they used conservative timing and adhere strictly to JEDEC specifications.
      One of the fun things about using consumer boards that support ECC is that you can overclock the RAM and, thanks to the aforementioned error-correcting functionality, it's not just way easier to find stability: you also get that extra peace of mind that, even if some random teeny bit of instability managed to slip through, it'll get corrected on-the-fly anyway.

      Fun fact: I have some 2x4GB DDR3-1066 Crucial unbuffered ECC that easily runs at 1600 9-9-9-24 even when undervolted some. It's similarly paired with some SuperTalent 2x8GB DDR3-1333 that also runs at 1600 9-9-9-24 when undervolted (one of the sticks even runs at 1866 with a minor undervolt, but the system it's currently on can't properly do 1866 anyway so I have all four DIMMs at 1600 9-9-9-24)

      Oh yeah, undervolting is also a great way to test your RAM's stability. If it's stable at 1.40v, then it sure as heck should be stable at 1.50v with the error-correcting even as a fallback.

      ...that being said, I did find a funny thing where you can have RAM pass memtest86+ but fail OCCT with medium data set + extreme, yet not due to any sort of overclock but rather the combination of DIMMs and/or motherboard just not jiving (I have a 2x8GB DDR3-1600 Kingston kit that just wouldn't jive with a 2x8GB DDR3-1600 Corsair kit, but the Kingston kit jives just fine with a 2x4GB DDR3-1866 Corsair kit, and that same 2x4GB DDR3-1866 Corsair kit jives just fine with the 2x8GB DDR3-1600 Corsair kit on even the same motherboard and CPU, so...)
      Last edited by NM64; 06 October 2023, 03:32 AM.



      • #33
        Originally posted by unwind-protect View Post
        In summary, I am surprised that there is a measurable difference from turning ECC on and off. Even 2-3% isn't what I expected.

        7% slower surprises me. It doesn't match my mental model of how ECC works.
        The problem is you just looked at the outliers. The geomean across all 242 benchmarks showed that enabling ECC still gives you an average of 99.74% of the performance you get with it disabled.
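For anyone who wants to check that kind of figure on their own results, here is a minimal sketch of a geomean comparison. It assumes every benchmark is "higher is better" (lower-is-better results would need their ratio inverted first), and the function name and sample numbers are made up for illustration, not taken from the article.

```python
from statistics import geometric_mean

def relative_performance(ecc_on, ecc_off):
    """Geometric mean of per-benchmark ratios (ECC on / ECC off).

    Assumes higher-is-better scores, paired by position.
    """
    if len(ecc_on) != len(ecc_off):
        raise ValueError("score lists must be the same length")
    ratios = [on / off for on, off in zip(ecc_on, ecc_off)]
    return geometric_mean(ratios)

if __name__ == "__main__":
    # Hypothetical scores: one outlier that is 7% slower, two near-identical.
    off = [100.0, 200.0, 50.0]
    on = [93.0, 200.5, 50.1]
    print(f"ECC on delivers {relative_performance(on, off):.2%} of ECC-off performance")
```

The geometric mean is used rather than the arithmetic mean so that a single large outlier does not dominate, which is why a few 7% results can coexist with an overall figure close to 100%.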




        Without further investigation, we don't know why the outliers are outliers. I wonder how consistent their scores are. It could be that they're just some of the more highly-variable benchmarks included in the suite, and much of the discrepancy we're seeing is due to random variation. Or, maybe it's something else, like that enabling ECC has the effect of disabling burst-chop.

        The bigger performance penalty is having to use lower-spec'd RAM, because ECC UDIMMs lag non-ECC in terms of the speed and timings available. Speaking of which, Michael, Kingston offers DDR5-5600 ECC UDIMMs (the article says only DDR5-5200).



        P.S. thanks for the benchmarks, Michael. I'd love to know more about some of those more extreme outliers (how consistent were the scores, without changing the memory setting?).
        Last edited by coder; 06 October 2023, 03:54 AM.



        • #34
          Originally posted by Old Nobody View Post
          All reported errors got corrected. You have to decide on your own if it's worth the extra money.
          A double-bit (detected) error will abort your process, if it happens in userspace. In the kernel, I'd imagine you'll get a kernel panic. It's not nice, but at least it stops you from continuing and possibly propagating corrupt data.
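To make the single-bit vs. double-bit distinction concrete, here is a toy SEC-DED (single error correct, double error detect) code: an extended Hamming code over 4 data bits. This is only an illustration of the principle. Real ECC DIMMs protect much wider words with different codes, and the function names here are my own.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 hold Hamming(7,4),
    with parity bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code

def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of set positions; 0 if valid
    parity_broken = sum(code) % 2 == 1

    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        # Odd number of flips: assume one, and the syndrome names its position
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, cannot repair.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

if __name__ == "__main__":
    word = encode(0b1011)
    one_flip = list(word); one_flip[5] ^= 1
    two_flips = list(one_flip); two_flips[2] ^= 1
    print(decode(word), decode(one_flip), decode(two_flips))
```

The "uncorrectable" branch is the case described above: the hardware knows the data is wrong but cannot tell which bits flipped, so the only safe option is to stop rather than hand back bad data.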

          I experienced this, in practice. We had a server where apps were crashing a lot. I think there were also reports of the machine locking up or hitting kernel panics, but I'm foggy on those details. Upon investigation, syslog was filled with reams of ECC errors, some of them double-bit!
          Last edited by coder; 06 October 2023, 03:33 AM.



          • #35
            Originally posted by wertigon View Post
            It is time to make ECC mandatory, there is little reason left not to do it and you can always turn it off if you do not want it.
            Out-of-Band (OoB) ECC adds a >= 25% price premium to DDR5 DIMMs, because you need one extra DRAM chip for every 4, in order to provide 8 bits of ECC for every 32 bits of data. In mass-market products, such a cost is nontrivial.
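The 25% figure falls straight out of the bit widths quoted above. A quick sketch of the arithmetic, assuming x8 DRAM chips (8 bits per chip), which is an assumption on my part and not something every DIMM uses:

```python
def ecc_overhead(data_bits=32, ecc_bits=8):
    """Fraction of extra storage needed for ECC, relative to the data bits."""
    return ecc_bits / data_bits

def chips_needed(data_bits=32, ecc_bits=8, bits_per_chip=8):
    """Return (data_chips, ecc_chips), rounding up to whole chips."""
    data_chips = -(-data_bits // bits_per_chip)   # ceiling division
    ecc_chips = -(-ecc_bits // bits_per_chip)
    return data_chips, ecc_chips

if __name__ == "__main__":
    data_chips, ecc_chips = chips_needed()
    print(f"{data_chips} data chips + {ecc_chips} ECC chip(s), "
          f"overhead {ecc_overhead():.0%}")
```

This only counts DRAM chips. Any extra markup from lower volumes is on top of that, hence ">=".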

            Intel has been pioneering a different avenue, for some of their embedded products. They refer to it as in-band ECC, and essentially it works by setting aside a block of the address range to hold the ECC bits used to protect data in the rest of the RAM. This incurs a performance penalty and reduces the available memory capacity, but doesn't add cost and works with any (otherwise-compatible) DIMM.

            Anandtech benchmarked a system with IB ECC enabled & disabled, in order to quantify the performance impact. Unfortunately, because it's Intel, they are playing their usual market segmentation games, rather than making this feature available on their entire range of client & embedded CPUs.



            • #36
              Originally posted by NM64 View Post
              Just a quick protip: you can typically find out if a given motherboard supports ECC by just downloading the PDF manual and doing a search for the likes of "error" or "ecc" or "e.c.c" (yes, without the final period to make sure they didn't derp up)
              100% of examples I've seen that support ECC will say "ECC", somewhere in the specs. No need to search for the other variations.

              Also, check their qualified memory list. If the board provides ECC support, they will most likely include some ECC DIMMs among those they've tested and vouch for.



              • #37
                Originally posted by unwind-protect View Post
                That's nice but you get no reporting. For all you know you could have a broken module that is spewing 1-bit errors on a constant basis and next thing you know you get a 2-bit error and wrong data - again without that fact being disclosed to you. In a way this in-module-only ECC functionality is worse than no ECC.
                That's because it isn't some kind of optional error detection but a "needed to operate properly" thing. It's like hard disks, where data couldn't be read or written error-free without some ECC.



                • #38
                  Originally posted by peterdk View Post
                  Are there any consumer motherboards that support ECC on AMD 7000 series? I run a homeserver with a 5950X and Asrock B550M pro4 supports ECC without issues. But apparently the newer 7000 series mobo's do not support it by default?

                  Last time I looked, ASUS had the most AM5 motherboards with ECC support.

                  This is a change: in the past ASRock had ECC support on all motherboards, but now many of their boards no longer mention it.

                  The one I would choose for a server or workstation is the ASUS PRIME X670E-PRO, because it has the best expandability through PCIe slots. It allows, for instance, a GPU plus one or two dual-port 10 Gb/s or 25 Gb/s network cards.

                  When less connectivity is needed, there are many cheaper boards.





                  • #39
                    Originally posted by Veto View Post
                    I have often wondered whether I should start using ECC RAM for my NAS/server running 24/7. However, I have not really noticed any issues caused by RAM errors.

                    Does anyone have any experience with running ECC RAM? Do you get errors/corrections reported in your logs regularly or at all? Is it really necessary in real life?

                    The most important advantage of ECC is that it detects aging memory modules.

                    When the DIMMs are new, they may have a couple of errors per year that might not influence anything.

                    After DIMMs have been used 24/7 for some years, many of them will begin to have frequent errors, even multiple errors per day.

                    Without ECC, you will discover this too late, typically by noticing corrupt files that might be irreplaceable.

                    With ECC, you will be notified immediately and you can replace the offending module, so that the server or workstation may continue to be used without problems.
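On Linux, that notification usually comes through the kernel's EDAC subsystem, which exposes per-memory-controller counters in sysfs. Here is a minimal sketch for polling them, assuming an EDAC driver for your platform is loaded (otherwise the directory is simply empty or missing). The function name and the output format are my own.

```python
from pathlib import Path

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": n}} from EDAC sysfs.

    Controllers whose counters cannot be read are skipped; an empty dict
    means no EDAC memory controller was found under `root`.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts

if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (driver not loaded or no ECC?)")
    for name, c in counts.items():
        print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

Run from cron or a monitoring agent, a corrected count that keeps climbing is the early warning described above: time to find and replace the module before uncorrected errors show up.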





                    • #40
                      Originally posted by LinAGKar View Post

                      It's got on-die ECC as standard, but not ECC all the way to the CPU.
                      The on-die ECC just brings the reliability to the same level as in the older non-ECC memories.

                      It does not offer any of the benefits of ECC memories, like protection against electrical noise or oxidized sockets and detection of memories that become defective after aging.

