AMD Ryzen 7000 Series EDAC Support Submitted For Linux 6.5


  • #11
    Originally posted by Times Two View Post

    It's an Intel thing because Intel killed ECC on the desktop. In olden times more or less all CPUs supported first parity and then ECC memory. The price difference between ECC and non-ECC SIMMs and DIMMs was on the order of 10% (which more than covers the BOM difference). Then Intel started all their anticompetitive and market manipulation shenanigans.

    Intel began using ECC as a product differentiator, i.e. only providing it if you were willing to buy a higher-priced product, and eventually removed any and all ECC support from desktop/home systems. Because Intel was the dominant supplier of CPUs, this caused the market for ECC memory to wither away. Without a mass market, the price of ECC memory goes up.

    Make no mistake, Intel is one of the biggest anticompetitive and anticonsumer scoundrels the world has ever seen. They've been fined billions of dollars for their lawbreaking on most continents. The fines should have been bigger, as they did not come close to erasing the advantage and profit that Intel gained by their mafia tactics.
    I know Intel intimately, and I really don't care for their tactics.

    But... Intel 'killed' ECC on the desktop three decades ago because it cost more, it was slower, and memory had become more reliable than it was in the 80s. Their burgeoning partner ecosystem (ODMs etc.) didn't want that complexity either. There was also a performance hit until recent generations of processors. Now, from what I can tell, the performance hit is mostly gone, and both AMD *and* Intel support ECC on desktop processors anyway.

    It's not like consumers were pounding down Intel's door with complaints about bit errors, demanding that Intel give them parity checking on RAM. If they had been, we'd have had ECCvPro and ECC@home (which wouldn't have been real ECC but a fake ECC that tells you your memory has an ECC problem, which just means 'wtf who knows lol reboot').

    Comment


    • #12
      Originally posted by Times Two View Post
      Make no mistake, Intel is one of the biggest anticompetitive and anticonsumer scoundrels the world has ever seen. They've been fined billions of dollars for their lawbreaking on most continents. The fines should have been bigger, as they did not come close to erasing the advantage and profit that Intel gained by their mafia tactics.
      And they still haven't paid their fines to most of the litigants. Look at the AMD decisions: their corporate hacks keep stringing the court decisions out indefinitely and, for the most part, have not paid.

      Comment


      • #13
        Originally posted by AndyChow View Post

        The bit-flip doesn't matter much when you don't even checksum your data.
        You do realize that ECC covers both detection and correction, right?

        The increases you're talking about are infinitesimal. What is a 0.125x increase when memory capacity doubles every few years? The only reason ECC is so much more expensive than normal RAM is the low volume it's made in, plus the "why the hell not" enterprise hardware tax. Likewise, the speed difference is minuscule compared to the rate at which throughput increases.

        This is the same old argument Bill Gates had for not including memory protection in Windows NT: some minuscule performance savings that were overcome within a single hardware generation anyway.

        Comment


        • #14
          Originally posted by panikal View Post

          Why is this an Intel thing? ECC trades a bit of speed and latency for a huge increase in reliability. That trade-off has never, ever made sense for, or been a requirement of, any gamer I've ever known who didn't have other motives like systems/development requirements or wasn't just a plain hardware nerd. It's about financial accounting and uptime, and most home users don't have either of those.

          Edit: Just to be clear I <3 AMD and would like a server at home with ECC RAM that didn't cost 10x what I could pay for a cheapo low end desktop running as a storage box. But why increase the BOM/cost that much when *no one wants or uses it*?
          You do realize AMD has shipped ECC support on every chip they've made for at least a decade, right? Consumer chips included? Intel is the only one charging extra for ECC, just like their wonderful "unlock hardware you already bought" subscription plans.

          Arguing that only financial and "uptime" people need ECC is just dumb. You clearly love to see your computer crash.

          Comment


          • #15
            Originally posted by panikal View Post

            Why is this an Intel thing? ECC trades a bit of speed and latency for a huge increase in reliability. That trade-off has never, ever made sense for, or been a requirement of, any gamer I've ever known who didn't have other motives like systems/development requirements or wasn't just a plain hardware nerd. It's about financial accounting and uptime, and most home users don't have either of those.

            Edit: Just to be clear I <3 AMD and would like a server at home with ECC RAM that didn't cost 10x what I could pay for a cheapo low end desktop running as a storage box. But why increase the BOM/cost that much when *no one wants or uses it*?



            ECC does not trade speed and latency for reliability. ECC trades only cost for reliability.


            The available ECC modules are slower and have higher latency than the fastest available non-ECC memories only because the latter trade reliability for speed and latency.

            The ECC modules are available only at the speeds and latencies standardized by JEDEC, which are chosen to provide good reliability, while the memories faster than that are actually overclocked.

            For those who use computers only for games, overclocked CPUs or memories are fine.

            For other purposes, the only reason computer vendors have gotten away with following the market segmentation scam that Intel introduced with the split between Pentium and Pentium Pro is that software bugs are so frequent. Without hardware error logging, whenever a computer behaves weirdly, crashes, or corrupts some data, the cause cannot be determined, so it is assumed to be a software bug, which is frequently true, but not always.


            In my experience, ECC has been extremely useful especially for detecting aging in old memory modules, which after many years of use begin to have frequent errors. Detecting the offending memory module allows its replacement in time to avoid the corruption of precious data. I have valuable data whose loss is not acceptable, so for its storage I also use redundancy and checksumming.

            While for games ECC may be omitted, for any kind of work that no longer uses pen and paper, so that all the documents exist only in computers, data corruption is unacceptable and without ECC it is impossible to guarantee data integrity without a huge drop in performance.



            The cost overhead of ECC is many times higher than it needs to be. This happens because in the current standards ECC support is grafted upon a specification for non-ECC memories.

            If ECC had been a mandatory feature, its overhead could have been made negligible. The reason is that there is no need to have ECC correction for each 64-bit word; it is enough to have ECC correction for each 512-bit cache line, which can be achieved with a much smaller ratio between redundant bits and data bits. In this case, it would also be possible to avoid extra PCB traces, at only a slightly lower speed, if the redundant bits were transferred in an extra clock cycle during a burst transfer.
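To put rough numbers on the 64-bit vs. 512-bit comparison above, here is a minimal sketch. It assumes a plain SEC-DED (extended Hamming) code, which is only one possible scheme and not necessarily what any real memory controller uses; the function name `secded_check_bits` is made up for illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for a SEC-DED (extended Hamming) code.

    Assumption: a textbook Hamming code plus one overall parity bit.
    Real controllers may use stronger (e.g. symbol-based) codes.
    """
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


for width in (64, 512):
    check = secded_check_bits(width)
    print(f"{width:>3} data bits: {check} check bits, overhead {check / width:.1%}")
```

Under that assumption, a 64-bit word needs 8 check bits (12.5%, matching the familiar 72-bit ECC DIMM width), while a 512-bit cache line needs 11 check bits (about 2.1%). This only counts bits; it says nothing about how the extra bits would be wired or transferred.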

            So there are means to reduce the cost of ECC, if that were desired, but for now the memory vendors are happy as long as they can sell DDR5 ECC modules at a 50% higher price than non-ECC modules, even though there is no justification for more than a 25% price difference.




            Last edited by AdrianBc; 27 June 2023, 03:42 AM.

            Comment


            • #16
              If anyone has any kind of cryptographic protection on a host, or any program that uses a crypto hash, a single bit flip nukes the application AND your data. Let's hope you have a backup or checkpoint of valid data you can revert to. Anybody who depends on data integrity should use ECC memory for full-path data integrity.
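A small sketch of why a single flipped bit is so disruptive for anything hash-based. It uses SHA-256 from Python's standard `hashlib`; the payload and the helper names `flip_bit` and `differing_bits` are invented for the example, and the bit flip is simulated in software rather than coming from real faulty RAM.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (simulated memory error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"payload held in RAM before it is hashed or verified"
corrupted = flip_bit(original, 13)  # hypothetical single-bit upset

digest_ok = hashlib.sha256(original).digest()
digest_bad = hashlib.sha256(corrupted).digest()

print("input bits changed: ", differing_bits(original, corrupted))
print("digests still match:", digest_ok == digest_bad)
print("digest bits changed:", differing_bits(digest_ok, digest_bad), "of 256")
```

The one-bit change in the input produces a completely different digest, so any signature, checksum, or integrity check built on it fails to verify. What happens next (a rejected file, a failed decryption, a crash) depends on the application.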

              Comment


              • #17
                To be honest, as a self-hoster: since I started using ECC on my one server, the services have never been so rock stable.
                When I first built that server and ran the first tests, I had some MCE errors showing up in the kernel ring buffer. Some stress testing later, I sent the modules back with a printed report. Crucial sent me a new batch "certified" to work, and indeed they did! :}
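For anyone who wants to watch for this kind of thing without scrolling through the kernel log, here is a rough sketch that reads the per-memory-controller error counters the Linux EDAC subsystem exposes in sysfs (`ce_count` for corrected errors, `ue_count` for uncorrected ones). It assumes an EDAC driver for your platform is loaded; the function name `read_edac_counts` is my own, and the set of files available can vary by kernel and driver.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    Returns an empty dict when the directory is missing, e.g. when no EDAC
    driver is loaded or the system is not running Linux.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters are unreadable
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing on one controller is the sort of early warning described above: the data was still fine, but a module is probably on its way out.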

                I bet that in the gaming world (I am a gamer too, BTW) many crashes are due to memory errors. But building an ECC-enabled system with Intel CPUs was very expensive.
                When I replace my gaming rigs (well, wife and kids are gamers too, so it gets expensive very fast here, but boy, the fun we have when rampaging through WWZ or Dying Light as a family), I'll make sure to get Ryzen systems with ECC RAM.
                On the gaming systems, however, the part that breaks first is the GPU (from experience), and it is really a pain when you have to find out through trial and error that the upper RAM is bad: simple games using only 2GB of graphics memory run fine, while complex games that need more make the GPU driver/subsystem crash more or less at random.
                So I bet ECC is needed there too for a stable system, or at least to detect errors fast. On the other hand, it's cheaper and easier for the manufacturers to point to buggy Windows.
                Linuxer since the early beginnings...

                Comment


                • #18
                  Originally posted by AdrianBc View Post

                  ECC does not trade speed and latency for reliability. ECC trades only cost for reliability.

                  The available ECC modules are slower and have higher latency than the fastest available non-ECC memories only because the latter trade reliability for speed and latency.

                  The ECC modules are available only at the speeds and latencies standardized by JEDEC, which are chosen to provide good reliability, while the memories faster than that are actually overclocked.
                  Server ECC memory is available at the highest supported clock profiles that the CPUs allow. I use DDR4-3200 ECC RDIMMs for my Epyc hosts. That is the fastest the CPU is designed for.

                  There are now DDR5 ECC RDIMMs for servers all the way up to 6000 MT/s, which is the sweet spot for current-gen AMD processors.
                  Kingston Technology has released their latest DDR5 ECC Registered DIMM lineup, "FURY Renegade Pro DDR5 RDIMM", with capacities of up to 256GB and operating clocks of up to 6,000MHz. ...


                  The only reason for running memory past JEDEC specs is, for the most part, to achieve the lowest latencies for gaming applications. Intel seems to have the better memory controllers in the current generation of parts, at least on the consumer side. On the server side, AMD and Intel both support DDR5-4800 and seem to have speed parity, except that AMD has the bandwidth advantage.

                  Seems like the memory vendors are overclocking past the JEDEC spec even for server memory now.

                  Comment
