Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC

  • #41
    Rowhammer, unlike most other vulnerabilities, does not rely on unexpected inputs, violated assumptions, logic bugs in hardware or software, or missing corner-case checks to do its magic. All Rowhammer does is make a natural RAM corruption, one that could occur at any time by pure chance, more likely. So the simple fact that Rowhammer exists proves that memory is not fully reliable, even if you're not concerned with the security implications. If you want the highest hardware stability, you need ECC.

    When Intel says that consumer hardware doesn't need ECC, they're saying, "When a consumer's machine randomly corrupts data on our hardware, that's expected and fine 'coz it's rare. But servers still need it 'coz it's not that rare." LOL, well done Intel.

    • #42
      Originally posted by bug77
      On a more practical note, ECC means higher latency. Whoever moves their consumer chips to ECC first, will be slaughtered in benchmarks.
      Originally posted by Zan Lynx
      This has to be old information. I am running with ECC RAM on this Ryzen 5950 and its latency stats are right in line with any other 2,666 MHz CL19 RAM.
      I would expect a slight latency increase for the first read in a cache line on an L3 miss. However, (a) there are typically more subsequent reads to the same cache line, and those would not be affected, so the average increase would be pretty low; and (b) the increase in latency for that first word would be tiny compared to the RAM access time anyway.
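
The averaging argument can be put into rough numbers. This is only a sketch with made-up figures: the 80 ns DRAM access, the 2 ns ECC check on the first word, the 1 ns cached read, and the 8 reads per cache line are all assumptions chosen for illustration, not measurements from anyone in this thread.

```python
def average_read_latency(dram_ns, ecc_penalty_ns, cached_ns, reads_per_line):
    """Average latency per read when only the first read of a cache line goes to DRAM.

    The ECC penalty is charged once, on that first (missing) read; the remaining
    reads of the same line are served from cache and are unaffected.
    """
    first_read = dram_ns + ecc_penalty_ns
    later_reads = cached_ns * (reads_per_line - 1)
    return (first_read + later_reads) / reads_per_line


# Hypothetical figures, for illustration only.
DRAM_NS = 80.0    # assumed cost of an L3 miss served from DRAM
ECC_NS = 2.0      # assumed extra cost of the ECC check on that miss
CACHED_NS = 1.0   # assumed cost of a read that hits the already-loaded line
READS = 8         # assumed number of reads that touch each cache line

without_ecc = average_read_latency(DRAM_NS, 0.0, CACHED_NS, READS)
with_ecc = average_read_latency(DRAM_NS, ECC_NS, CACHED_NS, READS)

print(f"average without ECC: {without_ecc:.3f} ns")
print(f"average with ECC:    {with_ecc:.3f} ns")
print(f"average increase:    {with_ecc - without_ecc:.3f} ns")
```

With these assumed inputs the per-read average moves by a quarter of a nanosecond, which is the point of (a); the 2 ns against 80 ns on the first word is the point of (b).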

      • #43
        100% agree with Linus on this. I have a cheap, old Asus B350 board, a cheap Ryzen 1700 and 64GB of DDR4 2666MHz ECC RAM. It all works perfectly as my home Debian server. If I wanted ECC on an Intel platform I would have to pay several times more.
        Thank God we have Linux and AMD. No thanks to Intel or Nvidia.

        • #44
          The fact remains that the world has been fine with using non-ECC memory on high-end professional workstations, desktop PCs, gaming PCs and work PCs for decades with no major complications or consequences. People *want* ECC memory but don't *need* it. Gamers sure as hell don't *need* ECC memory other than for bragging rights over other gamers.

          I have a 2990WX workstation with 128GB of standard DDR4 memory for use in compiling software (with the memory being used as a RAMdisk for faster compilations) and have not run into any problems since day one.

          Save the limited stocks of ECC memory for the machines that really *need* them, like production-critical servers in gigantic datacentres where even a single corrupted bit results in massive consequences.

          • #45
            We all know about ECCploit, right?

            "...memory manufacturers are starting [to] do ECC internally..."

            Interesting, until the chosen error threshold is passed and there is no way to communicate the failure to the CPU.

            • #46
              Originally posted by Sonadow
              Save the limited stocks of ECC memory for the machines that really *need* them, like production-critical servers in gigantic datacentres where even a single corrupted bit results in massive consequences.
              It isn't as if ECC RAM is difficult to build. It is one extra RAM chip on the DIMM.
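
For anyone wondering what the extra chip buys: a common ECC DIMM is 72 bits wide, 64 data bits plus 8 check bits, which with x8 chips means nine chips per rank instead of eight. Here is a minimal sketch of a single-error-correct, double-error-detect (SECDED) code over those 72 bits, using an extended Hamming layout. The bit layout and the names `encode` and `decode` are my own assumptions for illustration; real memory controllers use their own codes and this is not a model of any particular one.

```python
# Extended Hamming SECDED over a 72-bit codeword (illustrative layout):
#   position 0           overall parity bit
#   positions 1,2,4..64  Hamming parity bits (7 of them)
#   remaining positions  the 64 data bits
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]


def encode(data):
    """Turn a 64-bit integer into a list of 72 bits."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        # Each parity bit covers every position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, 72) if pos & p) % 2
    code[0] = sum(code) % 2  # make the parity of the whole word even
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 72):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome < 72:
        # Odd overall parity: assume one flipped bit, located by the syndrome
        # (a syndrome of 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a non-zero syndrome: two bits flipped.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= fixed[pos] << i
    return data, status


word = 0xDEADBEEFCAFEF00D
codeword = encode(word)

one_flip = list(codeword)
one_flip[37] ^= 1            # simulate a single bit flip
two_flips = list(one_flip)
two_flips[5] ^= 1            # simulate a second flip in the same word

print(decode(codeword)[1])
print(decode(one_flip)[1])
print(decode(two_flips)[1])
```

A single flipped bit is repaired transparently and a double flip is at least reported instead of being silently passed on, which is the whole argument for having the ninth chip.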

              "Limited stocks", hah.

              • #47
                Originally posted by Sonadow
                I have a 2990WX workstation with 128GB of standard DDR4 memory for use in compiling software (with the memory being used as a RAMdisk for faster compilations) and have not run into any problems since day one.
                And I had a desktop workstation at work for compiling software in 2004 which developed an intermittent RAM error. I must have lost two weeks of work in total trying to debug crashes that didn't exist and test failures that didn't exist before I figured out I needed to run a 24h memtest and then replace the RAM.

                And later in 2007 I built a new gaming desktop with some PNY RAM from Fry's that turned out to have a single completely stuck bit. It ran most games fine except for Doom 3 for some reason. The only reason I figured it out was Doom reported invalid checksums from a pak file. Doing a compare to the CD-ROM showed a bit error.

                If you ever have a similar experience you will never be so dismissive of ECC.
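
The kind of comparison described above (a suspect copy against a known-good one) can be sketched like this. The function name `bit_differences` and the sample bytes are made up for illustration; in practice the two inputs would be the contents of the file on disk and the file on the original media.

```python
def bit_differences(a, b):
    """Return a list of (byte_offset, bit_index) for every bit that differs."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        delta = x ^ y  # set bits mark the positions where the bytes disagree
        for bit in range(8):
            if (delta >> bit) & 1:
                diffs.append((offset, bit))
    return diffs


# Hypothetical data: a known-good buffer and a copy with one flipped bit.
good = bytes(range(16))
bad = bytearray(good)
bad[9] ^= 0x10  # flip bit 4 of byte 9

print(bit_differences(good, bytes(bad)))  # [(9, 4)]
```

A lone differing bit in an otherwise identical file points at memory (or the path to it) more than at the software.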

                • #48
                  I think most would agree that we want ECC on the server, but do we need it on the workstation? I have a 4-year-old ASUS ZenBook that logs ECC errors; it's never had one.
                  Some have said that you need it if you live in Denver because of the cosmic rays at that altitude, but I haven't seen any verification for that.
                  I think the most compelling reason in this discussion is to avoid Rowhammer exploits. But is it really the most effective mitigation for that? Supposedly Ivy Bridge's refresh is better protection.
                  Sometimes a rant is just a rant.
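
For anyone who wants to check their own box: on Linux, when an EDAC driver is loaded for the memory controller, the kernel exposes corrected and uncorrected error counters under `/sys/devices/system/edac/mc/`. A minimal sketch that reads them follows. The helper name `read_edac_counts` is my own, and whether anything shows up at all depends on the hardware and on a driver being loaded, so an empty result does not by itself prove there were no errors.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {'ce_count': int|None, 'ue_count': int|None}}.

    ce_count is the number of corrected errors, ue_count the number of
    uncorrected errors, as reported by the kernel's EDAC sysfs files.
    """
    root = Path(base)
    if not root.is_dir():
        return {}
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # file missing or unreadable
        counts[mc.name] = entry
    return counts


counts = read_edac_counts()
if not counts:
    print("No EDAC memory controllers found (no driver loaded, or no ECC).")
for mc, entry in counts.items():
    print(f"{mc}: corrected={entry['ce_count']} uncorrected={entry['ue_count']}")
```

A non-zero corrected count is the interesting case: those are the flips that would have gone unnoticed without ECC.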

                  • #49
                    I live at 7500 feet. Cosmic ray flux is 7.2 times higher than at sea level. During solar storms with 32 GB of non-ECC I'd have to reboot 1-3 times a day. Got a new laptop with 64 GB of ECC (Xeon) and it's stable for months.

                    Linus is correct and the cosmic ray issue is real. Google Cisco cosmic rays-ECC, etc. Sandia did a study that confirmed the issue of cosmic particle flux and non-ECC.



                    About Introduction We’ve collected a few amusing and interesting things about bit flipping caused by cosmic rays the other day. Please rest assured we are only using ECC memory - Wikipedia for our server machines. DRAM quotes “A bit flipping at random is not a problem solely related to broken memory. Perfectly healthy memory is also subject, with a small probability, to bit flipping because of cosmic rays. […] According to a few sources, including IBM, Intel and Corsair, a computer with...


                    So, not a rant, but reality.


                    • #50
                      Originally posted by zxy_thf
                      Threadripper CPUs have ECC hardware, although they're more expensive than the combo of Xeon E3 (Now it's called Xeon E
                      Threadrippers are in a completely different price segment. The E-series Xeons actually cost similar to their desktop-oriented counterparts (i.e. Intel's mainstream desktop CPUs).

                      Moreover, even Threadrippers now have the same ECC status as the desktop-oriented Ryzens -- AMD doesn't support or guarantee it -- they leave that up to motherboard vendors. If you really want full ECC functionality and support from an AMD CPU, your only options are to get a Pro-branded APU, CPU, or Threadripper, or to use an EPYC.
