Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC

  • #61
    Originally posted by schmidtbag
    Again, if that were true, it would be the standard. You wouldn't have an option.
    There are so many cases in human history where this is false.



    • #62
      Originally posted by drhoho
      I live at 7500 feet. Cosmic ray flux is 7.2 times higher than at sea level. During solar storms with 32 GB of non-ECC I'd have to reboot 1-3 times a day. Got a new laptop with 64 GB of ECC (Xeon) and it's been stable for months.

      Linus is correct and the cosmic ray issue is real. Google "Cisco cosmic rays ECC", etc. Sandia did a study that confirmed the link between cosmic particle flux and errors in non-ECC memory.



      We’ve collected a few amusing and interesting things about bit flipping caused by cosmic rays the other day. Please rest assured we are only using ECC memory for our server machines. DRAM quotes “A bit flipping at random is not a problem solely related to broken memory. Perfectly healthy memory is also subject, with a small probability, to bit flipping because of cosmic rays. […] According to a few sources, including IBM, Intel and Corsair, a computer with...


      So, not a rant, but reality.

      What is it like living so high? 7,500 feet, wow...



      • #63
        Hi everybody,

        I subscribed just for this post.

        About Linus and ECC: bollocks.

        What is the problem with ECC?
        * It costs more, because instead of 8 modules we need a 9th module plus a chip.
        * It adds latency. Why? Because the tiny chip has to validate every operation.
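The cost point can be made concrete, assuming a plain Hamming SECDED (single-error-correct, double-error-detect) code, which is a common choice for ECC DRAM but not the only one. A Hamming code over k data bits needs the smallest number r of check bits with 2^r ≥ k + r + 1, and SECDED adds one overall parity bit on top. The sketch below (function names are illustrative) gives 8 check bits for a 64-bit word, i.e. a 72-bit stored word, which matches the extra ninth x8 chip per rank on a typical ECC DIMM. It says nothing about the latency point.

```python
# Storage cost of a Hamming SECDED code (assumption: plain Hamming + overall parity).

def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (enough to correct one bit)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_width(data_bits: int) -> int:
    """Stored word width: data bits + Hamming check bits + 1 overall parity bit."""
    return data_bits + hamming_check_bits(data_bits) + 1


for k in (8, 32, 64):
    total = secded_width(k)
    overhead = 100 * (total - k) / k
    print(f"{k} data bits -> {total} stored bits ({overhead:.1f}% overhead)")
```

The last line printed is the 72/64 layout: 8 of every 72 stored bits are check bits, the same 1-in-9 ratio as a ninth chip sitting next to eight.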

        And what is the advantage of it?

        For instance, ECC does not automatically correct memory errors.

        Usually, ECC goes through the following steps:
        * The system reads a region of memory, including the parity check bits.
        * If the parity check fails, it tries again. That is the "correction", and yes, it is flaky: usually it doesn't solve the problem at all, and the system moves on to the next step.
        * If the memory fails again, the system halts (or enters interactive/maintenance mode). On an expensive server, it may light up some LEDs and even tell you which module failed.
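For comparison with the steps above, here is a toy sketch of how a Hamming single-error-correcting code, the building block of the SECDED codes commonly used in ECC memory, locates a single flipped bit from its syndrome and flips it back in one pass. It assumes a 4-bit data word (Hamming(7,4)) instead of a real 72/64 layout, and `encode`/`correct` are illustrative names, not any memory controller's interface.

```python
# Toy Hamming(7,4) code. Codeword positions are 1..7 (stored at list index pos - 1);
# parity bits sit at the power-of-two positions 1, 2 and 4.

def encode(d: list[int]) -> list[int]:
    """Encode four data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(word: list[int]) -> tuple[list[int], int]:
    """Return (data bits, error position), fixing at most one flipped bit.

    The syndrome is the XOR of the positions of all set bits: 0 for a valid
    codeword, otherwise the position of a single flipped bit.
    """
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1  # flip the bad bit back
    return [fixed[2], fixed[4], fixed[5], fixed[6]], syndrome


data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1  # simulate a bit flip at position 5
print(correct(word))  # ([1, 0, 1, 1], 5)
```

Two flipped bits would be mis-corrected by this toy code; the extra overall parity bit in SECDED exists to detect that case instead.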

        Also, memory is far less prone to failure than it was in the past. For example, if you work in a datacenter, replacing hard disks is an expected part of the overall costs. Replacing memory or CPUs, on the other hand, is uncommon, mainly because those components sit well inside the motherboard and are protected by several layers of capacitors.






        • #64
          Originally posted by Ivan Dimitrov
          I totally support Linus's sentiment, although it is not fully correct. AMD supports (validates) ECC RAM ONLY on their PRO processors. ECC is enabled on non-PRO processors, but validation and implementation are left to the motherboard vendor. So ECC RAM will most likely work on a non-PRO processor, but it is not clear whether it will actually run in ECC mode or how it will report errors. You can watch Wendell for the current state of ECC on Ryzen: in short, it is messy.
          You could argue that Ryzen's "support" of ECC can give a false sense of security to some users...
          Anyway, whoever needs ECC is most likely to buy a solution validated by the vendor, which means Ryzen PRO or Xeon CPUs; not a big difference here. So, to be precise, instead of "AMD did it", I think the correct statement is more like "AMD raised a valid point, made some noise and scored some marketing points for ECC RAM support".
          Yeah, this is the problem. AMD fans love running around and saying shit without checking all of the caveats and conditions. AMD doesn't officially support or test it. I mean, it could be like RDRAND, where it tells you everything is A-OK while it's actually utterly broken. Either it's supported or it's not. Not this weak, half-bullshit statement by AMD, which AMD fans like to wave in other people's faces.



          • #65
            Originally posted by zxy_thf
            -- The "modern DRAM is so reliable that it doesn't need ECC"

            How dare they lie with a straight face?
            DRAM is much, much less reliable than it was even five years ago. Running MemTest is mandatory nowadays, even if you're not doing anything special.

            I purchased DRAM modules about 4 years ago and again last year. The older ones run quite happily in my semi-server (without ECC, because I misread the spec), but one of the newer modules can't even pass MemTest for 1 minute.
            Another module from my colleague (bought last year) was also broken from the beginning.

            A funny fact about modern DDR4 is that most modules are DDR4-2133 under the JEDEC spec (1.2 V), and people simply overclock them to 3200+ at 1.35 V under an obscure name: "Intel XMP".
            The desktop DRAM market is not trustworthy anymore, now that overclocking has become the "new normal".
            Well, my RAM "overclocks" to 2666 MHz at 1.2 V, and it has been working perfectly fine for 5 years. I've seen how bad defective RAM is, and had it replaced with working RAM as well.

            If the newer RAM modules were *that* defective, people would be ranting and up in arms about bad RAM. And tech reviewers would also be ranting about it.



            • #66
              Originally posted by schmidtbag
              This is one of those things where if you don't know whether you need it or even know what it is, you very likely (but not assuredly) don't need it. You don't need ECC on a family PC. You don't need ECC for a gaming PC. You don't need ECC for a home media center. You don't need ECC for an office PC that just runs a web browser and MS Office all day. Bit flipping is a very real and dangerous problem but it's not enough of a threat to the average user. If it were, either all RAM would be ECC or all CPUs would support ECC.
              You don't need some sort of computer.



              • #67
                Originally posted by Mel Spektor
                Can Intel just die please?
                Well, enjoy using AMD CPUs with little to no Linux support. Even earlier, when they were doing worse than Intel, AMD's Linux support was pathetic. With no competition, they'll just keep shitting out worse and worse code. And you'll keep lapping it up like other AMD fans.

                Enjoy the lack of voltage, current and power monitoring. In the future, probably even frequency monitoring won't be supported. Oh, you're trying to figure out why your Ryzen CPU cores aren't boosting to the turbo frequency? Well, it turns out that only one of them is guaranteed to boost to the turbo frequency. Everything else is a coin toss. AMD's turbo frequency is a lie. It's pure marketing bullshit.



                • #68
                  Originally posted by CommunityMember

                  I believe the issue is not that most people don't care, but that they are not aware they should care, or that they need to make a choice.

                  When it comes to such purchases, some people (a lot of people, actually) want to go to Wally World (or Best Buy) and get a cheap but serviceable tool, and some people want the best, no matter what the price.

                  One tends to choose ECC only when spending OPM, and not so much when you have to spend your own money on your family's devices (unless your income is in the high 6/7 figures, like Linus's).

                  Of course, Linus is preaching to the choir about the value of things like ECC. One should certainly ask him whether his laptop, his phone, and all the devices he has bought for his family have ECC, or whether he has chosen "good enough" in those cases, which is, for better or worse, where most people tend to end up (I am only aware of a handful of laptop vendors that offer ECC, and while I suspect there is some specific phone with ECC, that is not the norm unless you have a .gov at the end of your email address).
                  Pretty sure most people don't know what CPU their devices are using either, at least on mobile phones and tablets. If you ask Apple fans, they'll say it's Apple's A12 CPU, and they'll believe that it's some entirely custom-made, from-scratch Apple CPU and not ARM.

                  The only reason we know on desktop is that Intel and AMD hardware usually comes with their stickers plastered on.



                  • #69
                    Originally posted by Zan Lynx

                    People need it even if they don't know it.

                    I've helped out friends and family who had corrupted documents. Was it their drive? I guess they should have been running btrfs or ZFS. They didn't know it, but they needed it. Or it could have been a RAM error while copying or saving the file. However, since their computer didn't have any error correction, there is no way to know which error correction would have helped.

                    Nice Catch-22 there, isn't it? Without ECC you don't know if you needed ECC.
                    True, I do agree that ECC everywhere would be good to have. The only reason we don't use it is the higher cost.



                    • #70
                      Originally posted by Sonadow
                      The fact remains that the world has been fine with using non-ECC memory on high-end professional workstations, desktop PCs, gaming PCs and work PCs for decades with no major complications or consequences. People *want* ECC memory but don't *need* it. Gamerfags sure as hell don't *need* ECC memory other than to have dick size comparisons with other gamerfags.

                      I have a 2990WX workstation with 128GB of standard DDR4 memory for use in compiling software (with the memory being used as a RAMdisk for faster compilations) and have not run into any problems since day one.

                      Save the limited stocks of ECC memory for the machines that really *need* them, like production-critical servers in gigantic datacentres where even a single corrupted bit results in massive consequences.
                      I dunno, I'd like to have ECC RAM. Better reliability is always better. If it were more affordable, without as big a performance cost, we'd all be using it more.

                      But yeah I agree that most AMD fans are simply thumping their chest and comparing their dick/boob sorry, CPU sizes to others'.

                      Also stop using fags in a derogatory way like that, it's a shitty thing to do.

