Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC


  • #71
    Originally posted by darkoverlordofdata
    I think most would agree that we want ECC on the server - but do we need it on the workstation? I have a 4-year-old ASUS ZenBook that logs ECC errors; it's never had one.
    Some have said that you need it if you live in Denver 'cause of the cosmic rays at that altitude, but I haven't seen any verification for that.
    I think the most compelling reason in this discussion is to avoid Rowhammer exploits. But is it really the most effective mitigation for that? Supposedly Ivy Bridge's refresh is better protection.
    Sometimes a rant is just a rant.
    Well yes, if ECC can be supported at similar cost and performance to normal RAM, why not? I mean, there's a reason we have checksums on network packets, disc images, files, BitTorrent etc.
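
To make the checksum comparison concrete, here is a minimal Python sketch (an illustration, not something posted in the thread) of how a CRC32 checksum catches a single flipped bit. The sample payload and the helper names `flip_bit` and `is_intact` are made up for the example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a memory bit flip."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def is_intact(data: bytes, expected_crc: int) -> bool:
    """Compare the CRC32 of data against a checksum recorded earlier."""
    return zlib.crc32(data) == expected_crc


# Hypothetical payload; any bytes would do.
original = b"family photo, thesis draft, tax return"
crc = zlib.crc32(original)

# Flip a single bit somewhere in the middle of the payload.
corrupted = flip_bit(original, 42)

print("original intact: ", is_intact(original, crc))
print("corrupted intact:", is_intact(corrupted, crc))
```

Note that a checksum like this only detects the corruption after the fact. ECC RAM goes one step further and can correct a single-bit error on the fly.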



    Comment


      • #73
        Originally posted by coder

        Um, that's some pretty bad software you seem to be using. In the past decade or so, I've found faults in bread-and-butter software and drivers to be exceedingly rare. In contrast, when I've investigated unstable machines and found the culprit to be bad RAM, the instability was often severe enough to render the machine almost unusable. However, it's likely there are machines I've used with non-ECC RAM that would experience memory errors that would go silently undetected -- I'll grant you that.
        Lol, what? Almost all software has bugs. And a lot of commercial software has bad bugs. I mean, just off the top of my head, the random freezes for my AMD GPU on Linux. The random black screen problem on the AMD Windows driver. The AMD Windows driver mucking up fan speeds. Windows refusing to connect or unpair with my Bluetooth headset (leaving it unusable on Windows with no way for me to fix it). A2DP suddenly not working properly/PulseAudio mucking up on my Linux install. Some random sound glitch in Skyrim when running on Linux.

        Software all over the place is broken. I worked on an Android messaging app. Before joining the company, I tried out the app and it seemed to work fine. Then I started working there, looking at the code and using the app regularly, and it was a nightmare. My eyes bled several times over. So much horribly written crap.
        Last edited by sandy8925; 04 January 2021, 07:46 AM.

        Comment


        • #74
          Originally posted by schmidtbag
          This is one of those things where if you don't know whether you need it or even know what it is, you very likely (but not assuredly) don't need it. You don't need ECC on a family PC. You don't need ECC for a gaming PC. You don't need ECC for a home media center. You don't need ECC for an office PC that just runs a web browser and MS Office all day. Bit flipping is a very real and dangerous problem but it's not enough of a threat to the average user. If it were, either all RAM would be ECC or all CPUs would support ECC.
          Of course, you'd be an imbecile if you built a server or high-end workstation without ECC.

          Is he, though? Because the choir are using Xeons and Threadrippers/Epycs. Like I said, you can get affordable hardware with ECC support.
          Maybe there's a little more truth when it comes to laptops, though I figure most people who are doing something critical enough to warrant ECC on a laptop are running most of their tasks on a server.
          Absolutely wrong.
          EVERY file is important for EVERY user. I'll put aside the big, stupid condescension that comes across in your post (as in "those people are not professionals, so their data is of no interest and hence has no value in the first place"), which is impressively ridiculous.
          People are students, independent workers, employees working remotely from home, etc.
          EVERYONE has "production" documents they care about, because those documents represent projects or tasks.
          But let's even consider a hypothetical world where people outside enterprises have only "non-productive" data like music, personal photos or videos. The latter are memories, expressions of their lives. Why in the world would they not have a right to have the integrity of that data guaranteed over time?
          If manufacturers don't put their best into reliability, the minimum should be to be upfront about it. IF they know the reliability of their hardware has been degrading to the point where failures may affect more than 0.5% of the full capacity (and I'm being very generous here) and yet have been doing nothing about it, that's damn close to a scam.

          Originally posted by schmidtbag
          Again, if that were true, it would be the standard. You wouldn't have an option.
          That's also absolutely wrong, and I'm impressed you could even spout such a thing.
          If people's needs were always met with adequate quality, or rather, if the products with the best quality to answer *actual* needs were always the ones winning, junk food would be a niche thing, Microsoft Windows (you know, the OS that even in 2020 still isn't able to update properly) would have died a long time ago, and we wouldn't have so many junk machines taking care of our chores that fall apart every few years.

          The one thing the last 50 years have demonstrated is that going for quality usually means going for last place in the market, precisely because in many, many domains people lack the motivation (or simply the knowledge) to be aware of their actual needs and to inspect the offerings to determine which one is actually good for them, not just on the surface.

          Case in point: I've been interested in IT for more than 20 years, and there are a few areas of hardware where I've tried to become a bit knowledgeable, like graphics cards or SSDs... But I never realized all these problems with RAM existed; they were completely off my radar. Yet I have a strong interest in it because, apart from my work, I have several "personal" and "semi-professional" projects ongoing, including building and hosting some professional websites for friends.
          Knowing this sheds new light on some (admittedly rare) data transfer problems I've had in the last ten years with disks I knew were still healthy.
          Last edited by Citan; 04 January 2021, 07:45 AM.

          Comment


          • #75
            Good to see Linus growing back his gonads! Can we all petition Lenovo for a next-gen T14 Ryzen PRO-based laptop with up to 64GB of ECC and a *decent display*? They should be using higher quality parts in the T-series anyway!

            Comment


            • #76
              Originally posted by sandy8925

              Well enjoy using AMD CPUs with little to no Linux support. Even earlier when they were doing worse than Intel, AMD's Linux support was pathetic. With no competition, they'll just keep shitting out worse and worse code. And you'll keep lapping it up like other AMD fans.

              Enjoy the lack of voltage, current and power monitoring. In the future, probably even frequency monitoring won't be supported. Oh, you're trying to figure out why your Ryzen CPU cores aren't boosting to the turbo frequency? Well it turns out that only one of them is guaranteed to boost to the turbo frequency. Everything else is a coin toss. AMD's turbo frequency is a lie. It's pure marketing bullshit.
              Still crying about that like a spoiled child, huh? Poor lad.

              Since you seem to be such an expert at analysing and writing low-level driver code, why not step up yourself? Your rant does prove you have an interest in the raw performance of those processors, so since the AMD team apparently fails you, why not be a gentleman about it and suggest that AMD hire you for that task (that way there would be no problem with leaking confidential information, as there would be if you were just "a guy from the community")? That way you could also teach the AMD guys how to stop "shitting out worse code", and the experience would look great on your profile. Everybody wins.

              Comment


              • #77
                AMD APUs don't support ECC according to various pages.

                Comment


                • #78
                  I think we need ECC on everything. I agree with Linus.
                  Many people are likely getting silent errors on a regular basis; they don't know, so they don't care.

                  Comment


                  • #79
                    Originally posted by Qaridarium
                    I've been buying AMD for ECC alone for years.
                    I had 32GB of ECC RAM for my FX-8320
                    and I will soon buy 128GB of DDR4 ECC RAM for my Threadripper 1920X...

                    Yes, AMD also builds CPUs without ECC in the APU market, but the situation with AMD is much better than with Intel.

                    Today AMD is faster than Intel, which means: just avoid Intel at any cost and buy ECC RAM for it.

                    Companies like Intel only change their minds if no one buys Intel anymore.
                    Your ECC didn't work at all on the FX-8320 unless you bought a motherboard worth far more than the CPU itself. You have to understand that supporting ECC memory (as in, the system runs with it installed) and ECC itself actually working are two different things.

                    AMD's consumer chips do support USING ECC memory, but it is up to the motherboard manufacturer to make ECC work. There are tons of people who bought a consumer Ryzen chip, used ECC memory and then found out in Memtest that it was not working. For me that is even worse than Intel, because it is false advertising, in large part because before, you had "ECC supported/not supported", and wherever it was supported it simply worked. Now with AMD it is "ECC supported / ECC supported by motherboard (which means the CPU memory controller might not catch all errors) / ECC not supported". And here motherboard makers of course won't add additional chips between CPU and RAM to catch ECC stuff. This is why 99% of the motherboards you see don't and won't support ECC memory, but there are a ton of users who think ECC works for them when it doesn't. As far as I know, only some ASUS motherboards support ECC on consumer AMD chips correctly, and ASRock "sometimes". For everyone else, it's a no.
                    Last edited by piotrj3; 04 January 2021, 09:26 AM.
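
For anyone on Linux who wants a second check besides Memtest, here is a rough Python sketch (an addition, not something from the post above). It assumes the kernel's EDAC driver is loaded and exposes per-memory-controller counters under /sys/devices/system/edac/mc; the function name `read_edac_counts` is made up for the example. An empty result only means that no EDAC memory controller was registered, which may or may not mean ECC is inactive on a given board.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the EDAC memory-controller entries on Linux.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> Dict[str, Dict[str, int]]:
    """Collect corrected (ce_count) and uncorrected (ue_count) error counters.

    Returns a mapping like {"mc0": {"ce_count": 0, "ue_count": 0}}, or an
    empty dict when no memory controller is registered under root.
    """
    counts: Dict[str, Dict[str, int]] = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry: Dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable; skip it rather than guess.
                continue
        if entry:
            counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found; ECC reporting is not visible here.")
    for controller, values in found.items():
        print(controller, values)
```

A non-zero `ce_count` is actually reassuring in one sense: it shows that errors are being caught and corrected rather than passing silently.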

                    Comment


                    • #80
                      Originally posted by bug77
                      Here's another way to look at it: ECC may be good, but AMD needs it more. Because Zen needs faster memory to really shine and faster memory is more error-prone. Plus, Zen just seems more picky about RAM choice in general.

                      On a more practical note, ECC means higher latency. Whoever moves their consumer chips to ECC first, will be slaughtered in benchmarks.
                      Also, as noted above, it's pretty pointless to improve the reliability of your RAM, when your software is 1,000x more likely to blue screen/core dump anyway.

                      I was with Linus when we "talked" about AVX, but I think he's not seeing/considering the whole picture here.


                      Originally posted by Zan Lynx

                      This has to be old information. I am running with ECC RAM on this Ryzen 5950 and its latency stats are right in line with any other 2,666 MHz CL19 RAM. In other words, it isn't great, but that has nothing to do with the presence of ECC. It's because the RAM manufacturers don't bother to produce high speed ECC. If they did, it would be exactly the same.

                      AMD (and Intel I assume) build the RAM controller with integrated things like ECC, AES encryption, PCIe and cross-CPU (for EPYC) routing. All of this runs inline at full RAM speed. At the transistor level this is not hard.
                      Correction: ECC doesn't introduce higher latency at all, or by itself introduces only very marginally higher latency (for the extra ECC bits). It is also not true that whoever moves to ECC memory loses; look at Nvidia, which uses GDDR6X that has ECC by default and takes advantage of it. Overclockers also found on Nvidia Ampere GPUs that if they overclock too much they lose score, because ECC kicks in and the memory slows down. ECC only introduces latency when it is actively correcting errors. Micron's GDDR6X as well as new prototypes for DDR5 use ECC to sustain stability at higher speeds, as you don't have to worry that a 0.001% error will crash your system, while the more stable part of the memory can freely run at higher speeds.
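
As a rough illustration of what the "extra ECC bits" buy you, here is a toy Hamming(7,4) code in Python (an added sketch, not from the post). It stores 3 check bits alongside 4 data bits and can locate and fix any single flipped bit. Real ECC DIMMs typically use a wider code with 8 check bits per 64 data bits, so this is only the same idea at a smaller scale; the function names `encode` and `decode` are made up.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word: List[int]) -> Tuple[List[int], int]:
    """Correct up to one flipped bit and return (data bits, error position).

    The error position is 1-based; 0 means no error was detected.
    """
    c = list(word)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    stored[5] ^= 1  # simulate a single bit flip in memory
    print(decode(stored))
```

The correction step is a handful of XORs, which fits the point made above that the check itself costs very little when it runs inline in the memory controller.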

                      Comment
