Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC


  • Originally posted by sandy8925 View Post
    AMD also artificially segments the market. As Ian Cutress has stated, only the PRO business versions of AMD CPUs officially support it.
    On APUs, AMD does the same as Intel - they actually disable it on the non-Pro versions.

    For the rest of the Ryzen lineup, I think the distinction between Pro and non-Pro is largely symbolic, with regard to ECC. I'm not really clear on the significance of leaving it up to motherboard manufacturers to certify ECC-support, unless they're also having to write custom BIOS firmware to do it (which I think they're not).

    Originally posted by sandy8925 View Post
    On the normal versions, it may or may not work - it depends on your motherboard.
    It always depends on your motherboard! Motherboards that don't support ECC will not have the extra traces needed for it, regardless of whether the CPU is Intel or AMD and whether the CPU advertises ECC support or not!
    Last edited by coder; 09 January 2021, 05:17 PM.

    Comment


    • Originally posted by f0rmat View Post
      I do not know about AMD getting DEC engineers, but I have read where many DEC engineers went to Microsoft and helped form the foundation of NTFS. (Please do not ask me where - I could probably find it.) DEC systems were amazing for their time. I loved VMS and the VAX architecture.
      NTFS dates back to the early 1990s. However, I think the architect of VMS was said to have taken charge of WinNT's design, which some have claimed explains its name (the NT nominally meant "New Technology", but that always sounded a bit thin):

      Code:
      'V' + 1 -> 'W'
      'M' + 1 -> 'N'
      'S' + 1 -> 'T'
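If you want to check the letter shift yourself, here is a minimal Python sketch. The function name shift_letters is made up for illustration, and it simply adds an offset to each character code, which is all the claim amounts to:

```python
def shift_letters(word, offset=1):
    """Shift every character in `word` forward by `offset` code points."""
    return "".join(chr(ord(c) + offset) for c in word)


if __name__ == "__main__":
    # Each letter of "VMS" moves one step forward in the alphabet.
    print(shift_letters("VMS"))  # WNT
```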
      I do know that Michael Abrash spent some time optimizing NTFS driver code, before he went to id Software to help write Quake.

      Comment


      • Originally posted by bridgman View Post
        per-unit fees.
        I see 3 things that AMD can do to get higher sales.

        (First)
        Release a 5800XT and 5900XT with higher clock speeds, similar to the 3800XT and 3900XT.
        This is easily done, given the TDP headroom compared to the 5950X.

        (Second)
        Sell a 5950X Gaming Edition with Hyperthreading disabled, which gives 5-45% more performance to gamers. (Yes, they can disable it in the BIOS, but many people just use the default settings without going into the BIOS to change anything.)

        (Third)
        Start producing the Threadripper 2950X again on the 12LP+ node, which is 20% faster than the original 12nm node. The name should be something like 2955X or 2959X.
        https://www.golem.de/news/auftragsfe...09-144121.html
        There are many people like myself with Threadripper TR4 mainboards who want to upgrade. I already sold my 1900X to upgrade to a 2950X, and I have a 1920X that I would upgrade to a 12LP+ 2955X.

        This would also sidestep the 5nm/7nm shortage: it would increase the performance of a 2950X by 20% without making the 5nm/7nm shortage worse.

        And many people like me who have 2 TR4 mainboards would be more likely to buy a 12LP+ 2955X as an upgrade than to buy a 3950X or 5950X.
        Phantom circuit Sequence Reducer Dyslexia

        Comment


        • Originally posted by schmidtbag View Post
          Keep your unproductive comments to yourself, then. You instigated this.
          If you make a sloppy post, don't blame me. If someone posts bad or questionable claims, they are the instigator.

          Blaming me is like a criminal blaming a witness for seeing the crime. It's not seeing the crime or reporting it that's the fault, it's the actual perpetration of the crime.

          Originally posted by schmidtbag View Post
          Evidence that I'm wrong.
          In the very part that you quoted, I was as clear as day that I didn't say you were wrong. I merely said that your source is bad, because it's non-representative of modern-day consumer RAM. Moreover, it seems contrary to other technology trends, which is why I deemed it worth questioning.

          Originally posted by schmidtbag View Post
          I made a claim and cited a source. You don't like the source. I could just keep posting sources but you'll just reject them anyway, so, it's your move.
          The fact that you're not interested in having good data suggests your interests lie elsewhere than learning and sharing of accurate information. I think that's a loss for us all.

          Originally posted by schmidtbag View Post
          I don't give a shit if you like my opinion,
          That's clear. I don't expect you to care what I think. However, I expect you to care whether your facts are solid. I ask questions and request sources to help you revisit some of your assumptions that I think might be off. Maybe I'm wrong, but then I stand a chance of learning something from your answer.

          That presumes you have the same goals, however. If not, then it all breaks down, like we've seen, and you end up putting your energy into lashing out rather than possibly learning something you didn't know and possibly educating others, in the process.

          Originally posted by schmidtbag View Post
          I would think the name Puget alone would be enough, but fine, here's the source:
          https://www.pugetsystems.com/labs/ar...-of-2018-1322/
          The specific claim is needed, in order to see what they measured and how, so we can judge its relevance to the points under discussion.

          First, there's the problem that they're specifically sourcing some of the most reliable memory products from the most reliable vendors they could find. So, it's probably not representative of average consumer-grade memory products.

          https://www.pugetsystems.com/blog/20...manufacturers/

          (and while that was published a decade earlier, I doubt their principles in memory vendor selection have changed. Also, it's good to know their experiences with Kingston match mine - that's my preferred brand.)

          Next, it's not talking about the rate of random memory errors, but rather failed DIMMs. A failed DIMM is one with bad cells or other faults that lead to reproducible errors. So, you can have a situation in which the rate of random errors actually increases (e.g. due to shrinking cell sizes, higher frequencies, and decreasing voltages), even while the number of reproducible defects decreases.

          Finally, since we see it's talking about defects rather than random errors, any number of things could've contributed to that change in reliability, including better QA by manufacturers.

          Also, I'd note that their field data likely covers RAM that's bad enough to lead to significant stability problems, and doesn't extend beyond the warranty period. If you leave machines in service long past the warranty period, you should expect the failure rate to increase.

          Originally posted by schmidtbag View Post
          Oh, so now you agree that if someone doesn't like a source that they have to provide their own? Hypocrite.
          You're misunderstanding or misreading my statement. I made a claim and provided a source. I didn't know why you thought differently, but figured you must've read it somewhere and politely asked you to share it with us, if so. You don't have to, and if you could simply make a compelling argument that my source did not, in fact, support my claim, I would consider that as well. I see no hypocrisy in that.

          Originally posted by schmidtbag View Post
          In either case, how about you read the whole article?
          I did. It's not long.

          Originally posted by schmidtbag View Post
          To quote it directly:
          Your quoting could be clearer. It's not clear that the following line isn't also part of the quote. I see that you italicized the quoted part, but that's subtle and text formatting is unreliable. I recommend indentation, ideally in combination with the QUOTE tag. Like this:

          As we know from the official DDR5 specifications, each module will include on-die ECC for cell-to-cell data coherence (module-wide ECC is still optional).
          (emphasis added by schmidtbag)

          Originally posted by schmidtbag View Post
          The default configuration appears to include ECC. That doesn't mean all will have it.
          The point I was making was about the on-die ECC. And the way I read the statement, it doesn't sound like that part is optional. Again, if you have a good reason to believe otherwise, please educate us.

          Originally posted by schmidtbag View Post
          Cite your sources about the evidence.
          I did, and I explained why I think it suggests that. If you'd like to offer an alternate explanation (as I invited you to do, in the very part you quoted), you're certainly welcome to do so.

          Originally posted by schmidtbag View Post
          Remember, we're talking DDR5 now, where ECC will be the norm. So yes, it does make sense. You want to reduce the cost of mass-production. Integrating it is most likely cheaper than having separate chips. For those who don't care about ECC (like budget phones), they'll likely cut costs and go with chips that don't have ECC.
          I can't even follow this statement. First, ECC DIMMs don't do the ECC on-module, so I presume by "separate chips" you mean the extra DRAM chip. Second, for DIMM-wide ECC, you'd still need extra DRAM chips, which blows a hole in the idea that this is just a cost-optimization for servers. Third, the overhead they're adding is 25%, instead of the current 12.5%, so it's significantly more expensive and therefore not something you'd do if it weren't necessary. Fourth, it sounds like you're agreeing that the industry could produce DDR5 DRAM chips without on-die ECC, if it made sense to do so, in which case why wouldn't PC memory OEMs just use those same chips on their non-ECC DDR5 DIMMs? ...unless there aren't going to be any DDR5 DRAM chips without on-die ECC, because they'd be too unreliable!
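For anyone who wants to see where the 12.5% and 25% figures can come from, here is a rough Python sketch. It assumes 8 ECC bits per 64 data bits on a current ECC DIMM and 8 ECC bits per 32-bit subchannel on a DDR5 ECC DIMM. Those widths are my assumption about what the percentages refer to, not something taken from the linked article, and the function name ecc_overhead is made up for illustration:

```python
def ecc_overhead(data_bits, ecc_bits):
    """Extra storage needed for ECC, as a fraction of the data width."""
    return ecc_bits / data_bits


# Assumed widths (not from the article):
ddr4_style = ecc_overhead(data_bits=64, ecc_bits=8)  # 72-bit ECC channel
ddr5_style = ecc_overhead(data_bits=32, ecc_bits=8)  # 40-bit ECC subchannel

if __name__ == "__main__":
    print(f"64+8 bits: {ddr4_style:.1%} overhead")
    print(f"32+8 bits: {ddr5_style:.1%} overhead")
```

Under those assumptions, the same 8 check bits protect half as many data bits, which is why the relative overhead doubles.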

          Originally posted by schmidtbag View Post
          No, it isn't. Back in 2014 when Intel was basically a monopoly, I built a PC for someone with a 4c/8t Xeon and ECC. The Xeon was roughly the same price as the equivalent i7 (but lacked an iGPU, which he didn't need) and the motherboard was maybe $15 more expensive. That was a worst-case scenario
          What about that was worst-case? Sounds like a best-case scenario, to me.

          Originally posted by schmidtbag View Post
          I already linked to a decent Xeon with ECC support for a reasonable price. If you find the motherboards are too expensive (which they're probably not but I'm too lazy to check), go with AMD.

          Getting ECC affordably is a non-issue. If it's really that important to you and you're on that tight of a budget, don't go with Intel.
          I already broke it down for you. There are two affordability problems. One is the minimum spend needed to get ECC, which is a lot more than the cost of the cheapest non-ECC PC. With Intel, you need to step up to at least an i3, and with AMD, you can't use any of their APUs, meaning you have to buy a separate GPU card. That's all before we even get to motherboards or the RAM, itself.

          For the second affordability issue, let's say you were already going for more than a minimum-spec PC and wanted to get a HEDT Intel CPU. The price difference between some of those chips and their Xeon counterparts is more substantial. For instance, if you want an 18-core Intel CPU, you'd have to get a $1333 Xeon W-2295 instead of a $1000 i9-10980XE.


          Originally posted by schmidtbag View Post
          Your demands over a conversation you weren't a part of is.
          Nobody is making you engage the issues I raised. If you think the standards I hold are unreasonable, you are free to say so and drop the matter.

          Originally posted by schmidtbag View Post
          And yes, you are hurling insults. You call my points irrelevant,
          The fact that you consider this an insult is part of the problem. First of all, it's about your point and not you. Secondly, it's merely a claim that I back with an argument, which itself is open to counterargument. If the relevance of points cannot be disputed for fear of hurting someone's feelings, then we can't have proper arguments on this site.

          Originally posted by schmidtbag View Post
          you question my credibility,
          What I did was to show how specific actions could have an impact on your credibility. I made that observation for your own benefit.

          Originally posted by schmidtbag View Post
          you [falsely] claim I'm strawmanning.
          Again, I disputed one of your points as a strawman, not as a personal attack, but because that's how I read it. After your response, I explained why I read it that way. It sounds like it was a miscommunication, instead.

          Originally posted by schmidtbag View Post
          No, actually, I didn't bring it up. Read back through the whole thread: I did not bring up hard drives.
          I am talking about the quote:

          Originally posted by schmidtbag
          Doesn't change the fact that data corruption errors on disks happen more often than ECC errors [for home users].

          Your own words. That's all I'm talking about. Stand behind what you say. That's a pretty basic standard.

          Originally posted by schmidtbag View Post
          "be a decent person" - there goes another insult, hypocrite.
          That was meant as a way of pointing out what's the decent thing to do, in the scenario I described; not to say that you're not a generally decent person.

          Originally posted by schmidtbag View Post
          I have been standing by my words, you're just picking the ones you feel like arguing with and ignoring the rest.
          Am I supposed to argue with points I agree with?

          Originally posted by schmidtbag View Post
          Then practice what you preach and provide counter-sources.
          I either provide sources or at least explain my reasoning.

          Originally posted by schmidtbag View Post
          So how about you stop jumping to conclusions about things I never said
          If you never said it, then it should be a pretty quick and easy misunderstanding to clear up, no?

          Originally posted by schmidtbag View Post
          when you insert yourself in a conversation you weren't a part of?
          Again, it's a group discussion, by definition. Every post has a Reply button, for all users!

          I'm beginning to think you don't understand how forums work. If you really want to have a discussion with someone, where no one else can interject, then send them a private message. But you don't get the privilege of publicly airing your views without others being able to call them into question. The goal of this forum is not to feed egos, it's about sharing information.

          Originally posted by schmidtbag View Post
          I agree. But you're not doing your job when you disagree/disapprove and don't provide counter-sources.
          Not every statement requires or even deserves a citation. Some arguments can stand on the basis of their own internal logic, and facts which are commonly held to be true. You don't have to agree with a claim, but this issue of yours seems to have taken on a life of its own. Rather than a real discussion of the point in question, it seems like you're just using it as a means to distract and redirect.

          Originally posted by schmidtbag View Post
          I support my way of thinking the way I feel makes sense.
          That sentence doesn't parse.

          Originally posted by schmidtbag View Post
          It's not my problem if you don't share that view.
          We agree on that much. I don't actually care if you agree with my points, what I care about is when people post dubious claims and lash out at anyone who questions the claims' merits. It says a lot about their underlying motivations.

          Originally posted by schmidtbag View Post
          Remember: you responded to me.
          What does that have to do with anything? You are responsible for your posts, whether they're replies to me or not. It's as simple as that.

          Originally posted by schmidtbag View Post
          How exactly are you doing anything different than me? You instigated an argument with me.
          You're describing this through a very ego-centric point-of-view. I didn't see a post by you and decide to pick a fight. I saw some claims I doubted that I thought needed to be questioned. Again, that's not even going so far as to say you're wrong - just that they should be interrogated to see if there's real substance behind them. It's not about you, personally. You can always make it about yourself, but I'm trying not to.

          Originally posted by schmidtbag View Post
          I haven't been reading your contributions to this thread other than the ones you've sent to me, because I don't care, and I'm not interested in inserting myself in the middle of someone else's debate.
          What do you think the thread is for, and why do you suppose the posts are public? And if you don't read other posts, aren't you missing a chance to learn more about the subject?

          Originally posted by schmidtbag View Post
          Contrary to what you might believe, I don't care about this topic that much in general.
          If it's not a matter of interest for you, why are you still here?

          Originally posted by schmidtbag View Post
          I only came here to say that Linus is overblowing the severity of this problem. I'm not even saying he's wrong.
          So, you were just trying to take a drive-by shit on the topic? You should weigh the value of making a post about something so low-stakes for you. If you're not prepared to back up your claims, perhaps it's just not worth it simply to share an opinion.

          Originally posted by schmidtbag View Post
          you're pushing things to unproductive extremes.
          Yes, it's gone meta, but I think it's still a relevant and worthwhile discussion about forum conduct.
          Last edited by coder; 09 January 2021, 11:03 PM.

          Comment


          • Originally posted by Qaridarium View Post
            i see 3 things that AMD can do to get higher sales.
            He's a Linux driver developer, in their graphics group. AMD is a pretty big company -- I'll bet he doesn't even know the name of any CPU product managers. They're probably based at an office thousands of miles away, and they've never crossed paths or spoken.

            If you really want to provide input on their CPU product line, find out the next time AMD is having an AMA on Reddit or some other site, and tell them there.

            Or maybe try posting in the forums on AMD's own site.

            Comment


            • Originally posted by schmidtbag View Post
              In my experience, when RAM gets faulty, it's pretty abrupt. That's not to say you're wrong, but once RAM has a physical defect, it doesn't take long to notice there's a problem.
              Unless you run memtest on a routine basis, and even proceed to use your PC after some errors have been detected, I don't see how you can know that.

              I believe you know how fast a PC goes from stable to unusable, but the problem is that you don't know when the first memory errors occurred. The only way to know that is either by using ECC memory or routinely running memory tests. Otherwise, you have no early-warning system.
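To illustrate what an early-warning system looks like with ECC: on Linux, the kernel's EDAC subsystem exposes per-memory-controller counters of corrected and uncorrected errors under sysfs. Here is a minimal Python sketch that reads them. It assumes an EDAC driver is loaded for your platform and that the counters live under /sys/devices/system/edac/mc; the function name read_edac_counts is made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected, uncorrected)} from EDAC sysfs.

    Returns an empty dict if the directory is missing, e.g. when no EDAC
    driver is loaded or the platform has no ECC support.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers with unreadable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected-error count that keeps climbing is the kind of signal you never get from non-ECC memory until something crashes.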
              Last edited by coder; 09 January 2021, 11:40 PM.

              Comment


              • Originally posted by coder View Post
                He's a Linux driver developer, in their graphics group. AMD is a pretty big company -- I'll bet he doesn't even know the name of any CPU product managers. They're probably based at an office thousands of miles away, and they've never crossed paths or spoken.
                If you really want to provide input on their CPU product line, find out the next time AMD is having an AMA on Reddit or some other site, and tell them there.
                Or maybe try posting in the forums on AMD's own site.
                You are right, he is only what he is. But see it like this: we are allowed to talk, even to talk freely.
                And he is not the only one from AMD who reads here, but it is a fact that he looks like the only one who writes here regularly ... yes, others write too, but it is safe to say that compared to bridgman they are silent, and bridgman speaks the most.
                So why not talk freely, so that maybe others read it too? Maybe they can start talking inside AMD, maybe some of the ideas are good, and maybe the ideas become reality in the future.
                I am 100% sure they read my words, and many of my ideas from the past have become reality.
                There is no "proof" that I am the only source, and I do not even want to prove anything, but I think many of my ideas have already become reality.

                And it is not just 1-2; so many of my ideas have become reality that I cannot even count them myself anymore.

                About ECC and SR-IOV...

                I am sure that as soon as we have any sane political representation of the people,
                we will make any non-ECC hardware and non-SR-IOV hardware against the law.
                ECC because of data integrity, and SR-IOV for security against trojan horses: for example, to isolate games in a VM so that no trojan horse hidden in a closed-source game can break out of the VM.

                And I am 100% sure every person profits from data integrity and security.
                Phantom circuit Sequence Reducer Dyslexia

                Comment


                • Originally posted by Qaridarium View Post
                  You are right, he is only what he is. But see it like this: we are allowed to talk, even to talk freely.
                  And he is not the only one from AMD who reads here, but it is a fact that he looks like the only one who writes here regularly ... yes, others write too, but it is safe to say that compared to bridgman they are silent, and bridgman speaks the most.
                  He is indeed a good guy and you're free to post as you like. I was just offering a suggestion of a better way to get your ideas into the minds of the people at AMD who actually make decisions about that stuff.

                  Comment


                  • Originally posted by coder View Post
                    Unless you run memtest on a routine basis, and even proceed to use your PC after some errors have been detected, I don't see how you can know that.

                    I believe you know how fast a PC goes from stable to unusable, but the problem is that you don't know when the first memory errors occurred. The only way to know that is either by using ECC memory or routinely running memory tests. Otherwise, you have no early-warning system.
                    If you're actively using the PC, you find out quickly because applications often spontaneously crash or your whole OS locks up. On something like a server (on which you don't typically run things directly), you could have errors occur days or hours earlier and never know, which is why ECC is so critical for them.
                    Again, if you're handling something that's actually important where you can't afford the risk of failure, get ECC. That isn't the case for most desktop PC users.

                    Comment


                    • Originally posted by coder View Post
                      He is indeed a good guy and you're free to post as you like. I was just offering a suggestion of a better way to get your ideas into the minds of the people at AMD who actually make decisions about that stuff.
                      I do not believe in authority or rank ... talking with bridgman is as good as talking with Lisa Su directly. The strength of your ideas should not depend on who you are talking to; instead, it should be the cleverness of your idea that motivates others to carry it to where the decisions are made.

                      A quick look at eBay shows that even used 2950X CPUs cost 700-800€, which is more than the faster 3950X ... and producing a 2955X on GlobalFoundries 12LP+ to get 20% higher clocks would make many people upgrade on their TR4 mainboards. If this took up 5nm/7nm capacity it would be pointless, but it uses the GlobalFoundries 12nm nodes, so it should not be a problem.
                      Phantom circuit Sequence Reducer Dyslexia

                      Comment
