Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC
Originally posted by sandy8925:
True, I agree that ECC everywhere would be good to have. The only reason we don't use it is the higher cost.
The main issue (which is what Linus is actually complaining about) is that Intel artificially segments the market, so the motherboards and CPUs that properly support ECC memory (i.e. typically server parts) are much more expensive than consumer ones.
So in an ideal world where all motherboards and CPUs already came with ECC support, you probably wouldn't even be able to get non-ECC memory in midrange and higher systems (only your $200 Walmart laptop would probably ship memory without ECC).
-
Originally posted by coder: Save the post-modern relativistic BS for literature class essays, please.
Evidence of what? I'm not the one making the claim, here. You did, and you cited a source which doesn't support it. That's different than saying your claim is incorrect.
I find it interesting that you didn't even link it. That's hardly the mark of an excellent source.
At Puget Systems, one of the most important things we track in our workstations is the failure rates of individual components. Overall, 2018 was a very good year for hardware reliability with about half as many parts failing this year versus 2015, 2016, or 2017. But what models were the best of the best?
According to Anandtech's latest DDR5 coverage, it is.
If you have a better source to the contrary, please share it for our collective education.
In either case, how about you read the whole article?
To quote it directly:
As we know from the official DDR5 specifications, each module will include on-die ECC for cell-to-cell data coherence (module-wide ECC is still optional).
The default configuration appears to include ECC. That doesn't mean all will have it.
It's evidence that single-bit error frequencies are indeed becoming too high, with newer cell sizes. Otherwise, why would they burn the overhead on it? In this case, it's yet worse than DDR4, with an overhead of 25%, instead of a mere 12.5%!
That doesn't even make sense. The consumer market is large enough to support a different set of DRAM chips, to the extent it makes sense to do so. For each client CPU sold, there will be about a couple dozen DRAM chips accompanying it. Compared to CPUs, DRAM chips are tiny and simple. If Intel can justify at least 4 different CPU dies in each generation (or about 7, if you include laptops), then surely DRAM makers can afford to design a separate chip for servers vs. clients!
Given that Intel's standard desktop CPUs do not support them (with a few exceptions), how are they not niche? Probably no more than 10-15% of the desktop board models out there support ECC. And those are mostly premium models that cost much more than average. I don't know what you consider niche, but I think that fits most people's definition.
The point isn't that it's cost-prohibitive for most, but that it's a nontrivial difference, for many.
With today's competition, it's really not a whole lot different. I already linked to a decent Xeon with ECC support for a reasonable price. If you find the motherboards are too expensive (which they're probably not but I'm too lazy to check), go with AMD.
Getting ECC affordably is a non-issue. If it's really that important to you and you're on that tight of a budget, don't go with Intel.
This is a community forum, where all users can read, post, and reply to all messages. If you think my behavior is out of line, you're free to take it up with the mods, but I'll point out that I'm not the one hurling insults.
And yes, you are hurling insults. You call my points irrelevant, you question my credibility, and you [falsely] claim I'm strawmanning.
As for not being the one who brought it up, it's your exact words that I quoted. Don't say things you can't back up, or at least be a decent person and admit when you've done so.
"be a decent person" - there goes another insult, hypocrite.
If you consider expecting you to stand by your words obnoxious, then I guess so.
All I expect is for people not to play fast and loose with the facts, to be accountable for their statements, and to maintain a basic level of decorum. You don't have to concede anything, but it's hard to have a productive discussion when one party is twisting and shifting their position and refusing to be pinned down. We can certainly agree that a point is irreconcilable and move on, but that at least takes some agreement on what point is in dispute.
I agree with your 2nd sentence. So how about you stop jumping to conclusions about things I never said when you insert yourself into a conversation you weren't a part of?
I already established what my 2 points are. Anything else we're arguing about (which is most of it) is stuff I didn't bring up.
For opinions to carry weight, the details matter. That's why I'm focusing on details. I trust you know the difference between an informed and an uninformed opinion? There can also be misinformed opinions and underinformed opinions. What I'm trying to do is help nail down the details, so that more people can hopefully hold more informed opinions (myself included).
Maybe if your interest extended beyond winning what you perceive as arguments, you'd see a different theme in my contributions to this thread. Maybe not. But, if you're only reading my replies to you, and if you view this as a zero-sum interaction, then it can't help but shade your perspective.
I haven't been reading your contributions to this thread other than the ones you've sent to me, because I don't care, and I'm not interested in inserting myself in the middle of someone else's debate. Contrary to what you might believe, I don't care about this topic that much in general. I only came here to say that Linus is overblowing the severity of this problem. I'm not even saying he's wrong.
I had not proposed to use it in the way you suggested. Nobody did. It looked to me like you set up that strawman and burned it down.
Like we keep telling you, the reactive approach puts your data at risk. I get that you're fine with that level of risk, but it's nonzero for sure. It's up to the individual to value their own time and data. I know that if my ECC RAM saved me from data loss, I'd indeed consider it a good return on investment.
As I said, it's like an insurance policy. Insurance doesn't always get claimed. But, when it does, it's usually much appreciated.
I apologize that it wasn't clear. Not everyone needs Kevlar attire, but we're all at some risk of being shot (in this analogy). I think we agree that the level of risk and exposure varies.
-
Originally posted by coder: This is perhaps too trivial an example, but if we take the case of a document, what if the error occurred on a different page than where the user is editing?
If the document wasn't encrypted, then you would probably see only one character change, which statistically is not going to cause a major issue. Statistically speaking, you'd introduce more typos yourself than your RAM would.
Also, a bit-error can have a disproportionate impact, such as shifting a memory address or array index by a large amount. Usually, such changes would result in a program crash, but they could instead have the effect that part of the document goes missing or is replaced by a copy of some other part.
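A quick sketch of that contrast (Python; the bytes and bit positions are arbitrary illustrations, not measured error patterns): the same single-bit flip that changes one character in plain text shifts an array index or pointer offset by a huge amount.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return `value` with one bit inverted."""
    return value ^ (1 << bit)

# 1) One bit flipped in plain text: a single character changes ('q' -> 's'),
#    and the rest of the document is untouched.
text = bytearray(b"The quick brown fox")
text[4] = flip_bit(text[4], 1)
print(text.decode())            # "The suick brown fox"

# 2) The same kind of flip in an array index: a jump of over a million
#    elements, likely a crash or corruption far from where the error occurred.
index = 7
print(flip_bit(index, 20))      # 1048583
```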
Finally, the very process of saving a document usually involves a number of copies and transformations, during any of which a memory error could corrupt what's eventually persisted to nonvolatile storage.
You seem to be contradicting your earlier position that "simple" computer users don't need ECC memory because errors are likely to occur in unimportant data, and therefore will go unnoticed.
Usually, bad RAM gets noticed only when it's so bad that it leads to program or OS instability. However, even before that point, it quite plausibly could've corrupted some of a user's data. I see it as a continuum, rather than the sort of sudden cliff that you suggest.
His basic scenario is as legitimate for them as anyone else. It's just a matter of how much content they're editing, how long it sits in RAM, and how susceptible it is to memory errors. Uncompressed image data is probably the most resilient to errors, while anything that's highly-structured is probably the least.
-
Originally posted by mdedetrich:
Actually, in terms of components, ECC memory is only marginally more expensive than non-ECC memory. You basically have an extra memory cell that stores parity data, which is in the ballpark of 10-15% of the cost.
The main issue (which is what Linus is actually complaining about) is that Intel artificially segments the market, so the motherboards and CPUs that properly support ECC memory (i.e. typically server parts) are much more expensive than consumer ones.
So in an ideal world where all motherboards and CPUs already came with ECC support, you probably wouldn't even be able to get non-ECC memory in midrange and higher systems (only your $200 Walmart laptop would probably ship memory without ECC).
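The "marginally more expensive" ballpark quoted above lines up with how SECDED (single-error-correct, double-error-detect) codes scale: the minimum check-bit count grows roughly logarithmically with word width, which is why a 64-bit word needs 8 check bits, i.e. the familiar 72-bit ECC DIMM with 12.5% extra storage. A sketch of that bound (Python; this is the textbook Hamming-distance-4 calculation, not any vendor's implementation):

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for SECDED over `data_bits` data bits:
    smallest r with 2**r >= data_bits + r + 1, plus one overall
    parity bit for double-error detection."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1

for width in (8, 64, 128):
    extra = secded_check_bits(width)
    print(f"{width} data bits -> {extra} check bits ({extra / width:.1%} overhead)")
```

Note how the relative overhead shrinks as the protected word gets wider, which is one reason wider ECC granularities are cheaper per bit.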
-
Originally posted by waxhead: OK, but this is not quite how I have understood patrol scrub. Unlike what you correctly describe as a standard RAID consistency check, patrol scrub reads a block, verifies checksums, writes the same block back, and verifies that there is no error, i.e. a read/write cycle. This is supposed to exercise all memory and catch single-bit errors early, so that one can offline a memory module before it goes completely bonkers.
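That read/correct/write-back cycle can be sketched with a toy Hamming(7,4) code standing in for real DRAM ECC (Python; the memory model and names here are purely illustrative, not how any memory controller is implemented):

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit single-error-correcting codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]          # d0..d3
    c = [0] * 8                                        # codeword positions 1..7
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]                          # parity bits live at
    c[2] = c[3] ^ c[6] ^ c[7]                          # positions 1, 2, 4
    c[4] = c[5] ^ c[6] ^ c[7]
    return sum(c[pos] << (pos - 1) for pos in range(1, 8))

def hamming74_correct(word: int):
    """Return (corrected word, syndrome); syndrome 0 means no error seen."""
    c = [0] + [(word >> (pos - 1)) & 1 for pos in range(1, 8)]
    s = ((c[1] ^ c[3] ^ c[5] ^ c[7])
         | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
         | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    if s:
        word ^= 1 << (s - 1)       # the syndrome points at the flipped bit
    return word, s

def patrol_scrub(memory):
    """Read every block, correct single-bit errors, write the block back."""
    corrected_addrs = []
    for addr, word in enumerate(memory):
        fixed, syndrome = hamming74_correct(word)
        if syndrome:
            corrected_addrs.append(addr)    # would be reported upstream (MCA)
        memory[addr] = fixed                # the write half of the R/W cycle
    return corrected_addrs

memory = [hamming74_encode(n) for n in (0b1011, 0b0110, 0b0001)]
memory[1] ^= 1 << 4                         # inject a single-bit error
print(patrol_scrub(memory))                 # [1]; memory[1] is clean again
```

The point of sweeping proactively is exactly as described: single-bit errors get corrected before a second flip in the same block turns them into uncorrectable ones, and a module that keeps showing up in the corrected-address log can be offlined early.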
One thing worth mentioning is that at least for AMD CPUs we have a lot of other ECC-ish reliability features running at all times on caches, data paths and other blocks. System memory and on-chip RAS features all report up through the Machine Check Architecture (MCA) subsystem.
Someone mentioned that Puget Systems was reporting that RAM was becoming more reliable - this surprised me a bit. It's possible that RAM is becoming more reliable on a per-bit or per-byte basis, but since we are also using ever-increasing amounts of memory my impression was that aggregate reliability was going down rather than up, and that ECC was becoming more important rather than less.
One interesting exercise is to put a bunch of CPUs and GPUs in a box, then figure out how long they will run before the first memory error. I was horrified when I did the math for one of our early supercomputer prototypes, and had to check with our RAS architect to make sure I was doing the math correctly. It was something like half a day.
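For a rough sense of that math (Python; the FIT rate and device counts below are made-up placeholders, not measured figures): with independent failures at a constant rate, the expected time to the first error is just the reciprocal of the summed failure rate.

```python
FIT_HOURS = 1e9     # one FIT = one failure per 10^9 device-hours

def hours_to_first_error(num_devices: int, fit_per_device: float) -> float:
    """Expected hours until the first error across all devices, assuming
    independent failures at a constant (exponential) rate."""
    return FIT_HOURS / (num_devices * fit_per_device)

# Hypothetical single node: a couple thousand memory devices.
print(hours_to_first_error(2_000, 50))        # 10000.0 hours
# Scale the same assumed rate to a large machine and it collapses fast:
print(hours_to_first_error(2_000_000, 50))    # 10.0 hours -- roughly half a day
```

The takeaway is that even a per-device rate that looks negligible on a desktop becomes an hourly event once you multiply by enough devices, which is why large systems treat ECC as mandatory.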
Last point is that systems used primarily for graphics tend to be more tolerant of memory errors than systems used primarily for compute or other processing, simply because the human eye and visual system are so good at dealing with minor errors. That doesn't help with crashing, but even there, the nature of graphics code is that a lot of the critical code lives in CPU cache (which is ECC'd on our products, AFAIK) and so doesn't get read that often.
-
Originally posted by coder: Though I doubt you're being serious, I disagree. Videogame consoles and video streaming devices seem to do alright without it, and it's difficult for me to see what positive impact it can have for them that would offset the downsides of its added cost.
The law is a very blunt instrument and undermines (potentially) intelligent decisions and tradeoffs made by designers and engineers. If anything, laws should focus on disclosure of a device's data integrity properties. It's not so crazy, if you consider we have energy-efficiency labeling requirements for automobiles and appliances, and we have nutrition labels on food. But, since politicians skew older, less technical, and tend to have non-engineering backgrounds, they'd probably manage to bollocks it up.
Why make another stupid label no one follows if you can just ban non-ECC RAM?
Most people ignore the labels anyway.
-
Originally posted by bridgman: ECC was becoming more important rather than less.
Many people have it now: https://www.phoronix.com/forums/foru...e5#post1229970
I already replaced 3 TVs/monitors with HDMI 2.0 devices to avoid this bug.
It looks like the AMD open-source driver is broken with HDMI 1.0-1.4.
Why not develop a driver GUI or options in the GNOME control center for users to set a fix for that problem?
I am very error tolerant and will even replace a monitor if it has bugs, and I don't blame the AMD GPU if the bug is in the monitor. But other people are not as error tolerant as I am.
That means for other customers a GUI would be nice, to set such options.
Also, I have a question: could AMD produce GPU cards with ECC and 1-2 VM instances (instead of the full 16/32)? I found out, from a social-engineering attack using games with a trojan horse included, that we had better run untrusted games in a VM.
Now that the 6800/6900 is out, can we have the Radeon VII / Vega 20 / AMD Radeon Pro VII with the full 4096 shaders? The Apple privilege of selling the 4096-shader version should be over now that the 6800/6900 is out...
And another question: can we have a GPU card that avoids patent payments for HDMI by offering only DisplayPort, and also avoids GDDR6 patent payments by using DDR5 with Infinity Cache?
It would be good to have low-cost alternatives without any patent payments.
-
Originally posted by Qaridarium: Do you remember the yellow bug I reported to you years ago?
Many people have it now: https://www.phoronix.com/forums/foru...e5#post1229970
I already replaced 3 TVs/monitors with HDMI 2.0 devices to avoid this bug.
It looks like the AMD open-source driver is broken with HDMI 1.0-1.4.
Why not develop a driver GUI or options in the GNOME control center for users to set a fix for that problem?
I am very error tolerant and will even replace a monitor if it has bugs, and I don't blame the AMD GPU if the bug is in the monitor. But other people are not as error tolerant as I am. That means for other customers a GUI would be nice, to set such options.
Originally posted by Qaridarium: Also, I have a question: could AMD produce GPU cards with ECC and 1-2 VM instances (instead of the full 16/32)? I found out, from a social-engineering attack using games with a trojan horse included, that we had better run untrusted games in a VM.
SR-IOV adds a fair amount of hardware cost, though, so I'm not sure it is the best approach for consumer solutions.
Originally posted by Qaridarium: Now that the 6800/6900 is out, can we have the Radeon VII / Vega 20 / AMD Radeon Pro VII with the full 4096 shaders? The Apple privilege of selling the 4096-shader version should be over now that the 6800/6900 is out...
And another question: can we have a GPU card that avoids patent payments for HDMI by offering only DisplayPort, and also avoids GDDR6 patent payments by using DDR5 with Infinity Cache? It would be good to have low-cost alternatives without any patent payments.
Dropping HDMI on some SKUs seems problematic - low end cards are the most likely to need HDMI, while the savings on anything but the least expensive cards seems too small to justify the costs of carrying another SKU. I don't *think* we would need to actually remove the logic from the chip, but if another chip was required that would be a non-starter for sure.
DDR5 + Infinity Cache is an interesting thought for low-end products.
-
Originally posted by piotrj3: It is not. Some companies like Gigabyte claim they support "ECC memory" on their motherboards, but that doesn't include validation or correction on consumer-grade boards, which further increases fragmentation. As far as I know, only Asus supports proper validation/correction on motherboards, and only on some of them; everyone else is a "no". ASRock, I think, only claims ECC correction with those "Pro" processors.