Is ECC RAM worth it for a desktop PC?

  • DeiF
    started a topic Is ECC RAM worth it for a desktop PC?

    I'm planning a big upgrade for a desktop PC that is already 6 years old.
    The biggest limitation I've found is the amount of RAM (only 2GB).
    My most memory-demanding tasks now require about 8GB of RAM (not counting the OS, other programs, etc.).
    Since I want to be future-proof, I've decided to max out the RAM in the new system (32GB).
    I'm currently considering buying an Asus Crosshair V Formula Z and an AMD FX-8350 CPU.
    Since both components support ECC RAM, and my budget seems to be enough to buy it, I'm considering that option.

    But, is it really worth it?
    How common are bit-flip errors in modern RAM?
    Is the performance drop noticeable?


    ECC RAM is pretty hard to find in my area, but I've found this kit that may work:
    KVR16E11K4/32 (4x8GB DDR3 ECC, 1600MHz, CAS 11)

    It isn't listed in the motherboard docs, though, so it's a bit risky. I haven't contacted Asus yet.

  • torsionbar28
    replied
    Originally posted by brosis
    I do not. Everything is CRC'ed, checksummed or running ECC: caches, packets, you name it. Almost all discrete GPUs use ECC; you were probably fooled by Quadro, where it's software ECC and only for extreme freaks, since GDDR3+ has ECC functionality. The CPU is ECC'ed through and through. I don't know about system buses, but I suspect they are too. Hard drive caches have been ECC'ed since... eternity. SATA has CRCing. Hard drive low-level logic does CRC on sectors. Even filesystems now provide reliable, transparent methods of detecting bit rot (data mistakenly "corrected" by HDD logic, for example) and fixing it. The only thing left without ECC or CRC in a desktop is the RAM. And a lot of data passes through it, so it is worth it.
    PCI-Express, SAS, SATA, FireWire, Fibre Channel, and others all use 8b/10b encoding, which has the inherent ability to detect single-bit errors. When you add CRC checking on top of that, it makes for exceptionally high signalling fidelity. Data corruption due to signalling errors is effectively non-existent on these interfaces.
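To illustrate the CRC half of that argument, here is a small Python sketch. It uses `zlib.crc32` purely as a stand-in; the CRCs actually used by SATA, PCIe and the other interfaces are defined in their own specifications. The check confirms that flipping any single bit of a frame always changes the CRC, a property that holds for any CRC whose generator polynomial has more than one term. The helper names and the sample frame are hypothetical.

```python
import zlib

def single_bit_flips(data: bytes):
    """Yield every variant of `data` with exactly one bit inverted."""
    for byte_index in range(len(data)):
        for bit in range(8):
            corrupted = bytearray(data)
            corrupted[byte_index] ^= 1 << bit
            yield bytes(corrupted)

def crc_catches_all_single_bit_errors(data: bytes) -> bool:
    """True if every single-bit-flipped copy has a different CRC-32."""
    good = zlib.crc32(data)
    return all(zlib.crc32(bad) != good for bad in single_bit_flips(data))

if __name__ == "__main__":
    frame = b"example payload crossing a serial link"
    print(crc_catches_all_single_bit_errors(frame))
```

The exhaustive loop is only practical for short frames, but it makes the point concrete: a lone flipped bit on the wire cannot slip past the CRC.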

    System RAM is the weakest link.

    Another point to consider on "is it worth it" is your usage pattern. If you reboot daily or multiple times per week (as Microsoft OSes tend to require), the error probability is reduced, since memory is wiped at power-off or reboot. If you keep your system running 24/7 and rarely reboot, as many (most?) Linux desktop users tend to do, it becomes more of an issue, since a memory error can stay resident for an extended period of time.

    Obviously, in a (non-Microsoft) server scenario, where the OS may be up and running for months or even years between reboots, the probability of a memory error is quite high, which explains why ECC memory is the de facto standard in every server, regardless of vendor.
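The uptime argument can be put into numbers with a simple model, assuming bit flips arrive as a Poisson process with a constant rate λ per hour: the chance of at least one flip during an uninterrupted session of T hours is 1 - e^(-λT). The rate below (`FLIPS_PER_HOUR`) is a made-up placeholder for illustration, not a measured figure.

```python
import math

# HYPOTHETICAL rate of bit flips per hour for the whole machine.
# Placeholder only; real rates vary widely with hardware and environment.
FLIPS_PER_HOUR = 1e-3

def p_at_least_one_flip(hours: float, rate: float = FLIPS_PER_HOUR) -> float:
    """Probability of one or more flips during an uninterrupted session,
    assuming flips follow a Poisson process with a constant `rate`."""
    return 1.0 - math.exp(-rate * hours)

if __name__ == "__main__":
    for label, hours in [("daily reboot", 24),
                         ("weekly reboot", 24 * 7),
                         ("one year uptime", 24 * 365)]:
        print(f"{label:>16}: {p_at_least_one_flip(hours):.3f}")
```

In this model a reboot does not change the flip rate itself; what it does is bound how long a flipped bit can stay resident, which is the effect described above.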

  • JS987
    replied
    Originally posted by JS987
    the selection of fanless 28nm discrete GPUs is too limited.
    The only 28nm fanless card my local store sells is the Radeon 7750, which is overkill and produces too much heat. That card also wouldn't fit in my current PC because of its dual-slot cooler.
    I've been watching the available graphics cards for a long time. Fanless cards have always had a problem: they usually have an outdated GPU, slow memory or a dual-slot cooler.

  • brosis
    replied
    Originally posted by EvilGuru
    I seriously question the value of ECC memory. [...]
    I do not. Everything is CRC'ed, checksummed or running ECC: caches, packets, you name it. Almost all discrete GPUs use ECC; you were probably fooled by Quadro, where it's software ECC and only for extreme freaks, since GDDR3+ has ECC functionality. The CPU is ECC'ed through and through. I don't know about system buses, but I suspect they are too. Hard drive caches have been ECC'ed since... eternity. SATA has CRCing. Hard drive low-level logic does CRC on sectors. Even filesystems now provide reliable, transparent methods of detecting bit rot (data mistakenly "corrected" by HDD logic, for example) and fixing it. The only thing left without ECC or CRC in a desktop is the RAM. And a lot of data passes through it, so it is worth it.
    Last edited by brosis; 09-24-2013, 01:58 PM.
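For readers wondering what "ECC" in RAM actually does: ECC DIMMs commonly add 8 check bits to every 64 data bits (72-bit wide modules) and use a single-error-correct, double-error-detect (SECDED) code. The sketch below shrinks that idea to an extended Hamming(8,4) code over 4 data bits; the function names are hypothetical and real memory controllers implement this in hardware with wider, vendor-specific codes.

```python
from typing import List, Optional, Tuple

def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) code word.

    Index 0 holds overall parity; indices 1..7 use the classic Hamming(7,4)
    layout with parity bits at positions 1, 2 and 4.
    """
    word = [0] * 8
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    word[1] = word[3] ^ word[5] ^ word[7]  # covers positions with bit 0 set
    word[2] = word[3] ^ word[6] ^ word[7]  # covers positions with bit 1 set
    word[4] = word[5] ^ word[6] ^ word[7]  # covers positions with bit 2 set
    word[0] = sum(word[1:]) % 2            # makes total parity even
    return word

def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # XOR of set positions; 0 for a valid word
    parity = sum(word) % 2   # 1 means an odd number of bits flipped
    fixed = list(word)
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1:
        fixed[syndrome] ^= 1  # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return None, "uncorrectable"  # nonzero syndrome, even parity: 2 flips
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return data, status

if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1  # simulate one flipped bit sitting in "RAM"
    print(decode(word))
```

Like real SECDED memory, this corrects any single flipped bit and reports (but cannot fix) any two flipped bits in the same word.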

  • JS987
    replied
    I would buy a PC with ECC RAM, but the cheapest CPU (Xeon E3-1225 v3) costs about 210 Euro and the cheapest motherboard (Asus P9D WS) costs 200 Euro.
    That CPU would become outdated after about 3 years, when its integrated GPU becomes outdated in terms of driver support.
    Intel doesn't sell discrete GPUs. The open source drivers for AMD discrete GPUs aren't usable, and the selection of fanless 28nm discrete GPUs is too limited.

  • EvilGuru
    replied
    I seriously question the value of ECC memory. This is not because I believe bit-errors (and more importantly their detection) are unimportant but rather because there are simply so many other sources of error on a modern system.

    Properly evaluating the utility or necessity of ECC DRAM requires one to adopt a systems approach. First, can you list all sources of DRAM in your system? Everything from the main memory to the DRAM chips on hard disks and Ethernet controllers. Secondly, of those chips, how many are either ECC or utilise some kind of software parity checking? Chances are at this point you'll find that your GPU (unless it is a workstation-class card) has non-ECC memory.

    Next, can you tell me the error rates for all disks in your system? What about network connections? Checksums are not as strong as many like to make out. When SCP'ing large quantities of data over a local network, it is not unusual (often because of a bad cable) for errors to occur. These errors, in my experience, are usually picked up at the application level by SCP, having first passed the Ethernet, IP and TCP checksums.

    If you're happy with all of that, then it is time to move on to internal interconnects. Take moving data across the PCI(-e) bus as an example, or between a S-ATA drive and the host controller. What is the error rate here?

    Finally, let's look at the CPU. What is the error rate (as in miscomputation of a result) here? Sure, it shouldn't happen, but still, a meaningful evaluation of ECC memory requires a rate.

    If this is still not enough, let's go for the elephant in the room: hardware and software bugs. I suspect these are orders of magnitude more prevalent than any of the hardware issues outlined above.

    Regards, Freddie.
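The point that checksums are weaker than often assumed can be made concrete. IP and TCP use the 16-bit ones' complement checksum from RFC 1071, and because addition does not care about order, it cannot notice two 16-bit words being swapped, while CRC-32 (used here via `zlib.crc32`) does. The sample data is hypothetical.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071 (IPv4/TCP/UDP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

if __name__ == "__main__":
    sent = b"ABCDEFGH"
    received = b"CDABEFGH"  # first two 16-bit words swapped in transit
    print("checksum detects it:", internet_checksum(sent) != internet_checksum(received))
    print("CRC-32 detects it:  ", zlib.crc32(sent) != zlib.crc32(received))
```

The ones' complement sum is cheap to compute and update incrementally, which is why it was chosen for IP and TCP, but that same simplicity is what lets some corruption patterns pass unnoticed.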

  • Kayden
    replied
    If your motherboard supports ECC, absolutely go for it. I used to use it on my desktop, and it was rock solid...never had a single problem with it. It's a simple thing that makes your computer more robust and less likely to go haywire.

    These days, I don't use ECC just because most systems don't seem to have support for it. If they did, I'd switch back in a heartbeat.

  • Abousid
    replied
    Is your desktop computer that mission-critical?
    ECC is mainly used in high-availability, mission-critical systems.

  • RealNC
    replied
    It turns out that non-ECC RAM is actually a security risk, as bit flips can be exploited. "Bit-squatting" from Black Hat 2011:

    http://www.youtube.com/watch?v=_si0FYl_IOA
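The idea behind bitsquatting, as a rough sketch: if one bit of a domain name held in memory flips, the machine may look up a different but still valid name, which an attacker can register in advance. The hypothetical helper below enumerates those one-bit neighbours for a simple `label.tld` name, keeping only lowercase letters, digits and hyphens (DNS is case-insensitive, so pure case flips are ignored).

```python
import string

# Characters allowed in a hostname label, lowercase only (DNS ignores case).
HOSTNAME_CHARS = set(string.ascii_lowercase + string.digits + "-")

def bitsquat_variants(domain: str) -> list[str]:
    """Return valid hostnames that differ from `domain` by one flipped bit
    in one character of the first label; the TLD is left untouched.
    Assumes a simple ASCII 'label.tld' name."""
    name, dot, tld = domain.rpartition(".")
    variants = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in HOSTNAME_CHARS:
                continue
            new_name = name[:i] + flipped + name[i + 1:]
            if new_name.startswith("-") or new_name.endswith("-"):
                continue  # labels may not begin or end with a hyphen
            variants.add(new_name + dot + tld)
    return sorted(variants)
```

For example, `bitsquat_variants("cnn.com")` includes `con.com`, because 'n' (0x6E) and 'o' (0x6F) differ only in the lowest bit.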

  • erendorn
    replied
    Originally posted by torsionbar28
    All but the cheapest consumer-grade RAID controllers do integrity checking. Even the Linux kernel software RAID runs regular consistency checks. If the kernel software RAID cannot read a block from one disk, it will remap that block to a new location and reconstruct it from the other RAID members. You can also force a manual consistency check at any time with "echo check >> /sys/block/md0/md/sync_action", assuming md0 is your array.

    You're correct, though, that it doesn't detect disk errors on the fly (unless it's accompanied by a read error), only during the regularly scheduled consistency check.
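The manual check described in the quote can be scripted. A minimal sketch, assuming the usual md sysfs layout (`/sys/block/<array>/md/sync_action` and `/sys/block/<array>/md/mismatch_cnt`) and root privileges for the write; the function names are hypothetical, and `sysfs_root` is a parameter only so the sketch can be tried against a fake directory tree.

```python
from pathlib import Path
from typing import Tuple

# Assumes /sys/block/<array>/md/{sync_action,mismatch_cnt}; writing needs root.

def start_check(array: str = "md0", sysfs_root: str = "/sys/block") -> None:
    """Start a read-only consistency check, like `echo check > .../sync_action`."""
    Path(sysfs_root, array, "md", "sync_action").write_text("check\n")

def check_status(array: str = "md0", sysfs_root: str = "/sys/block") -> Tuple[str, int]:
    """Return (sync_action, mismatch_cnt). The count is only final once
    sync_action reads 'idle' again after the check has finished."""
    md = Path(sysfs_root, array, "md")
    action = (md / "sync_action").read_text().strip()
    mismatches = int((md / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

On a real system you would call `start_check("md0")`, poll `check_status("md0")` until the action is back to `idle`, and then look at the mismatch count.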
    Indeed, that doesn't protect you against bit flips, which are not read/write errors but reads/writes that silently return wrong data.
    Checksumming helps against that: btrfs, for example, checksums data and metadata, so in a RAID configuration it can detect a bit flip and recover from the good copy.
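Why a checksum helps where plain mirroring does not: when two copies disagree, a mirror alone cannot tell which one is right, but a stored checksum can. A minimal sketch of that idea, assuming a hypothetical layout where each block copy is stored with its checksum; btrfs itself defaults to crc32c, and `zlib.crc32` is used here only because it is in Python's standard library.

```python
import zlib
from dataclasses import dataclass
from typing import List

@dataclass
class Copy:
    data: bytearray
    stored_crc: int  # checksum kept alongside the block (hypothetical layout)

def write_mirrored(block: bytes, copies: int = 2) -> List[Copy]:
    """Store the same block on `copies` mirrors, each with its checksum."""
    crc = zlib.crc32(block)
    return [Copy(bytearray(block), crc) for _ in range(copies)]

def read_with_repair(mirrors: List[Copy]) -> bytes:
    """Return data from a copy whose checksum verifies; rewrite any copy that fails."""
    good = next((m for m in mirrors if zlib.crc32(m.data) == m.stored_crc), None)
    if good is None:
        raise IOError("no copy passes its checksum")
    for m in mirrors:
        if zlib.crc32(m.data) != m.stored_crc:
            m.data[:] = good.data  # self-heal from the verified copy
            m.stored_crc = good.stored_crc
    return bytes(good.data)
```

Flipping a bit in one mirror, e.g. `mirrors[0].data[0] ^= 1`, and then reading returns the original block and quietly rewrites the damaged copy.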
