Btrfs RAID 5/6 Code Found To Be Very Unsafe & Will Likely Require A Rewrite


  • Originally posted by Brane215
    a MB or two of reserved kernel space in RAM, reserved for just such purpose, of course with UPS on which one can rely
    Using RAM as a safe haven is the same stupidity you find in ZFS (which, admittedly, was designed for systems where ECC RAM was plentiful).

    You already gave the reason yourself. You need good RAM, probably more than a few MB of it and probably ECC, and a UPS, or you risk random events killing the array.

    Modern SSD wear is an issue only for businesses now (which write a ton of data to SSD caches in servers); for home use, drives will usually outlast their useful service life anyway.



    • 1. Why exactly do I NEED ECC RAM just for that? If, for example, I have a box where I deem plain RAM to be acceptable, why would I need ECC DRAM just for the journal?

      2. Even in the case of a server with plain non-ECC RAM, why exactly couldn't I create control checksum fields and/or ECC info that would enable me to detect and possibly correct an error?



      • Originally posted by Brane215
        1. Why exactly do I NEED ECC RAM just for that? If, for example, I have a box where I deem plain RAM to be acceptable, why would I need ECC DRAM just for the journal?
        Same reason ZFS needs ECC. If a RAM error corrupts your journal, your filesystem becomes unreliable.

        If the journal is on disk, protected by the filesystem's own checksumming, this issue does not exist.

        2. Even in the case of a server with plain non-ECC RAM, why exactly couldn't I create control checksum fields and/or ECC info that would enable me to detect and possibly correct an error?
        You also need to duplicate the whole journal, as checksums only tell you that something is wrong. And all this checking is an additional layer of performance penalty (big or small depending on how good your implementation is).

        If no one has done so (not even ZFS, which simply assumes RAM is ECC and does not check it), there must be a reason, no?



        • Originally posted by starshipeleven
          Same reason ZFS needs ECC. If a RAM error corrupts your journal, your filesystem becomes unreliable.

          If the journal is on disk, protected by the filesystem's own checksumming, this issue does not exist.
          That same method can be used in RAM. Furthermore, since the journal need not be big (due to its transient nature), I wouldn't need much more than a permuted copy, so that in the case of a bit flip on one of the chips I can still get the original value back without too much playing with syndromes etc. for data recovery.

          You also need to duplicate the whole journal, as checksums only tell you that something is wrong. And all this checking is an additional layer of performance penalty (big or small depending on how good your implementation is).
          So? What is cheaper: permuting the data structure and writing it again into that same RAM, possibly a few pages away (possibly twice), through a DDR3 channel at 8+ GB/s (×2 for dual channel), or writing it to an SSD at maybe 1-10% of that throughput (and with much greater latency)?
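
To make the two positions above concrete, here is a minimal sketch of a journal record kept in RAM with a checksum plus a "permuted" second copy. It is an illustration only, not code from Btrfs or ZFS: the names `store` and `load` are hypothetical, CRC32 stands in for whatever checksum would really be used, and bit inversion stands in for the permutation. The checksum alone only detects the flip; it is the second copy that allows recovery.

```python
import zlib


def store(payload: bytes) -> dict:
    """Keep the payload, a bit-inverted copy, and a CRC32 of the payload."""
    return {
        "primary": bytearray(payload),
        # Assumption: inverting every bit is our stand-in "permutation".
        "mirror": bytearray(b ^ 0xFF for b in payload),
        "crc": zlib.crc32(payload),
    }


def load(record: dict) -> bytes:
    """Return the payload, falling back to the mirror if the primary is bad."""
    primary = bytes(record["primary"])
    if zlib.crc32(primary) == record["crc"]:
        return primary
    # Checksum mismatch: the CRC told us something is wrong, nothing more.
    restored = bytes(b ^ 0xFF for b in record["mirror"])
    if zlib.crc32(restored) == record["crc"]:
        record["primary"][:] = restored  # repair the damaged copy in place
        return restored
    raise ValueError("both copies fail the checksum; record is unrecoverable")
```

Note that this sketch does not protect the stored CRC itself, and it says nothing about the cost of doing this on every journal write, which is the other half of the disagreement.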



          • Originally posted by Brane215
            That same method can be used in RAM. Furthermore, since the journal need not be big (due to its transient nature), I wouldn't need much more than a permuted copy, so that in the case of a bit flip on one of the chips I can still get the original value back without too much playing with syndromes etc. for data recovery.
            ...
            So? What is cheaper: permuting the data structure and writing it again into that same RAM, possibly a few pages away (possibly twice), through a DDR3 channel at 8+ GB/s (×2 for dual channel), or writing it to an SSD at maybe 1-10% of that throughput (and with much greater latency)?
            You can do the same with RAM caching of an on-disk structure, btw.



            • WRT plentiful "ECC DRAM": I've just replaced one 8GB module with two 8GB unbuffered ECC modules on my friggin Athlon 5350 and, as far as I can see, it is working.
              The sticks carried a moderate price premium (€40+ vs €33+ for the 1600 MHz model), but that was expected because of the extra chip, if nothing else.

              I'm not thinking about BTRFS; it was done just as an extra precaution.



              • Originally posted by Brane215
                WRT plentiful "ECC DRAM": I've just replaced one 8GB module with two 8GB unbuffered ECC modules on my friggin Athlon 5350 and, as far as I can see, it is working.
                The sticks carried a moderate price premium (€40+ vs €33+ for the 1600 MHz model), but that was expected because of the extra chip, if nothing else.

                I'm not thinking about BTRFS; it was done just as an extra precaution.
                ECC RAM needs mobo firmware support too. Is your mobo from ASUS, and can you see ECC options in its BIOS? Other OEMs never enabled ECC support in their AMD boards (because they are asses, mostly).

                Otherwise it's not working as ECC, but as normal RAM.



                • Originally posted by starshipeleven
                  Same reason ZFS needs ECC. If a RAM error corrupts your journal, your filesystem becomes unreliable.
                  Please, I have posted links several times before, but maybe you missed them. I am posting them again and hope you stop spreading this false information. Is it ignorance, or are you deliberately spreading FUD?
                  1) ZFS does not need ECC RAM. If you have corrupt RAM, ZFS will not corrupt your data:

                  "...OK. But what if your [corrupt] RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything....So if you’re running non-ECC RAM that turns out to be appallingly, Lovecraftianishly evil, ZFS will mitigate the damage, not amplify it...."

                  2) ZFS does not need huge amounts of RAM. ZFS runs fine on a Raspberry Pi with 256MB RAM:
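
The behaviour described in the quote under point 1 can be modelled roughly as follows. This is a toy model under stated assumptions, not ZFS code: `scrub_block`, `checksum` and the `read` callback are hypothetical names, the "disk" is a list of bytearrays, and `read` stands in for a read that passes through (possibly bad) RAM. The point it illustrates is the rule "only overwrite a copy when some buffer actually matches the checksum".

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Stand-in block checksum (SHA-256 chosen only for the sketch)."""
    return hashlib.sha256(data).digest()


def scrub_block(copies: list, expected: bytes, read) -> str:
    """Verify every on-disk copy; repair only from a buffer that verifies.

    copies   -- bytearrays standing in for redundant on-disk copies
    expected -- the checksum recorded for this block
    read     -- models reading a copy into RAM (may flip bits)
    """
    buffers = [read(bytes(c)) for c in copies]
    good = next((b for b in buffers if checksum(b) == expected), None)
    if good is None:
        # Nothing in RAM matches the checksum, so nothing is written back.
        return "unrecoverable: nothing overwritten"
    repaired = 0
    for copy, buf in zip(copies, buffers):
        if checksum(buf) != expected:
            copy[:] = good  # rewrite only from verified data
            repaired += 1
    return f"ok: repaired {repaired}"
```

With a `read` that flips a bit on every call, both buffers fail verification and the on-disk copies are left untouched; with a clean `read` and one damaged on-disk copy, that copy is rewritten from the good one.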



                  • Originally posted by starshipeleven
                    ECC RAM needs mobo firmware support too. Is your mobo from ASUS, and can you see ECC options in its BIOS? Other OEMs never enabled ECC support in their AMD boards (because they are asses, mostly).

                    Otherwise it's not working as ECC, but as normal RAM.
                    I know. I didn't have the time to check, but old memtest86+ (v4.20) sees it as ECC and reports it as being used. It misses the freq and type of RAM (shows it as DDR2), though.

                    I have an Asus AM1-M mobo, but didn't have the time to check in the BIOS. Will do later.

                    I also bought a few 5370s and other tidbits. I plan to play with the BIOS on a couple of boards (I have an Asus AM-I and an MSI AM1M-S2H to play with). I have a 5370 in the server, and a 2650, 3850, 5350 and 5370 with a couple of 8GiB DDR3 1866 sticks to play with.

                    I have done some bits with coreboot, and played with ACPI and assembly within flash.

                    Time to do something useful with it )






                    • Originally posted by Brane215
                      I know. I didn't have the time to check, but old memtest86+ (v4.20) sees it as ECC and reports it as being used. It misses the freq and type of RAM (shows it as DDR2), though.
                      I'd like to start with a little "ECC is a black box and also a damn bitch" rant.

                      That said, afaik memtest is unreliable, as it only checks what kind of RAM is installed. An engineer here with too much time on their hands managed to fool it by crossflashing, swapping, or otherwise hacking the SPD chip on the RAM modules to report ECC capability.

                      For Intel hardware, the software that does not seem to be fooled by his attempts is AIDA64, which I suspect dumps the hardware registers relevant to ECC functionality (there are some snippets of code that do the same on Linux, for some Intel hardware).

                      Maybe you can try that too and see what it says. Sadly it is Windows-only.
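
On Linux, one rough check that does not depend on what the module's SPD claims is to look at the kernel's EDAC sysfs tree, which is populated only when an EDAC driver has attached to the memory controller. The sketch below assumes the usual layout under `/sys/devices/system/edac/mc` with per-controller `ce_count` (corrected errors) and `ue_count` (uncorrected errors) files; the function name `edac_status` is made up for this example. An empty result may just mean no EDAC driver is loaded for the platform, so it is not proof either way.

```python
from pathlib import Path


def edac_status(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int, "ue_count": int}} per mc* dir."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # no EDAC tree: driver not loaded or not supported
    for mc in sorted(base.glob("mc[0-9]*")):
        counts = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                counts[name] = int(counter.read_text().strip())
        result[mc.name] = counts
    return result


if __name__ == "__main__":
    status = edac_status()
    print(status if status else "no EDAC memory controllers reported")
```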

                      I have an Asus AM1-M mobo, but didn't have the time to check in the BIOS. Will do later.
                      There are limited, fragmented reports in a few threads online about possible ECC capability, even if it is not stated anywhere.

                      I also bought a few 5370s and other tidbits. I plan to play with the BIOS on a couple of boards (I have an Asus AM-I and an MSI AM1M-S2H to play with). I have a 5370 in the server, and a 2650, 3850, 5350 and 5370 with a couple of 8GiB DDR3 1866 sticks to play with.

                      I have done some bits with coreboot, and played with ACPI and assembly within flash.

                      Time to do something useful with it )
                      Freeing these boards from UEFI crap, and above all enabling ECC or other functionality that is present in hardware but disabled by stupid firmware, is something I would approve of.
                      You might want to do a Kickstarter or something. I'd pay for that, and I'm not the only one.

