Linux EDAC Support For AMD's Great Horned Owl

  • Linux EDAC Support For AMD's Great Horned Owl

    Phoronix: Linux EDAC Support For AMD's Great Horned Owl

    The latest Linux kernel patch is for supporting ECC error detection via the Error Detection And Correction (EDAC) code with AMD's Great Horned Owl...

    http://www.phoronix.com/scan.php?pag...eat-Horned-Owl

  • #2
    I hope this isn't a conflict, but Level One has some tests of a 1807B board using the Phoronix Test Suite: https://level1techs.com/article/sapp...-fpv5-review-0


    • #3
      Originally posted by tsuru
      I hope this isn't a conflict, but Level One has some tests of a 1807B board using the Phoronix Test Suite: https://level1techs.com/article/sapp...-fpv5-review-0
      I don't mind other people using PTS at all, though I was surprised, looking at that article, that they didn't really compare the performance directly to anything.
      Michael Larabel
      http://www.michaellarabel.com/


      • #4
        Wow I thought the EDAC code had stopped being maintained. After the last few large-scale tests were performed, it was proven that unless you had a bad memory module the odds of getting a double fault in ECC memory at sea level were something silly like 49 million years MTBF per 1Mbit*hour, and everyone stopped caring about testing any more.


        • #5
          Originally posted by linuxgeex
          Wow I thought the EDAC code had stopped being maintained. After the last few large-scale tests were performed, it was proven that unless you had a bad memory module the odds of getting a double fault in ECC memory at sea level were something silly like 49 million years MTBF per 1Mbit*hour, and everyone stopped caring about testing any more.
          EDAC is infrastructure for reading ECC errors reported by the hardware (not necessarily in RAM, but also on PCIe or other system buses that support it), so the system can react to them or just log them somehow (the facilities to do so in the UEFI may or may not be present and usable).

          It is probably one of the few reliable ways of actually testing whether ECC RAM is working at all: do shenanigans on the RAM modules (like covering some data traces) and check through EDAC whether errors are detected. Most "ECC checking" software only looks at the registers in the processor to see if ECC is enabled, but doesn't tell you whether it is actually working.
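To make the "check through EDAC" step concrete, here is a minimal sketch of reading the per-memory-controller error counters that the EDAC drivers expose in sysfs. It assumes the usual `/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count` files; the function name `read_edac_counts` is made up for illustration, and on a machine without a loaded EDAC driver it simply returns nothing.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} for each mcN directory.

    The default path is an assumption about where the kernel exposes EDAC
    counters; pass another directory to point the function elsewhere.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        # No EDAC driver loaded, or the platform reports nothing.
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

If the corrected count goes up after you deliberately disturb a module, that is decent evidence ECC is really active rather than just flagged as enabled in a register.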

          I don't understand why you think the chance of a double bit flip in ECC memory has any bearing on EDAC development.


          • #6
            Originally posted by linuxgeex
            it was proven that unless you had a bad memory module the odds of getting a double fault in ECC memory at sea level were something silly like 49 million years MTBF per 1Mbit*hour, and everyone stopped caring about testing any more.
            You don't need a double bit flip to benefit from ECC/EDC... you only need a single bit flip to corrupt data.

            In a properly running system with error correction and a hardware (or software) scrubber you are normally detecting and fixing single bit flips sufficiently quickly that a double bit error almost never happens.
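The point about scrubbing can be illustrated with a toy single-error-correct, double-error-detect (SECDED) code. This is only a sketch using an extended Hamming(8,4) code on 4-bit values; real ECC DIMMs use much wider codes (for example 64 data bits plus 8 check bits), and the names `encode`, `decode` and `scrub` are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of set positions is 0 for a valid word
    odd_parity = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: treat as one flip at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two flips, detected but not fixable.
        return None, "uncorrectable"
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


def scrub(code):
    """Rewrite a stored word with its corrected value, as a scrubber pass would."""
    nibble, status = decode(code)
    return code if status == "uncorrectable" else encode(nibble)


# With scrubbing between the two flips, each flip is corrected on its own.
word = encode(0b1011)
word[5] ^= 1
word = scrub(word)
word[2] ^= 1
print(decode(word))

# Without scrubbing, the same two flips pile up in one word.
stale = encode(0b1011)
stale[5] ^= 1
stale[2] ^= 1
print(decode(stale))
```

The scrubbed word still decodes to the original value, while the unscrubbed one is reported as uncorrectable, which is why regular scrubbing keeps double-bit errors rare in practice.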


            • #7
              Originally posted by bridgman

              You don't need a double bit flip to benefit from ECC/EDC... you only need a single bit flip to corrupt data.

              In a properly running system with error correction and a hardware (or software) scrubber you are normally detecting and fixing single bit flips sufficiently quickly that a double bit error almost never happens.
              So it is a shame that Ryzen CPUs support ECC, but AM4 motherboards officially don't.


              • #8
                Originally posted by linuxgeex
                unless you had a bad memory module the odds of getting a double fault in ECC memory at sea level were something silly like 49 million years MTBF per 1Mbit*hour
                It is possible to provoke dual bit flips with Rowhammer, in less than 49 million years.
