New Ryzen Is Running Solid Under Linux, No Compiler Segmentation Fault Issue

  • #11
    Originally posted by leipero

    I doubt that's a CPU bug at all; more likely it's a manufacturing problem. It's the same stepping.
    A manufacturing problem is still a CPU bug, as it's a problem with the physical CPU. Perhaps you meant it's not an architecture bug?

    • #12
      Originally posted by Michael
      I haven't been told anything from AMD about the possibility of a microcode fix. If affected by the issue, they recommend contacting AMD Customer Care, which at least for now sounds like an RMA.
      I've been through the process. You contact Customer Care, they ask you some questions about your system (cooling, voltages, BIOS settings), let you try higher voltages, and then hand you over to RMA.

      Originally posted by leipero
      I doubt that's a CPU bug at all; more likely it's a manufacturing problem. It's the same stepping.
      Maybe it is a new stepping that just isn't publicly labelled? They seem to have fixed it but haven't yet said what they fixed.
      What else would it be?

      • #13
        I was getting random segfaults, but then I reduced the workload on the machine and underclocked the memory a bit. Guess I should plan to do the RMA...

        • #14
          Was there any change in the CPU stepping, or anything else in the CPUID?

          • #15
            I've got my 1700 overclocked to 3.6 GHz on stock voltage, and I got mine at launch. How long should it take for kill-ryzen.sh to find something? It's been running clean for almost 20 minutes now with no errors. Is it possible to have an early model with no errors, or does it just take a lot longer for the problem to show up? And in that case, if I'm not doing insanely heavy compilation workloads, how much should I worry?

            Can this error manifest itself in ways other than segfaults? Should I worry about invalid bits if I'm doing video editing? Do I have to worry about silent data corruption? Or, if I'm not seeing the issue via kill-ryzen in a reasonable time, should I just rest easy that it won't affect me in a normal workflow? (I've been using this CPU for the past 5 months or so now; I'm more concerned about silent data corruption than about a random segfault in a compilation that I can just restart.)

            • #16
              Originally posted by bisby
              I've got my 1700 overclocked to 3.6 GHz on stock voltage, and I got mine at launch. How long should it take for kill-ryzen.sh to find something? It's been running clean for almost 20 minutes now with no errors.
              You should run kill-ryzen.sh for at least 24 hours, maybe 48.

              I have seen the problem show up anywhere from a few minutes to 5 hours in. Some people have reported longer times.

              • #17
                Originally posted by pjssilva

                You should run kill-ryzen.sh for at least 24 hours, maybe 48.

                I have seen the problem show up anywhere from a few minutes to 5 hours in. Some people have reported longer times.
                I'm at 45 minutes now. I guess the real question I have is: could this just be a lucky run, where the issue might still show up after 15 seconds on a quick compile? I compile a lot of stuff from the AUR, but I'm not using Gentoo or something. Or, if it doesn't show up after an hour or so, will it generally not affect day-to-day operations?

                Does this affect things other than compilation, or is compilation just the most obvious and reliable way to get a visible response?

                • #18
                  Under normal Linux desktop workloads, gaming, etc, all Ryzen processors should work just fine.
                  Not really. There is also a somewhat widespread freeze-on-idle problem, also known as the MCE freeze/reboot. The CPU can just freeze under casual load. It's known to be caused by C-states, and disabling them is a crude workaround. But it seems to be a hardware issue.
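For readers who want to see which idle states the kernel is actually using, here is a minimal Python sketch that reads the generic Linux cpuidle sysfs interface (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/` with its `name` and `disable` files) and can disable the deeper states. This is an assumption-laden illustration, not the workaround described in the AMD threads: the helper names are hypothetical, the state numbering depends on the idle driver in use, and disabling OS idle states this way may not be equivalent to turning C-states off in firmware.

```python
from pathlib import Path

# Generic Linux cpuidle sysfs tree (not Ryzen-specific).
SYSFS_CPU = Path("/sys/devices/system/cpu")


def list_idle_states(base: Path = SYSFS_CPU) -> dict[str, list[tuple[str, str, bool]]]:
    """Map each CPU directory (e.g. 'cpu0') to (state_dir, name, disabled) tuples."""
    result = {}
    # Lexicographic order: 'cpu10' sorts before 'cpu2', which is fine for display.
    for cpu_dir in sorted(base.glob("cpu[0-9]*")):
        states = []
        for state_dir in sorted((cpu_dir / "cpuidle").glob("state[0-9]*")):
            name = (state_dir / "name").read_text().strip()
            disabled = (state_dir / "disable").read_text().strip() == "1"
            states.append((state_dir.name, name, disabled))
        if states:  # skip CPUs that expose no cpuidle directory
            result[cpu_dir.name] = states
    return result


def disable_states_deeper_than(max_index: int, base: Path = SYSFS_CPU) -> None:
    """Write '1' to the 'disable' file of every stateN with N > max_index.

    Requires root on a real system; the setting does not persist across reboots.
    """
    for disable_file in base.glob("cpu[0-9]*/cpuidle/state[0-9]*/disable"):
        index = int(disable_file.parent.name.removeprefix("state"))
        if index > max_index:
            disable_file.write_text("1")
```

On a real machine, `list_idle_states()` just reports what the kernel exposes; `disable_states_deeper_than(1)` run as root would keep every CPU out of the states beyond `state1` until the next reboot. The `base` parameter exists only so the functions can be pointed at a copy of the tree for testing.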

                  • #19
                    Originally posted by shmerl
                    Not really. There is also a somewhat widespread freeze-on-idle problem, also known as the MCE freeze/reboot. The CPU can just freeze under casual load. It's known to be caused by C-states, and disabling them is a crude workaround. But it seems to be a hardware issue.
                    AFAIK we believe that is an unrelated issue, but I will confirm.

                    • #20
                      Originally posted by bridgman

                      AFAIK we believe that is an unrelated issue, but I will confirm.
                      Thanks. Indeed, I don't think it's related; it's just another rather annoying problem, since random reboots can actually cause data loss and are quite disruptive. There have been reports of Ryzen CPUs with segfaults but no MCE, with MCE but no segfaults, and with both.

                      After that happens, you can observe something like this in syslog:

                      Code:
                      Aug 13 05:20:14 host kernel: [    0.004000] mce: [Hardware Error]: Machine check events logged
                      Aug 13 05:20:14 host kernel: [    0.172031] mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
                      Aug 13 05:20:14 host kernel: [    0.172035] mce: [Hardware Error]: TSC 0 ADDR 1ffff84c4a3bc MISC d012000101000000 SYND 4d000000 IPID 500b000000000
                      Aug 13 05:20:14 host kernel: [    0.172038] mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1502615988 SOCKET 0 APIC 2 microcode 8001126
                      This is discussed quite extensively in the big thread on the AMD forum, which also covers the segfault issue, and in another, smaller thread specifically about MCE.

                      See some here: https://community.amd.com/thread/215...t=975&tstart=0
                      And here: https://community.amd.com/message/2813430
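To make log excerpts like the one above easier to compare, here is a minimal Python sketch that pulls the "Machine Check ... Bank N: <status>" lines out of a saved kernel log and decodes the architectural x86 MCi_STATUS flag bits (63 down to 57: VAL, OVER, UC, EN, MISCV, ADDRV, PCC). Assumptions: the regex is tailored to the exact line format in this post, the helper names are hypothetical, and the remaining status bits (including the low error-code field) are bank-specific and deliberately left undecoded.

```python
import re

# Matches lines such as:
# "... mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108"
# The hex value right after "Machine Check:" is captured but not decoded here.
MCE_RE = re.compile(
    r"mce: \[Hardware Error\]: CPU (?P<cpu>\d+): Machine Check[^:]*: "
    r"(?P<global>[0-9a-f]+) Bank (?P<bank>\d+): (?P<status>[0-9a-f]+)"
)

# Architectural MCi_STATUS flag bits (bit position -> short name).
STATUS_BITS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds extra info
    58: "ADDRV",  # ADDR register holds the error address
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status: int) -> dict[str, bool]:
    """Return the architectural flag bits of a 64-bit MCi_STATUS value."""
    return {name: bool((status >> bit) & 1) for bit, name in STATUS_BITS.items()}


def parse_mce_lines(lines):
    """Yield (cpu, bank, status, flags) for each matching 'Machine Check' line."""
    for line in lines:
        m = MCE_RE.search(line)
        if m:
            status = int(m.group("status"), 16)
            yield int(m.group("cpu")), int(m.group("bank")), status, decode_status(status)
```

Fed the "CPU 2 ... Bank 5: bea0000000000108" line from the excerpt, this yields CPU 2, bank 5, with VAL, UC, EN, MISCV, ADDRV and PCC set and OVER clear, i.e. the error was logged as uncorrected with possibly corrupt processor context. Interpreting what bank 5 and the error code mean on Zen is beyond what this sketch attempts.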
                      Last edited by shmerl; 25 August 2017, 12:17 AM.
