AMD Confirms Linux Performance Marginality Problem Affecting Some, Doesn't Affect Epyc / TR


  • Originally posted by puleglot View Post

    Yeah. A gaming motherboard with a TR4 socket is absolute nonsense. I expected Threadripper to be used primarily as a workstation CPU. As a gaming CPU it is needed only for SLI/Crossfire configurations with >2 GPUs.
    Watch out for operator segfaults with this treadripper:


    Comment


    • Originally posted by Funks View Post
      How did AMD come to the conclusion that ThreadRipper isn't affected by these issues, given that it's running the same stepping as the Ryzen chips (B1)? Does it just have better QA in general? A manufacturing fix on 2017 Week 25+ chips? A microcode fix?
      Originally posted by tetsuos View Post
      How can AMD be sure the B1 Threadripper is unaffected? What's wrong with the chips which fail with SEGV?
      As I understand it, they now know how to test for this problem and will use that information when binning the chips. This means all CPUs that were factory tested after a certain date (and that includes all Threadripper and Epyc, according to AMD) will not show this problem.

      No other fix has been applied from what I can tell.

      Originally posted by tetsuos View Post
      Does AMD believe we are going to buy any Ryzen chip after this fiasco?
      What will you be buying instead? If you expect bug-free chips, then you cannot buy any at all. Remember that Intel also did not react publicly to the Skylake HT bug for half a year, and then silently fixed it in a microcode update.

      I think AMD could definitely have handled this better (and I mean far better), but what they did is more or less par for the course.

      Comment


      • suaefar

        Originally posted by suaefar View Post

        Yes, 8 GB of RAM is not sufficient to successfully complete the builds.
        The output of "journalctl -kf" is appended to the printed log.
        It should also show the OOM killer messages, shouldn't it?
        Some segfaults are caught by the programs themselves and not by the kernel.
        Hence, these don't appear in the printed log.
        You can always go to the build directories and look at the end of the individual log files for the actual reason.
        Look through the "build.log" files in the corresponding subfolders of "/mnt/ramdisk/".

        It is not impossible but unlikely that this was already due to hitting the memory limit.
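
        The advice above about looking at the end of each "build.log" could be sketched roughly as follows. This is only an illustration in Python, not part of the test script: the function name `tail_build_logs` and the default of 20 lines are assumptions, and only the "/mnt/ramdisk/" location and the "build.log" file name come from the post.

```python
from collections import deque
from pathlib import Path


def tail_build_logs(root="/mnt/ramdisk", lines=20):
    """Return {log path: last `lines` lines} for every build.log below root.

    Hypothetical helper: the name and the default line count are assumptions.
    """
    tails = {}
    for log in sorted(Path(root).rglob("build.log")):
        # errors="replace" keeps going if a compiler wrote odd bytes to the log
        with log.open(errors="replace") as handle:
            # deque with maxlen keeps only the final `lines` lines in memory
            tails[str(log)] = list(deque(handle, maxlen=lines))
    return tails
```

        Calling `tail_build_logs()` and printing each entry should show how every loop's build ended, including failures that never reached the kernel log.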
        I did not see any messages like that. The script reported that journaling was disabled. I will run it again with sudo. Every time I run it, if I try to run it again it says:

        tee: /sys/block/zram0/disksize: Device or resource busy

        Not knowing what to do about that, I restart the system to fix it.
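
        A "Device or resource busy" error on "disksize" is consistent with the zram device still being set up from the previous run, since the kernel only accepts a new disksize on an uninitialized device. A small check like the one below could confirm that before re-running. It is a sketch under assumptions: the helper name `zram_in_use` is made up, and it relies on an unconfigured device reporting a disksize of 0.

```python
from pathlib import Path


def zram_in_use(device="zram0", sysfs_root="/sys/block"):
    """Return True if the zram device already has a nonzero disksize.

    Hypothetical helper; assumes an unconfigured device reports 0.
    """
    disksize = Path(sysfs_root) / device / "disksize"
    try:
        return int(disksize.read_text().strip()) != 0
    except FileNotFoundError:
        # Device node (or the zram module) is not present at all
        return False
```

        If this returns True, the leftover device from the earlier run is the likely cause, and a reboot clears it.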

        Comment


        • I can't use dmesg because it is completely clogged up with input device errors.

          I'm running the script with sudo. After several minutes the system memory usage was only 4GB with nothing else running. So far one segfault amidst the noise of evbugs:

          [loop-5] Mon Aug 14 12:19:55 CDT 2017 start 0
          [loop-5] Mon Aug 14 12:22:00 CDT 2017 build failed
          [loop-5] TIME TO FAIL: 130 s
          [KERN] Aug 14 12:22:00 ronin kernel: bash[6763]: segfault at 7fa18822d7e8 ip 00007fa187f49330 sp 00007fff63fa3eb8 error 4 in libc-2.24.so[7fa187e20000+193000]
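
          To pick lines like this out of noisy kernel output and see where in the library the fault happened, a small parser could look like the sketch below. It assumes the x86 kernel segfault message layout seen in the line above. The name `parse_segfault` and the returned keys are illustrative only, and the error-code decoding assumes the usual x86 page-fault bits (bit 0 page present, bit 1 write, bit 2 user mode).

```python
import re

# Matches the "comm[pid]: segfault at ... ip ... sp ... error N in obj[base+size]" layout
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return a dict describing a kernel segfault line, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    error = int(info["error"], 16)
    # Assumed x86 page-fault error bits: 1 = page present, 2 = write, 4 = user mode
    info["page_present"] = bool(error & 1)
    info["access"] = "write" if error & 2 else "read"
    info["user_mode"] = bool(error & 4)
    if info["object"] is not None:
        # Offset of the faulting instruction inside the mapped object
        info["offset"] = hex(int(info["ip"], 16) - int(info["base"], 16))
    return info
```

          For the line above this reports a user-mode read of a page that was not present, at offset 0x129330 into libc-2.24.so.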

          Comment


          • Originally posted by janweb View Post

            They also asked me to take some pictures of my mainboard and settings. This is all very sad.
            They'll then take several more days to get back to you, telling you to up the CORE voltage starting at 1.3625 -> 1.425 testing increasing it in increments of .05V and setting the SOC to 1.1. I personally don't understand why they are making end users tweak anything in the BIOS at default settings (not overclocking).

            Didn't buy this system to overclock and muck around with voltages manually - BS.

            Comment


            • Originally posted by keantoken View Post
              suaefar


              Not knowing what to do about that, I restart the system to fix it.
              That is correct. The script does not clean up and is not meant to be run twice.
              Everything is saved in a ramdisk which disappears upon restarting.

              The error you got in your other post looks very much like the errors that I observe.

              Comment


              • If there is a retailer selling the newer chips, maybe it would be faster to return the defective one to the store and buy the new one?

                Comment


                • Originally posted by Funks View Post

                  They'll then take several more days to get back to you, telling you to up the CORE voltage starting at 1.3625 -> 1.425 testing increasing it in increments of .05V and setting the SOC to 1.1. I personally don't understand why they are making end users tweak anything in the BIOS at default settings (not overclocking).

                  Didn't buy this system to overclock and muck around with voltages manually - BS.
                  Just got the first reply from AMD. Now it's my turn to send pictures.

                  Comment


                  • Is there some conclusion about this thing?

                    Comment


                    • Originally posted by misp View Post
                      Is there some conclusion about this thing?
                      Yeah, RMA your defective product if you have one.

                      Comment
