AMD Confirms Linux Performance Marginality Problem Affecting Some, Doesn't Affect Epyc / TR


  • Originally posted by bridgman

    There is an errata document but I'm not sure if it has been published yet. I asked about status last week.



    IIRC a typical modern CPU has somewhere between 10 and 100 errata (check the revision guides for any recent CPU).

    I do not expect this specific one to cause problems on Linux or Windows but as I said we are checking to make sure that transparent huge page logic (THP) in Linux does not need an additional tweak.
    How did AMD come to the conclusion that ThreadRipper isn't affected by these issues, given that it's running the same stepping as the Ryzen chips (B1)? Does it just have better QA in general? A manufacturing fix on 2017 Week 25+ chips? A microcode fix?

    Sounds to me like there's no microcode fix (if ThreadRipper had one, pretty sure Ryzen would have it by now as well). From a die perspective there's no new stepping (TR is supposedly using the same B1 Zeppelin dies used in Ryzen). Sounds like a lot of voodoo BS coming out of AMD: how are they able to claim that TR is unaffected if Linux is still being looked at?

    Will the announced Ryzen Pro desktop processors have the same level of QA-binning as TR? Will the consumer line as well? Or will we play an ongoing SEGV silicon lottery going forward on the consumer line?

    AMD needs to give us a better explanation and a fast-track RMA process for people affected, instead of making us take pictures of our BIOS settings and our case (I'm a bit pissed, as I have two systems affected and AMD support takes several days to respond to an ongoing support conversation).

    A fast-track path IMO would be: make us set the BIOS to defaults (to rule out overclocking), provide us a testing tool (something we can execute), and if it fails, generate an RMA number. The way it's happening now, it'll take over a week before one gets an RMA number, because they want pictures of the BIOS and pictures of the case, and AMD support doesn't respond quickly enough. Taking shipping into account, this process ends up taking a couple of weeks (crossing one's fingers that the first CPU sent back will be a good copy).
    Last edited by Funks; 08-13-2017, 03:30 AM.



    • I've signed up because this is the only place where AMD employees get involved in the conversation about Ryzen.

      When will AMD make a statement about these segfaults? What is AMD investigating? How can AMD be sure the B1 Threadripper is unaffected? What's wrong with the chips which fail with SEGV? Does AMD believe we are going to buy any Ryzen chip after this fiasco?

      When can we expect an actual answer? When can we expect AMD to get involved to answer questions?



      • Originally posted by Funks

        How did AMD come to the conclusion that ThreadRipper isn't affected by these issues, given that it's running the same stepping as the Ryzen chips (B1)? Does it just have better QA in general? A manufacturing fix on 2017 Week 25+ chips? A microcode fix?

        Sounds to me like there's no microcode fix (if ThreadRipper had one, pretty sure Ryzen would have it by now as well). From a die perspective there's no new stepping (TR is supposedly using the same B1 Zeppelin dies used in Ryzen). Sounds like a lot of voodoo BS coming out of AMD: how are they able to claim that TR is unaffected if Linux is still being looked at?

        Will the announced Ryzen Pro desktop processors have the same level of QA-binning as TR? Will the consumer line as well? Or will we play an ongoing SEGV silicon lottery going forward on the consumer line?

        AMD needs to give us a better explanation and a fast-track RMA process for people affected, instead of making us take pictures of our BIOS settings and our case (I'm a bit pissed, as I have two systems affected and AMD support takes several days to respond to an ongoing support conversation).

        A fast-track path IMO would be: make us set the BIOS to defaults (to rule out overclocking), provide us a testing tool (something we can execute), and if it fails, generate an RMA number. The way it's happening now, it'll take over a week before one gets an RMA number, because they want pictures of the BIOS and pictures of the case, and AMD support doesn't respond quickly enough. Taking shipping into account, this process ends up taking a couple of weeks (crossing one's fingers that the first CPU sent back will be a good copy).
        They also asked me to take pictures of my mainboard and settings. This is all very sad.



        • When can we expect a response from AMD related to the hardware issues Ryzen B1 has?



          • Perhaps a new AGESA is coming soon, and perhaps it will also fix the OpCache segfaults in heavy workloads/compilations? Wait and see!
            http://forum.gigabyte.us/thread/1542...&scrollTo=9927

            yesterday at 12:14am GIGABYTE - Matt said:
            Quick update for everyone on the next round of BETA BIOS. I am pushing to get another round of BIOS out... But it seems a new AGESA code is coming soon. I believe we will have one more round of BETA BIOS using 1006A before the new AGESA is implemented (AMD has not yet released it... Not sure I am supposed to mention it, but ya its coming)
            Last edited by scorpio810; 08-13-2017, 03:37 PM.



            • Originally posted by keantoken
              I'm running kill-ryzen.sh and get this:

              [loop-6] Fri Aug 11 21:07:10 CDT 2017 start 0
              [loop-7] Fri Aug 11 21:07:11 CDT 2017 start 0
              [loop-6] Fri Aug 11 21:08:57 CDT 2017 build failed
              [loop-6] TIME TO FAIL: 113 s
              [loop-4] Fri Aug 11 21:09:14 CDT 2017 build failed
              [loop-2] Fri Aug 11 21:09:14 CDT 2017 build failed
              [loop-4] TIME TO FAIL: 130 s
              [loop-2] TIME TO FAIL: 130 s
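The log lines above follow a regular pattern, so the per-loop failure times can be pulled out mechanically. Below is a minimal Python sketch, assuming the script's output always uses the `[loop-N] TIME TO FAIL: <seconds> s` format seen in this excerpt; the function name `time_to_fail` is made up for illustration and is not part of the kill-ryzen script.

```python
import re

# Matches lines such as "[loop-6] TIME TO FAIL: 113 s".
# The format is inferred from the log excerpt in this thread (an assumption).
FAIL_RE = re.compile(r"^\[(loop-\d+)\] TIME TO FAIL: (\d+) s$")


def time_to_fail(log_text):
    """Return {loop name: seconds until the build failed} for each reported failure."""
    result = {}
    for line in log_text.splitlines():
        match = FAIL_RE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


# The log lines quoted in the post above.
sample = """\
[loop-6] Fri Aug 11 21:07:10 CDT 2017 start 0
[loop-7] Fri Aug 11 21:07:11 CDT 2017 start 0
[loop-6] Fri Aug 11 21:08:57 CDT 2017 build failed
[loop-6] TIME TO FAIL: 113 s
[loop-4] Fri Aug 11 21:09:14 CDT 2017 build failed
[loop-2] Fri Aug 11 21:09:14 CDT 2017 build failed
[loop-4] TIME TO FAIL: 130 s
[loop-2] TIME TO FAIL: 130 s
"""

print(time_to_fail(sample))  # {'loop-6': 113, 'loop-4': 130, 'loop-2': 130}
```

Three of the loops failing within about two minutes of each other is what the rest of this post is reacting to.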

              So perhaps my Ryzen 5 1500X is not immune to the problem. I will try disabling opcache if I can figure out which is the correct option in the BIOS.

              On starting the script it took up about 2GB of memory and started 8 threads. Memory is still not maxed out and I have Chromium running right now. Do I really need 16GB to run this? I only have 8GB.

              BTW when my kernel build failed it also took 2 Chromium tabs with it, so maybe in that case it was more system instability issues.
              Check dmesg for OOM killer messages. I don't think 8 GB of memory is sufficient for 8 threads, especially if you also have Chromium running at the same time.
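One way to do that check without reading the whole kernel log by eye is sketched below in Python. It assumes `dmesg` is on the PATH and readable by the current user (some distributions restrict it to root), and it matches on phrases the OOM killer commonly logs; the exact wording differs between kernel versions, so treat the pattern as a starting point. The helper names are made up for this example.

```python
import re
import subprocess

# Phrases the Linux OOM killer typically writes to the kernel log.
# Wording varies by kernel version, so this list is an assumption, not exhaustive.
OOM_RE = re.compile(r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE)


def find_oom_lines(kernel_log_text):
    """Return the kernel log lines that look like OOM killer activity."""
    return [line for line in kernel_log_text.splitlines() if OOM_RE.search(line)]


def read_dmesg():
    """Return the current kernel ring buffer as text (may need root on some systems)."""
    completed = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return completed.stdout
```

Calling `find_oom_lines(read_dmesg())` after a failed run returns an empty list if the kernel never invoked the OOM killer, which would point away from plain memory exhaustion as the cause.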



              • Originally posted by k1e0x
                heh, I think you guys need to realize how small a subset you are: Linux/BSD users who compile their systems from scratch and also own Ryzens... what, 0.001% of the computer industry or less?
                Good luck penetrating the server market with that attitude.



                • Originally posted by Enverex
                  Good luck penetrating the server market with that attitude.
                  Unfortunately there aren't any socket AM4 or TR4 server mainboards on the market. Only gaming mainboards :-(



                  • Originally posted by drSeehas
                    Unfortunately there aren't any socket AM4 or TR4 server mainboards on the market. Only gaming mainboards :-(
                    Yeah. A gaming motherboard with a TR4 socket is absolute nonsense. I expected Threadripper to be used primarily as a workstation CPU. As a gaming CPU it is only needed for SLI/Crossfire configurations with >2 GPUs.



                    • Originally posted by puleglot
                      Check dmesg for OOM killer messages. I don't think 8 GB of memory is sufficient for 8 threads, especially if you also have Chromium running at the same time.
                      I am the author of the kill-ryzen script (and already regret that name).

                      Yes, 8 GB of RAM is not sufficient to successfully complete the builds.
                      The output of "journalctl -kf" is appended to the printed log, so it should also show the OOM killer messages, shouldn't it?
                      Some segfaults are caught by the programs themselves and not by the kernel.
                      Hence, these don't appear in the printed log.
                      You can always go to the build directories and look at the end of the individual log files for the actual reason.
                      Look through the "build.log" files in the corresponding subfolders of "/mnt/ramdisk/".
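For anyone who wants to skim those files in one go, here is a small Python sketch that collects the last lines of every `build.log` below a directory. It assumes the layout described above (files named `build.log` somewhere under `/mnt/ramdisk/`); the function name `tail_build_logs` and the default of 20 lines are arbitrary choices for this example, not something the script provides.

```python
from collections import deque
from pathlib import Path


def tail_build_logs(root="/mnt/ramdisk", lines=20):
    """Map each build.log found below `root` to its last `lines` lines."""
    tails = {}
    for log_path in sorted(Path(root).rglob("build.log")):
        # errors="replace" keeps going if a log contains bytes that are not valid text.
        with log_path.open(errors="replace") as handle:
            # deque with maxlen keeps only the final lines while streaming the file.
            tails[str(log_path)] = [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
    return tails
```

Printing the result of `tail_build_logs()` after a failed run shows where each build stopped, including failures the program caught itself and the kernel never logged.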

                      It is not impossible, but unlikely, that this failure was already due to hitting the memory limit.

