Continuing To Stress Ryzen

  • #51
    ... and maybe leave out the build-php test to avoid "red herring" segfaults from conftest?

    • #52
      Michael,

      Have you tried this on Intel hardware, by chance? And/or a 32-bit system? Not a full hour-long stress test - I wouldn't think you'd need that. A few passes of each, at most?

      I think the conftest segfaulting has been around for a long time. See, e.g.,
      https://forums.gentoo.org/viewtopic-t-682508.html
      http://www.linuxquestions.org/questi...de-4175536308/ (this one is more informative in some ways)


      • #53
        I just tried it on an Intel 2600K at the office... same segfault messages.

        Leaving out the build-php test case makes the messages stop happening.

        • #54
          I ran PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-apache build-imagemagick and saw no errors or segfaults here with the 1.72 beta BIOS and OpCache disabled.

          EDIT:

          Code:
          [ 2796.157487] conftest[10720]: segfault at 1c ip 0000000100000822 sp 00007fffffffda50 error 6 in conftest[100000000+1000]
          [ 2796.176485] conftest[10727]: segfault at 1c ip 0000000100000822 sp 00007fffffffda50 error 6 in conftest[100000000+1000]
          [ 2920.112315] perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
          scorpio810
          Junior Member
          Last edited by scorpio810; 05 August 2017, 03:58 PM.
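The log excerpt above mixes two conftest segfaults with an unrelated perf message. One way to separate the conftest crashes (which, per this thread, are a red herring coming from configure-time probes such as those run by build-php) from segfaults in other processes is to count the kernel's "segfault at" lines by process name. A minimal Python sketch, assuming dmesg-style input in exactly the format shown; the function names and the default ignore list are hypothetical, and a process name containing spaces or brackets would not be matched.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
# "[ 2796.157487] conftest[10720]: segfault at 1c ip ... error 6 in conftest[100000000+1000]"
SEGFAULT_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<comm>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at "
)


def count_segfaults(lines):
    """Count kernel segfault reports per process name in dmesg-style lines."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.match(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


def suspicious_segfaults(lines, ignore=("conftest",)):
    """Drop processes assumed to crash harmlessly (hypothetical default: conftest)."""
    counts = count_segfaults(lines)
    return {name: n for name, n in counts.items() if name not in ignore}


if __name__ == "__main__":
    sample = [
        "[ 2796.157487] conftest[10720]: segfault at 1c ip 0000000100000822 sp 00007fffffffda50 error 6 in conftest[100000000+1000]",
        "[ 2796.176485] conftest[10727]: segfault at 1c ip 0000000100000822 sp 00007fffffffda50 error 6 in conftest[100000000+1000]",
        "[ 2920.112315] perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79000",
    ]
    print(count_segfaults(sample))       # Counter({'conftest': 2})
    print(suspicious_segfaults(sample))  # {}
```

On a real system one could feed it a saved log, e.g. `suspicious_segfaults(open("dmesg.txt").read().splitlines())` (the file name is only an example). An empty result after a stress run would mean every reported segfault came from conftest; anything else (bash, cc1, etc.) is what the Ryzen reports in this thread are about.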


          • #55
            Originally posted by Naib
            This looks like a red herring for causing such an issue. Looking at the original article, there were only conftest segfaults. I still think a number of people having these issues have poor setups: BIOS/RAM, toolchain config, etc. There may be a bug somewhere, but so far....
            No, your opinion is wrong.

            I have seen several opinions like this, in several forums, suggesting that maybe all those who report this Ryzen bug are morons who do not know how to configure their computers.

            Not only is this opinion naive, but it is very likely that most, if not all, Ryzen processors sold at the beginning have this bug, but their owners are not aware of this fact because they did not test them for a long enough time.

            I have assembled and configured thousands of custom computers, from the days of the Intel 8080 & Motorola 6800 to the latest Kaby Lake & Ryzen.
            I have also designed and debugged many embedded computers, so there is no doubt that I know whether the components I have used for my Ryzen computer are of adequate quality and whether the BIOS and operating system are OK.

            I pre-ordered a Ryzen 7 1800X. I have used it with the best ASRock MB, with 32 GB DDR4-2400 ECC memory (ECC works OK on this MB), with a Noctua cooler that kept CPU temperatures low, and with a Titanium power supply with excellent noise and regulation and spare capacity.

            I have applied all BIOS updates. The last one was about 2 weeks ago.

            I am a Gentoo user, so I performed a lot of compilations on the Ryzen system.


            Initially I believed that I was lucky, because this Ryzen seemed to work perfectly and its performance in everything I tried was excellent.

            Unfortunately, the truth was that I had not tried hard enough to expose this bug.

            One week ago, I let the Ryzen continuously compile, for a whole day, several packages that are known to expose the bug, e.g. mesa and gcc.
            The compilations used 16 threads and several gcc compiler versions were used.

            During one day, two faults were recorded. Both were due to hardware, not to software, because repeating those operations, exactly, resulted in no fault.

            Both faults happened in bash, while bash executed some scripts during the compilation of gcc, scripts that generate some files used for building gcc.

            One fault was an illegal instruction fault, supposedly in a glibc function, while the other was a memory page fault.


            What I have seen matches perfectly the reports of others, except that on my system the faults were less frequent. That is why I had not noticed them previously in less stressful usage.

            It is likely that all those who have not seen the bug yet are those who have Ryzen samples like mine, where you need to perform much longer tests to see the faults.


            The really bad thing is that for every error that is exposed as an illegal instruction fault or as a memory page fault, there must exist other similar errors that resulted in the execution of a legal but incorrect instruction, which may cause silent data corruption.


            Immediately after its launch, Skylake also had a large number of dangerous bugs, which are described in its errata document in vague words about some unspecified sequences of instructions that can cause unpredictable behavior leading to crashes or data corruption. Intel, however, quickly patched those bugs with microcode updates, so most users have never experienced them, with the exception of a few publicized cases, e.g. the one encountered in SETI@home, or the recent one with OCaml.


            Therefore the problem is not that Ryzen has one more bug; the problem is that we do not know whether AMD management gives this bug the high priority it deserves.

            Unless AMD finds a workaround for this bug, I will have to throw my Ryzen and its motherboard in the garbage, because I would not use it for gaming but for work where I cannot use a processor that I cannot trust not to corrupt my data.


            Nevertheless, even if I were to lose all that money, I would still be willing to buy some Epyc and/or Threadripper processors, because they have significant advantages for certain purposes, but only if AMD made public some information about this bug that would convince me they understand its cause well and that would give some credible guarantee that future AMD processors will be free of this bug.


            Someone on this forum reported that long tests on some Epyc showed no faults. That is encouraging, but I wonder whether that is due to Epyc having a die revision free of this bug, or just due to the fact that all Epyc processors have a much lower clock frequency and much lower electrical power per core. If the latter were the cause, then Threadripper would still have this bug.



            • #56
              Looks to me like Michael got paid by Intel to FUD AMD. He won't run this on Intel, he won't run it on Epyc, he won't test with OpCache disabled and see what the performance is when OpCache is disabled. Just one big FUD fart.


              • #57
                Originally posted by gnufreex
                Looks to me like Michael got paid by Intel to FUD AMD. He won't run this on Intel, he won't run it on Epyc, he won't test with OpCache disabled and see what the performance is when OpCache is disabled. Just one big FUD fart.
                Jesus dude... someone making a mistake does not mean they are being paid by Intel. I doubt most of the people posting here knew exactly what a conftest segfault meant until recently. If they did, they probably would have pointed it out yesterday. That includes the AMD people posting here, and if they did know, they should have spoken up or paid more attention. It would have resolved this a bit sooner.


                • #58
                  Originally posted by gnufreex
                  Looks to me like Michael got paid by Intel to FUD AMD.
                  Have you ever heard of Hanlon's razor?


                  • #59
                    Originally posted by gnufreex
                    Looks to me like Michael got paid by Intel to FUD AMD. He won't run this on Intel, he won't run it on Epyc, he won't test with OpCache disabled and see what the performance is when OpCache is disabled. Just one big FUD fart.
                    Do not blame Michael! The beta BIOS with the OpCache option was just released two days ago, and nobody knows what it solves!
                    Reply #7 on: 04-August-17, 23:06:11 »
                    Quote from: scorpio on 04-August-17, 22:52:24
                    @darkhawk: could MSI please not hide the OpCache control in their Ryzen BIOS?

                    See: How to contact MSI.
                    https://forum-en.msi.com/index.php?t...5.0#msg1642536


                    • #60
                      Originally posted by gnufreex
                      Looks to me like Michael got paid by Intel to FUD AMD. He won't run this on Intel, he won't run it on Epyc, he won't test with OpCache disabled and see what the performance is when OpCache is disabled. Just one big FUD fart.
                      lol..
