Continuing To Stress Ryzen


  • For what it's worth, yesterday I compiled a whole kernel, mesa-git, scribus, wine, several dozen small packages, ffmpeg and vlc on my Ryzen 1700 with the MSI Tomahawk B350 platform and 16GB of RAM. I bought G.Skill RAM which does not run above 2133 MHz right now, but I'm fine with that.
    No segfault at all. I haven't had a segfault while compiling since I bought the equipment (almost at Ryzen's release; I was an early adopter). I had a few hangs at first that were solved by BIOS upgrades (I'm not even running the latest BIOS right now).
    I'm running a slackware64-current system, and compile with the following options: "-O3 -fPIC -march=znver1 -mtune=znver1"
    I use -j17 with make, so I tend to use all cores extensively.
    I also transcode a lot of 20+ GB video files with 16 threads and have never had an issue, even when the operation took 2+ hours. So it doesn't seem to be a general stress issue.
    So my guess is:
    - This seems specifically related to compilers.
    - It's triggered only in specific situations and in particular setups.
    Which could be software- or hardware-related, as far as I know.
    It's good that Michael can reproduce it so easily; that will make it easier for the devs to debug.
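For reference, the build setup described in this post (16 hardware threads, `-j17`, and the quoted znver1 flags) can be sketched roughly as below. This is only an illustration: `make_command` is a hypothetical helper, and passing `CFLAGS`/`CXXFLAGS` on the make command line is an assumption, since whether a given project honours them depends on its build system.

```python
import os
import shlex

# Compiler flags exactly as quoted in the post.
CFLAGS = "-O3 -fPIC -march=znver1 -mtune=znver1"


def make_command(threads=None, cflags=CFLAGS):
    """Build a make argument list using the 'threads + 1' rule of thumb.

    Hypothetical helper: passing CFLAGS/CXXFLAGS as make variables is an
    assumption and is not honoured by every project.
    """
    if threads is None:
        threads = os.cpu_count() or 1
    return ["make", f"-j{threads + 1}", f"CFLAGS={cflags}", f"CXXFLAGS={cflags}"]


if __name__ == "__main__":
    # A Ryzen 1700 exposes 16 hardware threads, which gives -j17.
    print(shlex.join(make_command(16)))
```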



    • I think it is good to discuss the issues openly in a tech forum as it will help to nail down the root causes.
      After idling, the screen on my Ryzen 7 always stays black and I can only reset the machine... but so far I blame a combination of Plasma-Wayland and/or the AMDGPU drivers; I haven't found the time to follow up on this.
      I have had several freezes during gaming/desktop usage as well, but never considered that my Ryzen might be the issue.
      Now I will definitely start testing as well. So I appreciate news stories and discussions like this!



      • What strikes me is that people are so eager to blame the messengers. Why is that?

        As for myself, I'm purposefully sitting out the first 6-9 months of Ryzen because designing, building and debugging a new CPU + platform is tricky business best left to people who do such things day in, day out, year after year. You'll note that one of the guys doing the BSD research on this (Matt Dillon of DragonFly BSD) is a long-time BSD-family kernel developer who has previously documented an AMD BD bug, if I'm not mistaken.

        IF there are one or more errata related to Ryzen (whether in hardware, microcode, motherboard implementation or whatever) THEN it is obviously imperative that a) a solid repro is found, b) it is properly investigated and documented by AMD and/or board partners, and c) official errata are issued along with the proper fixes.

        In the meantime, this thread reads more like the script to a (witch-hunt?) soap opera than a properly researched documentary.
        Last edited by ermo; 06 August 2017, 08:34 AM. Reason: nitpicks



        • Originally posted by rvdboom:
          For what it's worth, yesterday I compiled a whole kernel, mesa-git, scribus, wine, several dozen small packages, ffmpeg and vlc on my Ryzen 1700 with the MSI Tomahawk B350 platform and 16GB of RAM. I bought G.Skill RAM which does not run above 2133 MHz right now, but I'm fine with that.
          No segfault at all. I haven't had a segfault while compiling since I bought the equipment (almost at Ryzen's release; I was an early adopter). I had a few hangs at first that were solved by BIOS upgrades (I'm not even running the latest BIOS right now).
          I'm running a slackware64-current system, and compile with the following options: "-O3 -fPIC -march=znver1 -mtune=znver1"
          I use -j17 with make, so I tend to use all cores extensively.
          I also transcode a lot of 20+ GB video files with 16 threads and have never had an issue, even when the operation took 2+ hours. So it doesn't seem to be a general stress issue.
          So my guess is:
          - This seems specifically related to compilers.
          - It's triggered only in specific situations and in particular setups.
          Which could be software- or hardware-related, as far as I know.
          It's good that Michael can reproduce it so easily; that will make it easier for the devs to debug.

          BUT HE CAN'T!!! He is seeing a normal config segfault (by normal I mean PHP should sort their junk out). Michael then did what he constantly does: spin, spin, spin!
          What he is "reporting" here is a non-issue; it is not reportedly related to any possible Ryzen issue.


          Originally posted by debianxfce:

          You have a motherboard from Asus with a fresh BIOS update, while the idiots blaming Ryzen have buggy MSI motherboards and cannot build stable systems. Michael posted a link to a BSD patch, not a bug like he claims. Of course there are IRQ timing differences between CPUs that should be fixed in the software drivers.
          Actually I don't, I have an MSI board, you can see that in the post you replied to. The problem is this site has piss-poor code/quote block formatting...


          Originally posted by debianxfce:
          Now these phoronix shit articles are referenced here too:
          https://hothardware.com/news/freebsd...esets-machines

          Make your own custom kernel instead of writing shit. Then if you have the problem make a bug report to kernel or compiler bugzillas.
          And this is the problem with this site, and more specifically with the one (Michael) who runs it. Sure, report on things you see, but you are then bound to add an amendment when new information comes to light.
          I posted quite quickly that those segfaults appeared normal, and then tracked down that they were. We are now 12 pages in and still this shit is present. There should have been an update, but no... it is easier and more profitable to drive clicks...

          This type of shit does not help narrow down EXACTLY where the issue is... be it the silicon, be it AGESA, mobo, kernel, kernel config, compiler... it just makes noise.


          Originally posted by AdrianBc:

          No, your opinion is wrong.

          I have seen several opinions like this, in several forums, that maybe all those who report about this Ryzen bug are morons who do not know how to configure their computers.

          Not only is this opinion naive, but it is very likely that most, if not all, Ryzen processors sold at the beginning have this bug, but their owners are not aware of this fact because they did not test them for a long enough time.

          I have assembled and configured thousands of custom computers, since the days of Intel 8080 & Motorola 6800, until the latest Kaby Lake & Ryzen.
          I have also designed and debugged many embedded computers so there is no doubt that I know if the components that I have used for my Ryzen computer are of adequate quality and if the BIOS and operating system are OK.

          I have pre-ordered a Ryzen 7 1800X. I have used it with the best ASRock MB, with 32 GB DDR4-2400 ECC memory (ECC works OK in this MB), with a Noctua cooler that ensured low temperatures for the CPU and with a Titanium power supply with excellent noise and regulation and with excess capacity.

          I have applied all BIOS updates. The last one was about 2 weeks ago.

          I am a Gentoo user, so I performed a lot of compilations on the Ryzen system.


          Initially I believed that I was lucky, because this Ryzen seemed to work perfectly and its performance in everything I tried was excellent.

          ...

          Did I hit a nerve? Good, because now we can start dismissing the bullshit side of things. This is why I made that spreadsheet + form to track stuff like this:
          https://docs.google.com/spreadsheets...#gid=950983791

          Let's have a look, shall we?
          Row54, Row56: stating stability issues BUT also -march=core2. If that is true, no wonder there are stability issues. Either way, these can be dismissed.
          Row12: using GCC-5.x and -march=haswell. This has been shown to be unstable (https://wiki.gentoo.org/wiki/Talk:Ryzen).
          Row25: using GCC-5.x and -march=native. GCC-5 is not Zen-aware and this will cause issues.
          Row35: haswell again.
          Row50: native + GCC-5 again.
          Row52: native + GCC-5 again.
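The dismissal rules applied to the rows above can be written down as a small filter. This is a rough sketch of the poster's stated criteria only, not an established test for the bug: `triage` and `FOREIGN_MARCH` are hypothetical names, the example version strings are placeholders for the "GCC-5.x" entries, and the GCC 6 cutoff reflects the fact that `-march=znver1` first appeared in GCC 6.

```python
# -march values the post treats as targeting a non-Ryzen CPU.
FOREIGN_MARCH = {"core2", "haswell"}


def triage(gcc_version, march):
    """Classify a reported build config using the criteria from the post.

    gcc_version: version string such as "5.4.0" (placeholder example).
    march: the value given to -march, e.g. "native" or "znver1".
    """
    major = int(gcc_version.split(".")[0])
    if march in FOREIGN_MARCH:
        return "dismiss: -march targets a non-Ryzen CPU"
    if march == "native" and major < 6:
        return "dismiss: GCC older than 6 is not Zen-aware"
    return "keep"


if __name__ == "__main__":
    for version, march in [("5.4.0", "haswell"), ("5.4.0", "native"), ("7.1.0", "znver1")]:
        print(version, march, "->", triage(version, march))
```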

          I post in the two main Gentoo threads and add support where possible. There were at least two people who misbuilt binutils AND did not select the newer one, so every single binary was linked using inappropriate opcodes.
          There are at least 4 of us in #g-chat with Ryzen. One constantly builds the liveCDs and none of us has any issues...

          Could there be a silicon issue? Sure, but right now I am constantly seeing idiots not being able to build their systems, hardware-wise or OS-wise.

          Then we have king muppet, who did a good first editorial but then went off the rails and did not amend it, helping to muddy the waters as to where any possible issues lie.

          This site and its maintainer are a joke.







          • Originally posted by Naib:
            BUT HE CAN'T!!! He is seeing a normal config segfault (by normal I mean PHP should sort their junk out).
            I think that cannot be repeated enough. People seem to miss that.
            So let's do it again:
            Originally posted by Naib:
            he is seeing a normal config segfault



            • To supplement my comment back on page 9: I ran the kill-Ryzen test program last night. In the first half hour, four threads had faults of some sort, the first at 84 seconds, as I recall from wet-ware memory. From the message wording I assumed that these threads stopped running their tasks, although my CPU frequency tell-tale widgets showed that their cores stayed at 3.9 GHz and there wasn't a drop in CPU temperature. This was probably because the other thread of each core's pair was still running. I waited for the test program to end (expecting it to), but after a few hours it still hadn't, so I stopped it with Ctrl-Z in the terminal window. Other activities, such as using other terminal windows, ran normally during this time. My dmesg window listed the same four faults more compactly. That the program kept running suggests either an issue, or that it continually reloads each thread's compilation for another try. However, the test terminal window never updated with more information after reporting the four faults, nor did dmesg. I have no doubt that the cores were doing something to pull an additional 140W over idle.

              So the fault issue exists, but it doesn't lock up my PC. kill-Ryzen was unsuccessful in killing the PC's functionality, only successful in wounding a few thread processes. The PC lockups that others have observed may be due to unstable timings, non-optimal voltages, or firmware deficiencies. The 9920 C6H BIOS I am running is the next to last released of at least 11 that I've downloaded and is at least the sixth I've tested. For Asus C6H motherboards, all BIOSes are beta BIOSes, whether official or not.

              I should also note, because it wasn't in my earlier message, that the GPU in my test PC is an Asus 1080 Ti OC (running default timings) using the proprietary NVIDIA driver 384.-something. The kernel is 4.10.0-22 (in Ubuntu kernel naming format) and likely has elements of kernels 4.11 and 4.12.
              Last edited by kaseki; 06 August 2017, 08:13 AM.



              • Yes, kaseki, this problem does not usually lock up the computer. Lockups have been observed, but they are not the usual behavior. I only had a lockup on my machine once, and I have been running kill-ryzen.sh on it dozens of times in the last two or three weeks, trying different configurations, BIOS options, BIOS versions, etc.

                If you want to see why the BUILD failed, you can go into /mnt/ramdisk/workdir/buildloop.d/loop-XX/build.log and look at the last lines (XX is the number of the loop that failed). For example, the last time I ran it I got:

                /mnt/ramdisk/workdir/gcc-7.1.0/mpfr/src/generic/mparam.h:1:0: internal compiler error: Segmentation fault

                This is never logged in dmesg, so it is not easy to find afterwards.
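Since these failures only show up in the per-loop logs, a small script can collect them. The sketch below assumes the `buildloop.d/loop-XX/build.log` layout described above and searches each log for the "internal compiler error" string from the example, rather than reading only the last lines; `find_compiler_errors` is a hypothetical helper and not part of kill-ryzen.sh.

```python
from pathlib import Path


def find_compiler_errors(workdir):
    """Return {loop directory name: first 'internal compiler error' line}.

    Assumes the layout <workdir>/buildloop.d/loop-XX/build.log described
    in the post. Loops whose log has no such line are left out.
    """
    hits = {}
    for log in sorted(Path(workdir).glob("buildloop.d/loop-*/build.log")):
        with log.open(errors="replace") as fh:
            for line in fh:
                if "internal compiler error" in line:
                    hits[log.parent.name] = line.strip()
                    break
    return hits
```

Calling `find_compiler_errors("/mnt/ramdisk/workdir")` after a run would list each failed loop together with the line that killed it.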

                If the build fails, then your processor is not 100% stable. It has the bug. I am sorry for that.

                Another important piece of information: I have been running phoronix-test-suite on my computer under Antergos for 12 hours now without the PHP test (only ImageMagick, kernel and Apache). It has not shown any signs of problems so far. Maybe this is related to the fact that the compilations are quite fast; the slowest completes in only 84 seconds. After that it has to stop, clean the directory and start a new test, so it does not spend all of its time compiling. kill-ryzen.sh, on the other hand, runs 16 compilations in parallel (on my CPU) that take more than 3 hours each to complete. That may make a difference in triggering the bug.
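The difference in load pattern described above (many long compilations running at once, instead of short ones with pauses in between) can be illustrated with a generic parallel runner. This is only a sketch and not kill-ryzen.sh itself: `run_parallel` and `count_failures` are hypothetical helpers, and the command to run is left to the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_parallel(cmd, jobs):
    """Run `jobs` copies of `cmd` simultaneously and return their exit codes.

    A negative exit code means the process was killed by a signal
    (for example -11 for a segmentation fault on Linux).
    """
    def run_one(_):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, range(jobs)))


def count_failures(codes):
    """Count the runs that did not exit cleanly."""
    return sum(1 for code in codes if code != 0)
```

With a real build command and `jobs=16`, any non-zero count from `count_failures` would correspond to a failed loop in the sense used in this post.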

                (However, the bug can also trigger in real-world scenarios. I became aware of it while compiling openblas and seeing it fail with a segfault in the compiler. I thought that was weird, did some research and found the AMD technical forum, which pointed me to the kill-ryzen.sh script.)

                Note: It is not completely clear that AMD does not acknowledge the problem. They wrote in the forum that they are investigating the issue. The main problem is that they do not say whether they can reproduce the bug internally. If they said that, plus "we are working on a fix", it would be a great start.

                Note 2: To anyone who says that people don't know how to build their systems: run kill-ryzen.sh for 24-48 hours and then we can talk.



                • Originally posted by sdack:
                  And yet do you only keep looking at Ryzen when you should be looking at Intel, too,
                  why should I look at Intel? I've never owned an Intel CPU in my work computer, and never will. Doesn't interest me at all.


                  and figure out why it crashes there, too.
                  what "it" crashes there? What program/workload do you mean here? Be more specific.


                  Yet you act like any scared bitch and get hysterical about Ryzen.
                  yes, of course I'm very hysterical because the Ryzen is blazingly fast compared to my previous Phenom. Big bang for the buck, so to say, but that doesn't help if that bang often is a misfire.


                  You even call yourself RyzenNewbie
                  well, I could have called myself "sdackBuddy", but that wouldn't have made any difference at all.


                  to underline your cluelessness.
                  yep, at the moment I'm really clueless about the arbitrariness going on in my system. Just to repeat it for you: arbitrariness, capriciousness - damn, I really had to google that word.


                  Why do you get upset there when you know you're a newbie, I wonder?
                  well, after four months of wandering the "newbie" begins wanting to know what really is going on.


                  It's of course normal for two different CPUs to show slight differences in the way the bug exposes itself.
                  what bug exactly? Again, be more specific.


                  So why do you keep ranting and making this all about Ryzen when de facto you don't have a clue, RyzenNewbie? Or let me guess, you're still trying to figure that one out, too. *lol*
                  I like you, really; you motivate me to post more...
                  Last edited by RyzenNewbie; 06 August 2017, 08:27 AM. Reason: quoting fixed...



                  • Originally posted by Naib:
                    Could there be a silicon issue? sure but right now I am constantly seeing idiots not being able to build their system, hardware wise or OS wise.
                    Come on dude, what do you expect the average Joe to do in this case? None of them understands the issue well enough to form a halfway educated opinion, and the ones who do are throwing information left and right. People using Linux can't be branded "idiots"; people driving drunk can. So maybe we see an issue with the testing methodology here, but the slight oversight of including PHP hardly makes anyone an idiot.

                    Testing is good, and even if it turns out that Ryzen is innocent and the bug is in software rather than hardware, we will have learned something from it (and can then throw the torches in the right direction).

                    In any case, the discussion looks more like a witch-hunt right now, which might not be the best platform for doing objective analysis.



                    • Originally posted by leipero:
                      For me things are very simple, and I still believe those crashes are caused by software. If it's fine on Windows, there's zero reason to believe there's anything wrong with the hardware; it's that simple.
                      Wrong.

                      Have you ever thought about the following: "Hey Mr. Nadella, there are a few oddities in our CPU that could lead to undesired behaviour, but you can circumvent them by applying following workarounds in the Windows kernel: [funny list of assembler commands] BUT, Mr. Nadella, don't ever tell anyone, okay?!"
                      Last edited by RyzenNewbie; 06 August 2017, 08:33 AM. Reason: fixed typo...

