AGESA 1.0.0.6b Might Fix The Ryzen Linux Performance Marginality Problem

  • #71
    Originally posted by Brisse
    Just updated BIOS for my PRIME X370-PRO this morning. According to release notes, it has the new AGESA.

    "PRIME X370-PRO BIOS 0902
    Update AGESA 1.0.0.6B"

    Went ahead and ran some kill-ryzen, but it didn't take long...

    [loop-5] TIME TO FAIL: 53 s

    [loop-6] TIME TO FAIL: 119 s

    [loop-11] TIME TO FAIL: 132 s

    So no, it's definitely not fixed.
    How long does it take to fail if you reduce the memory frequency to the lowest possible setting?


    • #72
      I doubt it's really fixed at all. Obviously they tweaked something with the update that sort of mitigates the issue, but the problem is still there. AMD probably has no idea what it is and I doubt a real fix will be forthcoming. Most likely it will be left in a "good enough" state and any real fix (if any) will be implemented in Ryzen's second gen.


      • #73
        Originally posted by Melcar
        I doubt it's really fixed at all. Obviously they tweaked something with the update that sort of mitigates the issue, but the problem is still there. AMD probably has no idea what it is and I doubt a real fix will be forthcoming. Most likely it will be left in a "good enough" state and any real fix (if any) will be implemented in Ryzen's second gen.
        This seems unfairly negative. Yes, all CPUs have bugs, and completely new chips have more of them (Zen is unrelated to any previous CPU). AMD has been pretty responsive. So far the Threadripper and EPYC chips don't have the problem, despite having the same core. AMD is replacing CPUs with the problem, and the most consistent reports I've seen claim that the replacement chips work. Keep in mind that many billions of transistors running billions of instructions per second do lead to some complex problems. Even a single bit that is off one time in a trillion might well come up fairly often. If it's an average of 20 minutes per error, and only when 8 cores (16 threads) are flat out busy, that's an astounding 56 trillion instructions.
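A rough way to reproduce the order of magnitude of that estimate is sketched below. The 3.0 GHz clock and the rate of one instruction per cycle per hardware thread are assumptions made for illustration, not figures from the post, and the function name is hypothetical. Under those assumptions the result lands close to the 56 trillion quoted above.

```python
def instructions_between_errors(minutes, hw_threads, clock_hz, ipc_per_thread):
    """Estimate how many instructions retire between two errors.

    All inputs are assumptions supplied by the caller; this is simple
    arithmetic, not a measurement.
    """
    seconds = minutes * 60
    return seconds * hw_threads * clock_hz * ipc_per_thread


# Assumed values: 20 minutes per error and 16 threads come from the post;
# 3.0 GHz and 1 instruction per cycle per thread are illustrative guesses.
estimate = instructions_between_errors(
    minutes=20,
    hw_threads=16,
    clock_hz=3_000_000_000,
    ipc_per_thread=1,
)
print(f"{estimate / 1e12:.1f} trillion instructions")  # prints "57.6 trillion instructions"
```

Changing the assumed clock or instructions-per-cycle figure scales the result linearly, so the point is the magnitude (tens of trillions), not the exact number.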

        Seems like a normal race condition to me: many things (like turning off threading) help, but do not completely solve the problem. Even once exactly identified, it might only affect a fraction of the Ryzen chips sold, and only in a particular environment of voltages, temperatures, or clock speeds. Additionally, the fix for a single case might not actually be a fix for all cases.

        Of course AMD wants all the facts before they make an announcement that might be rather expensive. Is it worth taking a loss on every Ryzen shipped if it only impacts 1% of the users 1% of the time? Intel has had several similar problems; the most publicized was the FDIV bug from many years ago, where they took months before offering an RMA. AMD was quicker to offer one.

        So it looks like business as usual: Ryzen is a good chip, it had a bug, and AMD is dealing with it. If you really care, get an RMA; if not, I'd just wait and see how it evolves. Personally I wanted to wait to see how things settled out with performance, stability, and motherboard options. The R7-1700 looks like a pretty nice chip for the money, but I'll wait till the normal Amazon/Newegg pipeline has the fixed chips for sure.

        AMD really raised the bar this time, much more so than most would have guessed. Sure, every company rants about the next generation, but AMD delivered a big increase in performance, performance per clock, good clocks, good motherboards, and most importantly good volume on good chips for a good price. That I can buy a chip today like the $300 R7-1700 that takes 65 watts or so, has 8 cores, 16 threads, and fits in a reasonably priced motherboard would have been a pretty incredible thing before Ryzen shipped.
        Last edited by BillBroadley; 20 September 2017, 11:02 PM. Reason: spelling


        • #74
          I was thinking about getting a Ryzen system for some light virtualization (home lab + GPU passthrough for gaming) but then stumbled across this report. Have most of the faulty chips cleared out of the supply chain yet? If I'm not frequently compiling software, should I even be concerned about this?


          • #75
            I purchased two Ryzen 7 1700 systems in the first month they were available: one for use at work, one for use at home. Both are overclocked to 3.6 GHz and run Ubuntu 16.04 as the host OS. Both run non-overclocked RAM at 2400 MHz. Both use the Asus X370 Prime motherboard with the recently installed 0902 BIOS, which includes the AGESA 1.0.0.6B microcode. Both systems run at least 5 virtual machines that are under constant load, including motion detection on 4 cameras feeding 4 simultaneous 4-megapixel video streams. I've run major system upgrades and compiles simultaneously with all the VMs, and in the last 5 months of use I've had absolutely no problems or instability. Since I was a very early buyer of the system, I did find problems with the earliest BIOS revisions, but those were resolved in the first month or so with BIOS updates. I have been incredibly happy with my Ryzen systems and they have been perfectly reliable for 5 months under heavy 24/7 use. Both systems use M.2 Samsung EVO 960 1 TB SSDs that I've tested on my system and seen 150,000 IOPS in disk performance. I've never in my life seen anything that comes close to these systems in multitasking performance. I can run disk-intensive workloads on all VMs simultaneously and it doesn't slow down in the slightest. This system can install Ubuntu 16.04 on a new VM in about 2 minutes while it is running a heavy load on other VMs. This is really incredible stuff.

            I've read about this Ryzen bug, so tonight I downloaded the kill-ryzen script, which on my 1700 runs 16 simultaneous gcc builds. Even with AGESA 1.0.0.6B, I did see segfaults after a couple of minutes of run time. So if you are planning on running massively parallel gcc compiles, you should definitely make sure you get one of the newer chips. Of course I would rather have a chip that didn't have this problem, but I think it's really an academic issue, not a practical problem for myself and most people. At the moment, I'm not planning on RMAing my 1700s: at home and at work they are running a lot of critical systems that I can't just have down for days waiting for a new chip to fix problems that are hypothetical to me. So I wouldn't be TOO concerned about it.
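For readers who want to see the general shape of such a test, here is a minimal Python sketch of the idea: several loops run a command over and over in parallel and report how long it took until the first failure. This is not the kill-ryzen script itself. The function names are hypothetical, and the placeholder command is a trivial Python one-liner standing in for a real compile job. The report format mirrors the "TIME TO FAIL" lines quoted earlier in the thread.

```python
import subprocess
import sys
import threading
import time


def run_until_failure(loop_id, command, max_iterations, results):
    """Repeat `command` until it exits non-zero or the iteration cap is hit."""
    start = time.monotonic()
    for _ in range(max_iterations):
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if completed.returncode != 0:
            elapsed = int(time.monotonic() - start)
            results[loop_id] = elapsed
            print(f"[loop-{loop_id}] TIME TO FAIL: {elapsed} s")
            return
    results[loop_id] = None  # no failure seen within the cap


def stress(command, loops, max_iterations):
    """Run `loops` parallel repeat-until-failure loops and collect the results."""
    results = {}
    threads = [
        threading.Thread(
            target=run_until_failure,
            args=(i, command, max_iterations, results),
        )
        for i in range(loops)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Placeholder workload (an assumption): a command that always succeeds.
# A real test would substitute an actual build command here.
placeholder = [sys.executable, "-c", "pass"]
print(stress(placeholder, loops=4, max_iterations=2))
```

A loop that never fails is recorded as `None`, so a clean run is distinguishable from a failure at 0 seconds. Threads are enough here because the real work happens in the child processes.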


            • #76
              JoeV, thanks for the very informative feedback. One final question: does this just affect separate instances of gcc, or does it also apply if you run a single build with make's -j option (e.g., make -j 12)?

              EDIT: Tried pricing out a new mobo+Ryzen+RAM, and got sticker shock from the RAM. DDR4 seems to be going for around $10/GB. I'll want at least 24GB, so that's an additional $250 for a Ryzen build. I'm used to paying $2-$3/GB for used DDR3 ECC RAM. I'll probably have to shelve this idea for a few months.
              Last edited by imrazor; 24 September 2017, 04:44 AM.


              • #77
                BillBroadley
                Hello,

                I ran that gcc from source and the final screen is just like yours.
                Image 20170925 130743 2016x1134 hosted in ImgBB

                I also ran the kill-ryzen script several times, but each time it fails right after 20 minutes. I think it's running out of memory rather than segfaulting, but then how are others running it for a whole day? I have 32 GB of RAM. It also fails after 20 minutes on my 7700HQ laptop.
                About once in 6-7 runs I got another error, something about AMD-Vi. I have attached a screenshot.
                Image 20170924 015044 2016x1134 hosted in ImgBB

                Image 20170924 043441 2016x1134 hosted in ImgBB

                I also ran another test, building MAME, but it freezes on both the Ryzen system and the 7700HQ laptop.
                What other tests are reliable for detecting segfaults? Is that gcc build enough to show the system is segfault-free, since I didn't get any errors?
                What is that AMD-Vi error?

                Thank you


                • #78
                  BillBroadley
                  This is the build.log from the kill-ryzen script when it fails.
                  Image 20170925 193801 2016x1134 hosted in ImgBB

                  Still, how are others running this script for hours?
