Continuing To Stress Ryzen


  • Originally posted by mlau View Post

    The 90°C+ is probably junction temperature (min. 45, max I've seen is 94). All are taken from the superio chip: it has another reading labelled "CPUTIN" which never goes beyond 66°C. BUT, my main point was, that after improving cooling (i.e. Tj not immediately jumping to >92C on tiniest bits of load), the segfaults went away.

    CPUTIN is a motherboard sensor, so it's not as precise as something like SMBUSMASTER, which usually reports the CPU temperature on Nuvoton chips. See https://github.com/groeck/nct6775/issues/49

    I'll try to reapply the thermal paste and see if that improves the MCE / segfault situation.

    Comment


    • Originally posted by k1e0x View Post
      Finally, finally, finally a complete poudriere run without any system freezes or unexpected reboots. :-) Take a look for yourself: -- 26020 packages built within 43 hours; as flaky as the Ryzen may be, that thing is really fast (still running stock with slow RAM) - compared to: -- I don't know the specs of the FreeBSD "beefy" servers, but I suggest that they try a Threadripper once it is out.

      Don, I'm quite certain that you nailed the bug I created this report for. Thank you very, very much. And if it's really nailed and not a magic moment, then Matthew Dillon was right all along. He said that he sent a full test case to AMD in April. It's quite a shame that there is still no reaction from their side. I'm sure Matthew would have said so by now if they had responded.

      Anyway, I'll start a poudriere run again to see how many of the failed ports can be built now. Don, is it possible to get your one-liner patch upstream, so that other FreeBSD Ryzen users can benefit from it?

      So can Linux kernel developers do anything similar to address these bugs?

      Comment


      • Originally posted by shmerl View Post


        So can Linux kernel developers do anything similar to address these bugs?
        They aren't sure they nailed it yet. They have a patch that re-orders the placement of certain areas of RAM if it detects it's running on a Ryzen system. (CPU-specific hacks like this are not unheard of for both AMD and Intel.) It appears to help, but a final patch will be more involved because that affects other things. Not sure, I'm not a coder, just a sysadmin. FreeBSD doesn't seem to be segfaulting as often as Linux (I've only seen this once, and Mesa usually builds fine), but when it does, it seems to break an invariant in the system state and the kernel panics. Linux just ignores it and drives on like nothing happened, as it's not as paranoid about the state of the system being polluted, I guess.

        So... yes... maybe? But it's a hack; the problem may be with AMD, or possibly with the way those programs are being compiled. Dunno. It's a fairly obscure bug, so I think Ryzen is probably fine; no sense saying "oooh ryzen is busted garbage bla bla bla". If it ends up being AMD's problem, what will probably happen is that either AMD fixes it in Ryzen microcode updates or all OSes will need some sort of small tweak for it. (Many of these exist now for various CPUs. If you remember, I think it was the original Pentium that couldn't do math (the FDIV bug). Kinda important for a CPU lol; this isn't as big of a deal as returning the wrong results.)
        Last edited by k1e0x; 07 August 2017, 11:31 AM.

        Comment


        • Originally posted by chithanh View Post
          Officially, ASRock and Biostar support ECC memory in ECC mode on their AM4 mobos. ...
          Where can I find these official links?
          I couldn't find it either on their websites or in the manuals, at least for the ASRock Fatal1ty X370 Gaming K4. The manual of the BIOSTAR X370GT7 Ver. 5.x (I don't like the BIOSTAR, as it doesn't have Intel LAN) states "supports non-ECC 4/ 8/ 16 GB DDR4 module".

          Comment


          • Originally posted by RyzenNewbie View Post
            For anyone interested in testing the programs/scripts mentioned in the FreeBSD reports, I've created and uploaded a USB image to:
            I apologize, folks; with that "ryzen_stress_test.sh" buildworld/buildkernel script, I made a mistake - the default size of "/tmp" with a read-only USB drive is too small, so every build will fail due to lack of free space.

            In order to fix that, please execute:
            Code:
            umount /tmp
            mount -t tmpfs a /tmp
            right after booting from USB drive and right before you execute that script.
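A quick way to confirm whether "/tmp" is large enough before starting the script could look like the sketch below. The helper name `avail_kb` and the 1 GiB threshold are assumptions made up for illustration; the thread does not say how much space the build actually needs.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in KiB, of the filesystem holding a path.
avail_kb() {
    # The last line of `df -k` describes the filesystem; field 4 is the "Avail" column.
    df -k "$1" | awk 'END { print $4 }'
}

# Assumed threshold, for illustration only (1 GiB expressed in KiB).
REQUIRED_KB=1048576

if [ "$(avail_kb /tmp)" -lt "$REQUIRED_KB" ]; then
    echo "/tmp looks too small; remount it as tmpfs before running the script" >&2
fi
```

If the warning appears, the umount/mount commands above should be run first.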

            Sorry for that hassle; I'll upload an updated image tomorrow...

            Comment


            • Originally posted by bridgman View Post
              Are you talking about adding a guard page at the top of canonical userspace (the workaround Matt Dillon mentioned)?

              Linux has had that for years, and I imagine Windows has as well:

              http://elixir.free-electrons.com/lin...sm/processor.h

              If BSD does not already have a guard page then I strongly recommend you add one because there are at least three older CPU families (two Intel and one AMD IIRC) which can exhibit unexpected behaviour when executing code in the top page of user space.

              EDIT - looks like a guard page was just added to FreeBSD:

              https://svnweb.freebsd.org/base?view...evision=321899

              Just remembered that there was already a small (less than a page) guard region in BSDs but AFAIK other OSes went with a full page from the start.
              I have no idea whether a guard page is already included or not - I'm currently trying to clarify that with the FreeBSD devs, and to what extent it could be relevant.

              But my question still stands: why do I need that now and not with all the previous CPUs (including Phenom)?

              Comment


              • Originally posted by shmerl View Post


                CPUTIN is a motherboard sensor, so it's not as precise as something like SMBUSMASTER, which usually reports the CPU temperature on Nuvoton chips. See https://github.com/groeck/nct6775/issues/49

                I'll try to reapply the thermal paste and see if that improves the MCE / segfault situation.


                Good luck if you have the same little baby CPU cooler as me (mounting and unmounting it isn't a pleasure)



                Baby on latest FX 8350

                Last edited by scorpio810; 07 August 2017, 01:26 PM.

                Comment


                • Originally posted by RyzenNewbie View Post
                  kern.hz=1000 sounds interesting; I'll try that tomorrow at work.
                  Okay, I've reset my BIOS to defaults (except: SATA hot plug -> enabled, SVM -> enabled, state after power-loss -> power on), set "kern.hz=1000" and started a fresh poudriere run - let's see how it goes...

                  Comment


                  • Originally posted by scorpio810 View Post

                    Good luck if you have the same little baby CPU fan as me (mounting and unmounting it isn't a pleasure)
                    Mine is just a drop smaller (Noctua NH D15 SE-AM4):


                    It seems to be easier to put on though, since it doesn't have that flat panel on top that blocks access to screws.
                    Last edited by shmerl; 07 August 2017, 12:42 PM.

                    Comment


                    • Originally posted by RyzenNewbie View Post
                      I have no idea whether a guard page is already included or not - I'm currently trying to clarify that with the FreeBSD devs, and to what extent it could be relevant. But my question still stands: why do I need that now and not with all the previous CPUs (including Phenom)?
                      I'm new to this myself (I work on the GPU SW side) but AFAICS there are at least three different CPU families (1 from AMD) over the last decade which required special treatment, basically making sure that no code gets executed near the end of canonical user address space. The top of user process address space is the dividing line between the least privileged code and the touch-it-and-die non-canonical address space.

                      Over time it seems that more "safe area" is required - presumably because each new CPU generation pre-fetches further ahead than the last one. In a sense Linux (and Windows I believe) got lucky by reserving a full guard page while BSD allocated a smaller guard area. As a result BSD has had to bump the guard area (to a full page) while other OSes did not.

                      That's my impression anyways.
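To make the "full guard page" idea concrete, here is a rough sketch of the arithmetic only. It assumes x86-64 with 48-bit virtual addresses and 4 KiB pages; the variable names are made up for illustration and are not taken from any kernel source.

```shell
#!/bin/sh
# Illustrative arithmetic only: assumes 48-bit virtual addresses and 4 KiB pages.
PAGE_SIZE=4096

# First address past the lower (user) canonical half: 2^47.
USER_CANONICAL_END=$(( 1 << 47 ))

# Upper limit for user mappings when one full guard page is left unused at the top.
USER_LIMIT_WITH_GUARD=$(( USER_CANONICAL_END - PAGE_SIZE ))

printf 'end of canonical user half: 0x%x\n' "$USER_CANONICAL_END"
printf 'limit with full guard page: 0x%x\n' "$USER_LIMIT_WITH_GUARD"
```

With a full guard page, nothing can be mapped in the last 4 KiB below the non-canonical hole, so code never executes right up against that boundary.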
                      Last edited by bridgman; 07 August 2017, 02:02 PM.

                      Comment
