Continuing To Stress Ryzen


  • Originally posted by Naib
    For those doing builds: what -O level is being used? -Os, -O, -O2, -O3?

    one of the differences between AMD and Intel is shared cache and independent cache. IIRC there were gentoo users, using older AMD chips where -O3 was being used and as such aspects could not fit into the cache resulting in a segfault.

    my make.conf is very conservative but I have been using -O2 for years, even on my old system, an i7
    For FreeBSD you don't need to set anything in make.conf to get generic safe builds.
    For Gentoo set:
    CFLAGS="-O2 -march=x86-64 -mtune=generic -pipe"
    CHOST="x86_64-pc-linux-gnu"

    That's probably your best bet. znver1 and core2 and all that other crap is wrong at the moment: it either doesn't work, or it's a stub, or it's incorrect. It takes some time for the compilers to get updated. I think LLVM 5 is the first that has any real Zen optimisations in it. Then, I think it's /etc/portage/env/ where you set MAKEOPTS for single ebuilds. You'll want to set that for probably GCC, Mesa and anything else that's busting on you. Set -j1 for them (you may be able to get away with -j9, not sure). You can use -j17 for everything else (on R7).
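
The per-package override described above can be sketched roughly as follows. This is a minimal sketch, assuming Portage's `env/` directory plus `package.env` mechanism; the file name `serial-build.conf`, the `PORTAGE_ETC` variable and the choice of package atoms are illustrative and not from the post. By default it writes into a scratch directory, so nothing on a live system is touched.

```shell
#!/bin/sh
# PORTAGE_ETC is a hypothetical variable for this sketch; on a real Gentoo
# system the directory would be /etc/portage.
PORTAGE_ETC="${PORTAGE_ETC:-./portage-sketch}"

mkdir -p "$PORTAGE_ETC/env"

# Environment file that forces a serial build (file name is an assumption).
printf 'MAKEOPTS="-j1"\n' > "$PORTAGE_ETC/env/serial-build.conf"

# Map the troublesome packages to that environment file.
cat > "$PORTAGE_ETC/package.env" <<'EOF'
sys-devel/gcc   serial-build.conf
media-libs/mesa serial-build.conf
EOF
```

Everything not listed in `package.env` keeps the global `MAKEOPTS` from make.conf (for example `-j17` on an R7, as suggested in the post).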
    Last edited by k1e0x; 07 August 2017, 10:34 AM.


    • Originally posted by mlau

      The 90°C+ is probably junction temperature (min. 45, max I've seen is 94). All are taken from the Super I/O chip: it has another reading labelled "CPUTIN" which never goes beyond 66°C. BUT my main point was that after improving cooling (i.e. Tj not immediately jumping to >92°C on the tiniest bit of load), the segfaults went away.

      CPUTIN is a motherboard sensor, so it's not as precise as something like SMBUSMASTER, which usually describes the CPU in Nuvoton chips. See https://github.com/groeck/nct6775/issues/49

      I'll try to reapply the thermal paste, and see if that will improve mce / segfaults situation.
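
To compare readings such as CPUTIN and SMBUSMASTER side by side on Linux, something like the sketch below may help. It assumes the Super I/O driver (for example nct6775) is loaded and exposes the usual hwmon sysfs files (`name`, `tempN_input` in millidegrees Celsius, `tempN_label`); the helper name `millideg_to_c` is made up for this example, and which labels appear depends on the board.

```shell
#!/bin/sh
# Convert a hwmon reading in millidegrees Celsius to degrees with one decimal.
millideg_to_c() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# Walk every hwmon device and print each temperature with its label, if any.
for input in /sys/class/hwmon/hwmon*/temp*_input; do
    [ -r "$input" ] || continue              # glob did not match, or unreadable
    dir=${input%/*}
    chip=$(cat "$dir/name" 2>/dev/null)
    label_file=${input%_input}_label
    if [ -r "$label_file" ]; then
        label=$(cat "$label_file")
    else
        label=${input##*/}                   # fall back to the file name
    fi
    raw=$(cat "$input" 2>/dev/null) || continue   # some sensors fail on read
    printf '%s %s: %s C\n' "$chip" "$label" "$(millideg_to_c "$raw")"
done
```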


      • Originally posted by k1e0x
        Finally, finally, finally a complete poudriere run without any system freezes or unexpected reboots. :-) Take a look for yourself: -- 26020 packages built within 43 hours; as flaky as the Ryzen may be, that thing is really fast (still running stock with slow RAM) - comparing to: -- I don't know the specs of the FreeBSD "beefy" servers, but I suggest that they try a Threadripper when it is out. Don, I'm quite certain that you nailed the bug for which I created that report here. Thank you very, very much. And if it's really nailed and not a magic moment, then Matthew Dillon was right all along. He said that he sent a full test case to AMD in April. It's quite a shame that there is still no reaction from their side. I'm sure Matthew would have said so by now if there had been. Anyway, I'll start a poudriere run again to see how many of the failed ports can be built now. Don, is it possible to get your one-liner patch upstream, so that other FreeBSD Ryzen users may benefit from it?

        So can Linux kernel developers do anything similar to address these bugs?


        • Originally posted by shmerl


          So can Linux kernel developers do anything similar to address these bugs?
          They aren't sure they nailed it yet. They have a patch that re-orders the placement of certain areas of RAM if it detects it's running on a Ryzen system. (CPU-specific hacks like this are not unheard of for both AMD and Intel.) It appears to help, but a final patch will be more involved because that affects other things. Not sure, I'm not a coder, just a sysadmin. FreeBSD doesn't seem to be segfaulting as often as Linux (I've only seen this once, and Mesa usually builds fine), but when it does, it seems to violate an invariant in the system state and it panics. Linux just ignores it and drives on like nothing happened, as it's not as paranoid about the state of the system being polluted, I guess.

          So.. yes.. maybe? But it's a hack; the problem may be with AMD or possibly the way those programs are being compiled.. dunno. It's a fairly obscure bug, so I think Ryzen is probably fine; no sense saying "oooh ryzen is busted garbage bla bla bla". If it ends up being AMD's problem, what will probably happen is that either AMD will be able to fix it in Ryzen microcode updates or all OSes will need some sort of small tweak for it. (Many of these exist now for various CPUs. If you remember, I think it was the original Pentium, with its FDIV bug, that couldn't do math. Kinda important for a CPU lol; this isn't as big of a deal as returning the wrong results.)
          Last edited by k1e0x; 07 August 2017, 11:31 AM.


          • Originally posted by chithanh
            Officially, ASRock and Biostar support ECC memory in ECC mode on their AM4 mobos. ...
            Where can I find these official links?
            I could find it neither on their websites nor in the manuals, at least for the ASRock Fatal1ty X370 Gaming K4. The manual of the BIOSTAR X370GT7 Ver. 5.x (I don't like the BIOSTAR as it doesn't have Intel LAN) states "supports non-ECC 4/ 8/ 16 GB DDR4 module".


            • Originally posted by RyzenNewbie
              For anyone interested in testing the programs/scripts mentioned in the FreeBSD reports, I've created and uploaded a USB image to:
              I apologize, folks; I made a mistake with that "ryzen_stress_test.sh" buildworld/buildkernel script. The default size of "/tmp" with a read-only USB drive is too small, so every build will fail due to lack of free space.

              In order to fix that, please execute:
              Code:
              umount /tmp
              mount -t tmpfs a /tmp
              right after booting from USB drive and right before you execute that script.
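
A quick pre-flight check along these lines can show whether /tmp is still too small before the script is started. It is only a sketch: the helper name `free_kib` and the `NEEDED_KIB` threshold are made up for this example, and the space a buildworld/buildkernel run really needs is not stated in the post.

```shell
#!/bin/sh
# Print the free space of the filesystem holding $1, in KiB
# (POSIX df output: column 4 is "Available").
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# NEEDED_KIB is a hypothetical threshold, not a figure from the post.
NEEDED_KIB=${NEEDED_KIB:-4194304}

avail=$(free_kib /tmp)
if [ -n "$avail" ] && [ "$avail" -lt "$NEEDED_KIB" ]; then
    echo "/tmp has only ${avail} KiB free; remount it as tmpfs before building" >&2
fi
```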

              Sorry for that hassle; I'll upload an updated image tomorrow...


              • Originally posted by bridgman
                Are you talking about adding a guard page at the top of canonical userspace (the workaround Matt Dillon mentioned)?

                Linux has had that for years, and I imagine Windows has as well:

                http://elixir.free-electrons.com/lin...sm/processor.h

                If BSD does not already have a guard page then I strongly recommend you add one because there are at least three older CPU families (two Intel and one AMD IIRC) which can exhibit unexpected behaviour when executing code in the top page of user space.

                EDIT - looks like a guard page was just added to FreeBSD:

                https://svnweb.freebsd.org/base?view...evision=321899

                Just remembered that there was already a small (less than a page) guard region in BSDs but AFAIK other OSes went with a full page from the start.
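
As a worked example of what a full guard page at the top of canonical user space means numerically: the sketch below assumes x86-64 with 47 user address bits (4-level paging) and 4 KiB pages, which is my reading of how the linked Linux header derives its user-space limit. The function name is made up for this example.

```shell
#!/bin/sh
# Highest user address limit (exclusive) when one guard page is reserved at
# the top of the lower canonical half.
# $1 = number of user virtual address bits, $2 = page size in bytes.
user_top_with_guard() {
    printf '%x\n' "$(( (1 << $1) - $2 ))"
}

user_top_with_guard 47 4096    # prints 7ffffffff000
```

Without the guard page the limit would be 800000000000 (hex), so user code could be mapped right up to the boundary of the non-canonical address range.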
                I have no idea whether a guard page is already included or not. I'm currently trying to clarify that with the FreeBSD devs, and to what extent it could be relevant.

                But my question still stands: why do I need that now and not with all the previous CPUs (including Phenom)?


                • Originally posted by shmerl


                  CPUTIN is a motherboard sensor, so it's not as precise as something like SMBUSMASTER, which usually describes the CPU in Nuvoton chips. See https://github.com/groeck/nct6775/issues/49

                  I'll try to reapply the thermal paste, and see if that will improve mce / segfaults situation.
                  https://youtu.be/gIx0JFEyLuQ

                  Good luck if you have the same little baby CPU cooler as me (mounting and unmounting isn't a pleasure)



                  Baby on latest FX 8350

                  https://download.tuxfamily.org/qet/joshua/workstation/
                  Last edited by scorpio810; 07 August 2017, 01:26 PM.


                  • Originally posted by RyzenNewbie
                    kern.hz=1000 sounds interesting; I'll try that tomorrow at work.
                    Okay, I've reset my BIOS to defaults (except: SATA hot plug -> enabled, SVM -> enabled, state after power-loss -> power on), set "kern.hz=1000" and started a fresh poudriere run - let's see how it goes...


                    • Originally posted by scorpio810

                      Good luck if you have the same little baby CPU fan as me (mounting and unmounting isn't a pleasure)
                      Mine is just a drop smaller (Noctua NH D15 SE-AM4):


                      It seems to be easier to put on though, since it doesn't have that flat panel on top that blocks access to screws.
                      Last edited by shmerl; 07 August 2017, 12:42 PM.
