Continuing To Stress Ryzen


  • Originally posted by kaseki View Post
    [to Silverthorn]
    I can't run two machines at once with the same environment, so I'll report on an older system running Mint 17.3 MATE 64-bit that is performing a four-thread compilation as I type. The CPU is a Phenom II X4 965 running Linux kernel 3.13.0-105. It is not overclocked (as far as I remember). Board is Gigabyte 770T-USB3. Randomize is shown as 2. So far after 2 hours of operation, there have been no messages after the "loop n [date and time] start zero" messages.
    please run that test with your Phenom on an upgraded Mint 18.1 with kernel 4.11.0-13. And, please, run that test with the same thread count as on the Ryzen; plus longer than 2 hours. Just to ensure a comparable environment.


    I would hate to have to evaluate dozens of quasi-correlated timing, voltage, and transmission line loading resistance settings with this test added.
    agreed - that shouldn't be necessary to get a stock system running stably, should it - meaning: just select "BIOS defaults" and everything is fine...

    Comment


    • Originally posted by RyzenNewbie View Post
      wrong.

      Have you ever thought about the following: "Hey Mr. Nadella, there are a few oddities in our CPU that could lead to undesired behaviour, but you can circumvent them by applying following workarounds in the Windows kernel: [funny list of assembler commands] BUT, Mr. Nadella, don't ever tell anyone, okay?!"
      Are you talking about adding a guard page at the top of canonical userspace (the workaround Matt Dillon mentioned)?

      Linux has had that for years, and I imagine Windows has as well:

      http://elixir.free-electrons.com/lin...sm/processor.h

      If BSD does not already have a guard page then I strongly recommend you add one because there are at least three older CPU families (two Intel and one AMD IIRC) which can exhibit unexpected behaviour when executing code in the top page of user space.

      EDIT - looks like a guard page was just added to FreeBSD:

      https://svnweb.freebsd.org/base?view...evision=321899

      Just remembered that there was already a small (less than a page) guard region in BSDs but AFAIK other OSes went with a full page from the start.
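
For anyone who wants to see where such a guard page sits, here is a minimal sketch of the address arithmetic. It assumes x86-64 with 4-level paging (a 47-bit user half) and 4 KiB pages; the function names are made up for illustration and are not taken from any kernel source.

```python
# Sketch only: constants assume x86-64 with 4-level paging and 4 KiB pages.
PAGE_SIZE = 4096
USER_VA_BITS = 47  # the canonical lower half spans 2**47 bytes


def user_space_limit(guard_pages: int = 1) -> int:
    """Exclusive upper bound for user mappings once guard pages are reserved."""
    return (1 << USER_VA_BITS) - guard_pages * PAGE_SIZE


def in_guard_region(addr: int, guard_pages: int = 1) -> bool:
    """True if addr lies in the reserved page(s) at the top of user space."""
    return user_space_limit(guard_pages) <= addr < (1 << USER_VA_BITS)


if __name__ == "__main__":
    print(hex(user_space_limit()))
```

With one guard page, nothing in user space can be mapped or executed in the last 4 KiB below the canonical boundary, which is the region the affected CPU families are said to mishandle.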
      Last edited by bridgman; 06 August 2017, 11:25 PM.
      Test signature

      Comment


      • Implement specific usage of verify_pre_usermode_state for user-mode
        returns for x86.

        Comment


        • Originally posted by RyzenNewbie View Post

          please run that test with your Phenom on an upgraded Mint 18.1 with kernel 4.11.0-13. And, please, run that test with the same thread count as on the Ryzen; plus longer than 2 hours. Just to ensure a comparable environment.

          agreed - that shouldn't be necessary to get a stock system running stably, should it - meaning: just select "BIOS defaults" and everything is fine...
          First, I just now had to kill the processes on the Phenom (this PC) after 3 hours (no seg. faults) because it slowly locked up my mouse when I tried to do other stuff in parallel. (Even the REISUB technique only broke the spell; it didn't by itself initiate a reboot.) The Phenom PC is my present "production" computer and I cannot divert it to an experiment beyond what I just tried at this time. I'm not even ready to upgrade it to Mint 18 and deal with any possible incompatibility consequences at the moment. Further, it is only a four-thread four-core machine. The kill-Ryzen program sets the number of threads to the number the CPU can support, so it can't do what you want of forcing 16 threads on the Phenom. Beyond that, the storage is different, the RAM is different, the MB and BIOS are different; in no real way are the two PCs comparable except being AMD based. La-de-dah. I'm not even sure I would call the AMD of then and the AMD of now comparable.

          And LOL on stable. The Ryzen/Asus C6H OC beta BIOS boondoggle has a 25k+ message thread for a reason. While my initial hardware build of my Ryzen HTPC POSTed without problem on the shipped BIOS (one version past the bricking BIOS), and the Linux build was possible albeit with some GPU driver difficulties, attempts to get one's money's worth out of Ryzen and the DRAM required a lot of work by many people. (But I think you really already know this.)
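
To illustrate the thread-count point above: a minimal sketch of how a test harness might pick its job count from what the OS reports. This is not the kill-Ryzen script itself; `default_job_count` is a hypothetical helper name.

```python
import os


def default_job_count() -> int:
    """One job per hardware thread the OS reports, falling back to 1."""
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"would start {default_job_count()} parallel jobs")
```

On a four-core, four-thread CPU this yields 4, so a harness written this way never oversubscribes to 16 jobs unless it is modified.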

          Comment


          • Originally posted by aufkrawall View Post
            All that talk about temperatures is quite ridiculous, normal compiling is not a thermal stresstest. Actually, it's totally harmless compared to Prime95 or LinX (even without Intel's heating AVX).
            Well, make -j16 compilation does stress the CPU quite a bit, but it's nothing compared to mprime. My Ryzen goes up to 58°C (according to mobo sensor) when compiling for a few hours. This guy is having temps such as 92.5°C. Sounds like a major problem with the heat sink installation.

            Comment


            • Michael, please, urgently change the title of your article. 50+ segfaults per hour is the rate of conftest segfaults and that shouldn't count.

              Also, we know you want to advertise the use of the stress run functionality on PTS, but it has failed and is much worse than simply running the "ryzen kill" script for the purpose of reproducing the Ryzen bug. If you insist, then please remove the php test from the suggested command as that is the one with conftest segfaults which leads to false positives.

              Please Michael, act quickly, your article as it stands is only adding noise to the Ryzen issue.
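
A rough sketch of how conftest segfaults could be left out of a count taken from kernel log lines. The regular expression assumes the usual "name[pid]: segfault at ..." shape of such messages, and the sample lines are made up for illustration, not taken from any real log.

```python
import re

# Assumed shape of a kernel segfault message: "name[pid]: segfault at ..."
SEGFAULT_RE = re.compile(r"(?P<name>[^\s\[\]]+)\[\d+\]: segfault at ")


def count_segfaults(log_lines, ignore=("conftest",)):
    """Count segfault messages per process name, skipping names in `ignore`."""
    counts = {}
    for line in log_lines:
        match = SEGFAULT_RE.search(line)
        if match is None:
            continue  # not a segfault message
        name = match.group("name")
        if name in ignore:
            continue  # e.g. configure-time probes that may crash on purpose
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "[  100.0] conftest[4321]: segfault at 0 ip 0000000000400500 "
    "sp 00007ffc00000000 error 4 in conftest[400000+1000]",
    "[  200.0] bash[5555]: segfault at 8 ip 00007f0000001000 "
    "sp 00007ffc00001000 error 4 in libc-2.23.so[7f0000000000+1c0000]",
]

if __name__ == "__main__":
    print(count_segfaults(sample))
```

Counting this way keeps the per-hour figure tied to unexpected crashes rather than to configure probes.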

              Comment


              • Originally posted by shmerl View Post

                Note that Ryzen X models report CPU temperatures 20°C higher than real ones.

                See https://community.amd.com/community/...mmunity-update

                Still, +72.2°C is pretty high for these chips. You probably need a better cooler.
                The 90°C+ is probably junction temperature (min. 45, max I've seen is 94). All are taken from the superio chip: it has another reading labelled "CPUTIN" which never goes beyond 66°C. BUT, my main point was, that after improving cooling (i.e. Tj not immediately jumping to >92C on tiniest bits of load), the segfaults went away.
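
The offset arithmetic from the quoted post, as a minimal sketch. The 20°C figure comes from the quote; the function name is hypothetical, and whether the offset applies to a given sensor reading (for example one taken from the superio chip) is an assumption the caller has to make.

```python
X_MODEL_OFFSET_C = 20.0  # offset for Ryzen X models, per the quoted post


def adjusted_temp_c(reported_c: float, offset_applies: bool = True) -> float:
    """Subtract the X-model offset from a reported temperature, if it applies."""
    return reported_c - X_MODEL_OFFSET_C if offset_applies else reported_c


if __name__ == "__main__":
    # 92.5 is the figure mentioned earlier in the thread.
    print(adjusted_temp_c(92.5))
```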

                Comment


                • Originally posted by debianxfce View Post
                  Heil intel...
                  No way.
                  It's: Hail Mary!

                  And I prefer AMD. Still I've got no idea what your post has to do with my original one.
                  Phoronix: 50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen In direct continuation of yesterday's article about easily causing segmentation faults

                  Comment


                  • Originally posted by ermo View Post

                    Possibly stating the obvious here, but if the RMA didn't cover motherboard/RAM/etc., how can you conclude that the CPU might be from a bad batch?

                    A little more context for your thoughts on this might be in order?
                    There have been many independent reports on various mainboards with different chipsets, so I'd consider the mainboard very unlikely to be the cause of the issue. The RAM cannot be ruled out yet, as there is little data on which memory brands were being used (I'll check the other two Ryzen machines where I could reproduce segfaults during mesa compilation, to see whether they happen to have the same RAM as my own machine or vastly different memory).
                    The fact that for some people the processor sent back by RMA does not show any issues on otherwise identical hardware is another hint in the direction that there might be working/non-working Ryzens out there.

                    Comment


                    • Originally posted by caligula View Post

                      Well, make -j16 compilation does stress the CPU quite a bit, but it's nothing compared to mprime. My Ryzen goes up to 58°C (according to mobo sensor) when compiling for a few hours. This guy is having temps such as 92.5°C. Sounds like a major problem with the heat sink installation.
                      I see no temp difference between -j16 compilation and Prime on Ryzen; Intel, however, is a different story.
                      I don't know if my cooling solution is the reason why there is no temp difference.

                      Comment
