Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads

  • There are some discussions on Reddit:

    https://www.reddit.com/r/programming...ausing_random/

    There is also an active bug report in FreeBSD and DragonFlyBSD, with developers looking for a workaround. In the AMD forum we already have some cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I could not convince people here to test their systems. Not a single report. Come on, people, try the kill-ryzen.sh script for a few hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.

    I feel sorry for AMD: if this bug is widespread, even if hard to trigger, it could be a disaster for them. I hope they find a fix through microcode, but first they need to acknowledge the problem.


    • Originally posted by pjssilva
      There are some discussions on Reddit:

      https://www.reddit.com/r/programming...ausing_random/

      There is also an active bug report in FreeBSD and DragonFlyBSD, with developers looking for a workaround. In the AMD forum we already have some cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I could not convince people here to test their systems. Not a single report. Come on, people, try the kill-ryzen.sh script for a few hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.

      I feel sorry for AMD: if this bug is widespread, even if hard to trigger, it could be a disaster for them. I hope they find a fix through microcode, but first they need to acknowledge the problem.
      I'm on it; I started it just now.
      Asus B350M-A
      1700 @ 3.8 GHz
      Corsair LPX 2666, 32 GB (2x16 GB)

      Kernel 4.11.11-041111-generic on Ubuntu.


      • Quick!
        I'd edit my own post, but it seems I haven't made enough of them yet :-)

        2017 x86_64 x86_64 x86_64 GNU/Linux
        cat /proc/sys/kernel/randomize_va_space
        2
        Using 16 parallel processes
        [KERN] -- Logs begin at on. 2017-08-02 00:48:25 CEST. --
        [KERN] aug. 02 00:50:41 oleUbuntu kernel: userif-3: sent link up event.
        [KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link down event.
        [KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link up event.
        [KERN] aug. 02 00:50:52 oleUbuntu kernel: zram: Cannot change disksize for initialized device
        [KERN] aug. 02 00:52:49 oleUbuntu kernel: zram: Cannot change disksize for initialized device
        [KERN] aug. 02 00:53:27 oleUbuntu kernel: zram: Cannot change disksize for initialized device
        [KERN] aug. 02 00:55:03 oleUbuntu kernel: zram0: detected capacity change from 68719476736 to 0
        [KERN] aug. 02 00:56:08 oleUbuntu kernel: zram0: detected capacity change from 0 to 68719476736
        [KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
        [KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounted filesystem without journal. Opts: discard
        [loop-0] on. 02. aug. 00:57:02 +0200 2017 start 0
        [loop-1] on. 02. aug. 00:57:03 +0200 2017 start 0
        [loop-2] on. 02. aug. 00:57:04 +0200 2017 start 0
        [loop-3] on. 02. aug. 00:57:05 +0200 2017 start 0
        [loop-4] on. 02. aug. 00:57:06 +0200 2017 start 0
        [loop-5] on. 02. aug. 00:57:07 +0200 2017 start 0
        [loop-6] on. 02. aug. 00:57:08 +0200 2017 start 0
        [loop-7] on. 02. aug. 00:57:09 +0200 2017 start 0
        [loop-8] on. 02. aug. 00:57:10 +0200 2017 start 0
        [loop-9] on. 02. aug. 00:57:11 +0200 2017 start 0
        [loop-10] on. 02. aug. 00:57:12 +0200 2017 start 0
        [loop-11] on. 02. aug. 00:57:13 +0200 2017 start 0
        [loop-12] on. 02. aug. 00:57:14 +0200 2017 start 0
        [loop-13] on. 02. aug. 00:57:15 +0200 2017 start 0
        [loop-14] on. 02. aug. 00:57:16 +0200 2017 start 0
        [loop-15] on. 02. aug. 00:57:17 +0200 2017 start 0
        [loop-2] on. 02. aug. 00:57:49 +0200 2017 build failed
        [loop-2] TIME TO FAIL: 47 s
        [loop-13] on. 02. aug. 00:58:00 +0200 2017 build failed
        [loop-13] TIME TO FAIL: 58 s
        [KERN] aug. 02 00:58:00 oleUbuntu kernel: bash[23093]: segfault at 7fff2c45f69c ip 00007fff2c45f69c sp 00007fff2c45f4f8 error 15
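For anyone trying to read the segfault line above, here is a minimal Python sketch that splits it into fields and decodes the error code. It assumes the message layout seen in the logs in this thread and that the kernel prints the x86 page-fault error code in hexadecimal, so "error 15" is 0x15. The function name `decode_segfault` and the wording of the bit descriptions are my own, not taken from any tool.

```python
import re

# Matches the kernel's "segfault at ..." message; the layout is assumed
# from the log lines quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read/fetch"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the error code is printed in hex
    reasons = []
    for mask, if_set, if_clear in ERROR_BITS:
        text = if_set if err & mask else if_clear
        if text:
            reasons.append(text)
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "error": err,
        "reasons": reasons,
        # True when the faulting address is the instruction pointer itself,
        # i.e. the process jumped to an address it could not execute.
        "fault_at_ip": addr == ip,
    }


if __name__ == "__main__":
    sample = ("bash[23093]: segfault at 7fff2c45f69c ip 00007fff2c45f69c "
              "sp 00007fff2c45f4f8 error 15")
    print(decode_segfault(sample))
```

Read this way, the bash crash above looks like an attempt to execute code at a stack address, which fits a corrupted jump target rather than an ordinary bad pointer dereference. That is only an interpretation of the log, not a diagnosis.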


        • Originally posted by scorpio810

          Thank you for the tip. ;-)
          I just tried it, and my cross environment now builds fine without segfaults using
          "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" on my Debian Sid.
          Before, I saw a lot of "segfault at 10 ip 0000000000000010 sp 00007ffcdbc8df58 error 14 in cc1plus".
          That is very, very strange, or black magic!

          I can rebuild my entire cross environment with "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" without a crash, and with only a minor warning in the log:
          Code:
          perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
          But sometimes something crashes again, such as bash, yet the job finishes and the build is fine afterwards...
          Code:
          [ 2621.739360] bash[10208]: segfault at 8 ip 00000000004321ac sp 00007fffffffbb20 error 6 in bash[400000+100000]
          or cc1plus segfaults again and again, and I need to finish the compile with make --jobs=8 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5
          Code:
          [ 6326.849438] cc1plus[16495]: segfault at 10 ip 0000000000000010 sp 00007fffffffc2d8 error 14 in cc1plus[100000000+1606000]
          [ 6330.352677] cc1plus[16441]: segfault at 10 ip 0000000000000010 sp 00007fffffffc598 error 14 in cc1plus[100000000+1606000]
          [21533.790423] cc1plus[20285]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
          [21640.952023] cc1plus[23381]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
          Adding norandmaps to the GRUB kernel command line no longer helps on my custom 4.12.3 and 4.12.4 kernels (kernel.org).
          Debian unstable on a 1700X (made in Malaysia...), Dark Rock Pro 3, MSI B350 Tomahawk, BIOS 1.71 beta (AGESA 1.0.0.6a), XMP2 profile but with 2T command rate and RAM at 1.35 V.
          Core C6, Cool'n'Quiet, and core boost are disabled; Vcore is set to 1.27 V and Vsoc to 1.10 V.

          I saw a lot of cc1plus segfaults with kernel 4.12.4, using the same config file as with kernel 4.12.3.

          EDIT: another thing today:


          Code:
          sudo dmesg | tail
          
          [ 5948.611360] [ 9748]  1000  9748    10452     4240      22       3        0             0 moc
          [ 5948.611361] [ 9749]  1000  9749     3945     1090      11       3        0             0 i686-w64-mingw3
          [ 5948.611363] [ 9751]  1000  9751     5662      890      16       3        0             0 moc
          [ 5948.611364] [ 9754]  1000  9754     2585       59       9       3        0             0 i686-w64-mingw3
          [ 5948.611365] [ 9755]  1000  9755    10247     1496      23       3        0             0 cc1plus
          [ 5948.611366] [ 9756]  1000  9756     3945     1090      12       3        0             0 i686-w64-mingw3
          [ 5948.611367] Out of memory: Kill process 8865 (cc1plus) score 25 or sacrifice child
          [ 5948.611372] Killed process 8865 (cc1plus) total-vm:474364kB, anon-rss:412300kB, file-rss:0kB, shmem-rss:0kB
          [ 5951.892679] cc1plus[9010]: segfault at 10 ip 0000000000000010 sp 00007fffffffcb18 error 14 in cc1plus[100000000+15b7000]
          [ 5975.224183] cc1plus[10188]: segfault at 10 ip 0000000000000010 sp 00007fffffffca98 error 14 in cc1plus[100000000+15b7000]
          
          
          [  540.648393] traps: ld[7488] general protection ip:7f815e02bc72 sp:7fffffffdd00 error:0 in libbfd-2.28-system.so[7f815dfa6000+129000]
          [ 2030.912805] cc1plus[12796]: segfault at 10 ip 0000000000000010 sp 00007fffffffc618 error 14 in cc1plus[100000000+1606000]
          scorpio810
          Junior Member
          Last edited by scorpio810; 04 August 2017, 07:49 AM.
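The dmesg output in this post mixes two different things: OOM-killer kills (plausible with 16 jobs on 16 GB and no swap) and actual segfaults or general protection faults. A small Python sketch like the one below can tally them separately. The regular expressions are assumptions based only on the lines pasted here, and `summarize_dmesg` is a hypothetical helper name.

```python
import re
from collections import Counter

# "cc1plus[9010]: segfault at ..." or "traps: ld[7488] general protection ..."
CRASH_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[\d+\]:? (?P<kind>segfault|general protection)"
)
# "Killed process 8865 (cc1plus) total-vm:..."
OOM_RE = re.compile(r"Killed process \d+ \((?P<comm>[^)]+)\)")


def summarize_dmesg(lines):
    """Return (crashes, oom_kills) as Counters keyed by program name."""
    crashes = Counter()
    oom_kills = Counter()
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            crashes[(m.group("comm"), m.group("kind"))] += 1
            continue
        m = OOM_RE.search(line)
        if m:
            oom_kills[m.group("comm")] += 1
    return crashes, oom_kills


if __name__ == "__main__":
    import sys
    # Usage (assumed): dmesg | python3 summarize.py
    crashes, oom_kills = summarize_dmesg(sys.stdin)
    for (comm, kind), n in crashes.most_common():
        print(f"{n:4d}  {comm}: {kind}")
    for comm, n in oom_kills.most_common():
        print(f"{n:4d}  {comm}: killed by OOM killer")
```

Keeping the two counts apart matters because an OOM kill is expected behaviour under memory pressure, while a cc1plus segfault at address 0x10 is not.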


          • @oleyska Thanks for the report.
            @scorpio810 I got a little confused by your post. Did you try the kill-ryzen.sh test I suggested? It is very reliable at spotting systems with problems. Just let it run for a few hours.

            I have also come across an interesting post on the Gentoo Wiki. Their Ryzen page has a troubleshooting section that comments on this compilation problem (https://wiki.gentoo.org/wiki/Ryzen#Troubleshooting). There you can find a link to a spreadsheet with the results of a questionnaire answered by more than 60 Gentoo users about Ryzen. From what I can see, more than 50% report stability problems (there is a column for that). I think that is huge!

            It would be very nice if @phoronix could try the test on his systems and report back. We need to get serious attention on this, and I believe a Phoronix article is possibly the best way.


            • @pjssilva I can't! kill-ryzen.sh needs more than 16 GB of RAM and/or swap; I have no swap here and only 16 GB of RAM installed.
              I went back to a custom-compiled 4.11.12 kernel, and it built my Qt 5 cross environment fine (I have not tried more than once)!

              Build log of the Qt 5 cross-environment build on Ryzen: https://pastebin.com/raw/YdjGY1M2

              Code:
              System Information
                     Manufacturer: Micro-Star International Co., Ltd
                     Product Name: MS-7A34
                     Version: 1.0
              
              BIOS Information
                     Vendor: American Megatrends Inc.
                     Version: 1.71
                     Release Date: 07/06/2017
                     Address: 0xF0000
                     Runtime Size: 64 kB
                     ROM Size: 16 MB
              
              ~$ sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
              
                     Speed: 2400 MT/s
                     Speed: 2400 MT/s
                     Part Number: F4-2400C15-8GVR
                     Rank: 1
                     Configured Clock Speed: 1200 MT/s
                     Speed: 2400 MT/s
                     Speed: 2400 MT/s
                     Part Number: F4-2400C15-8GVR
                     Rank: 1
                     Configured Clock Speed: 1200 MT/s
              
              ~$ uname -a
              Linux debian 4.11.12-vanilla #1 SMP Wed Aug 2 16:33:20 CEST 2017 x86_64 GNU/Linux
              
              $ cat /proc/sys/kernel/randomize_va_space
              0
              
              ~$ cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              model name      : AMD Ryzen 7 1700X Eight-Core Processor
              microcode       : 0x8001126
              scorpio810
              Junior Member
              Last edited by scorpio810; 03 August 2017, 09:55 AM.
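The `grep` over /proc/cpuinfo above prints the same two lines once per logical CPU. As a convenience, a short Python sketch can collapse that into one line per distinct model and microcode pair. It assumes "model name" appears before "microcode" within each processor entry, as in the output pasted here; `summarize_cpuinfo` is a hypothetical helper name.

```python
from collections import Counter


def summarize_cpuinfo(text):
    """Count logical CPUs per (model name, microcode) pair in /proc/cpuinfo text."""
    counts = Counter()
    model = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        value = value.strip()
        if key == "model name":
            model = value
        elif key == "microcode":
            counts[(model, value)] += 1
    return counts


if __name__ == "__main__":
    # Sample taken from the output in this thread; on a real system, pass the
    # contents of /proc/cpuinfo instead.
    sample = (
        "model name      : AMD Ryzen 7 1700X Eight-Core Processor\n"
        "microcode       : 0x8001126\n"
    ) * 16
    for (model, ucode), n in summarize_cpuinfo(sample).items():
        print(f"{n} x {model} (microcode {ucode})")
```

A report then needs only one line per CPU type, and a mismatch in microcode revision between cores would stand out immediately.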


              • I've compiled the kernel with the segv workaround by Satoru Takeuchi, and I'm still facing the bug.
                Code:
                ./kill-ryzen.sh
                Download GCC sources
                --2017-08-02 21:57:33-- ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/gcc-7.1.0/gcc-7.1.0.tar.bz2
                => 'gcc-7.1.0.tar.bz2.2'
                Resolving ftp.fu-berlin.de (ftp.fu-berlin.de)... 130.133.3.130
                Connecting to ftp.fu-berlin.de (ftp.fu-berlin.de)|130.133.3.130|:21... connected.
                Logging in as anonymous ... Logged in!
                ==> SYST ... done. ==> PWD ... done.
                ==> TYPE I ... done. ==> CWD (1) /unix/languages/gcc/releases/gcc-7.1.0 ... done.
                ==> SIZE gcc-7.1.0.tar.bz2 ... 84303533
                ==> PASV ... done. ==> RETR gcc-7.1.0.tar.bz2 ... done.
                Length: 84303533 (80M) (unauthoritative)
                
                100%[=========================================================================================================================================>] 84,303,533 974KB/s in 79s
                
                2017-08-02 21:58:56 (1.02 MB/s) - 'gcc-7.1.0.tar.bz2.2' saved [84303533]
                
                Extract GCC sources
                Download prerequisites
                gmp-6.1.0.tar.bz2: OK
                mpfr-3.1.4.tar.bz2: OK
                mpc-1.0.3.tar.gz: OK
                isl-0.16.1.tar.bz2: OK
                All prerequisites downloaded successfully.
                cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                model name : AMD Ryzen 5 1600 Six-Core Processor
                microcode : 0x8001126
                sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
                Speed: 2400 MHz
                Part Number: 9905678-012.A00G
                Rank: 1
                Configured Clock Speed: 2400 MHz
                uname -a
                Linux linux-x5uw 4.12.4 #1 SMP PREEMPT Sun Jul 30 17:20:38 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
                cat /proc/sys/kernel/randomize_va_space
                0
                Using 12 parallel processes
                [loop-0] Wed Aug 2 22:02:38 -03 2017 start 0
                [KERN] -- Logs begin at Sat 2017-07-29 02:31:30 -03. --
                [KERN] Aug 02 22:02:02 linux-x5uw kernel: usb 1-6: new low-speed USB device number 4 using xhci_hcd
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: new low-speed USB device number 5 using xhci_hcd
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device found, idVendor=1c4f, idProduct=0002
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Product: USB Keyboard
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Manufacturer: SIGMACHIP
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.0/0003:1C4F:0002.0005/input/input17
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0005: input,hidraw2: USB HID v1.10 Keyboard [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input0
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.1/0003:1C4F:0002.0006/input/input18
                [KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0006: input,hidraw3: USB HID v1.10 Device [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input1
                [loop-1] Wed Aug 2 22:02:39 -03 2017 start 0
                [loop-2] Wed Aug 2 22:02:40 -03 2017 start 0
                [loop-3] Wed Aug 2 22:02:41 -03 2017 start 0
                [loop-4] Wed Aug 2 22:02:42 -03 2017 start 0
                [loop-5] Wed Aug 2 22:02:43 -03 2017 start 0
                [loop-6] Wed Aug 2 22:02:44 -03 2017 start 0
                [loop-7] Wed Aug 2 22:02:45 -03 2017 start 0
                [loop-8] Wed Aug 2 22:02:46 -03 2017 start 0
                [loop-9] Wed Aug 2 22:02:47 -03 2017 start 0
                [loop-10] Wed Aug 2 22:02:48 -03 2017 start 0
                [loop-11] Wed Aug 2 22:02:49 -03 2017 start 0
                [KERN] Aug 02 22:04:20 linux-x5uw kernel: sh[17778]: segfault at ffffffff894c0017 ip 00000000004712b3 sp 00003fffffff9620 error 7 in bash[400000+a6000]
                [loop-6] Wed Aug 2 22:04:20 -03 2017 build failed
                [loop-6] TIME TO FAIL: 102 s
                [loop-9] Wed Aug 2 22:04:49 -03 2017 build failed
                [loop-9] TIME TO FAIL: 131 s
                [KERN] Aug 02 22:04:49 linux-x5uw kernel: sh[24675]: segfault at 3f699950e8a8 ip 00003f69992822a0 sp 00003fffffffb408 error 4 in libc-2.22.so[3f6999203000+199000]
                Kayote
                Senior Member
                Last edited by Kayote; 02 August 2017, 09:27 PM.
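To compare runs across machines, it helps to pull the "TIME TO FAIL" values out of the script output. The sketch below assumes the line format seen in the logs pasted in this thread (I have not checked it against the script itself), and the function names are hypothetical.

```python
import re

# Example line: "[loop-6] TIME TO FAIL: 102 s"
FAIL_RE = re.compile(r"\[(loop-\d+)\] TIME TO FAIL: (\d+) s")


def time_to_fail(lines):
    """Map each failed loop to its reported seconds until failure."""
    result = {}
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            result[m.group(1)] = int(m.group(2))
    return result


def first_failure(lines):
    """Return (loop, seconds) for the earliest failure, or None if nothing failed."""
    fails = time_to_fail(lines)
    if not fails:
        return None
    loop = min(fails, key=fails.get)
    return loop, fails[loop]


if __name__ == "__main__":
    log = [
        "[loop-6] Wed Aug 2 22:04:20 -03 2017 build failed",
        "[loop-6] TIME TO FAIL: 102 s",
        "[loop-9] Wed Aug 2 22:04:49 -03 2017 build failed",
        "[loop-9] TIME TO FAIL: 131 s",
    ]
    print(first_failure(log))
```

In the two runs posted so far, the reported value matches the time elapsed since the first loop started rather than since the failing loop started, so it is comparable between systems with different core counts.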


                • Am I wrong, or did they leave the CPU model out of the Gentoo questionnaire, and therefore out of the spreadsheet? Why? Is it not relevant? I mean, are all the models "equally" affected?
                  donbastiano
                  Junior Member
                  Last edited by donbastiano; 04 August 2017, 07:59 AM.


                  • Originally posted by donbastiano
                    Am I wrong, or did they leave the CPU model out of the Gentoo questionnaire, and therefore out of the spreadsheet? Why? Is it not relevant? I mean, are all the models "equally" affected?
                    ???
                    Which CPU model are you writing about?


                    • Originally posted by donbastiano
                      I mean, are all the models "equally" affected?
                      If by models you mean R3 1200, ..., R7 1800X, then no. This mostly or entirely affects models with SMT; models without SMT (1200, 1300X) are not affected.

                      Also, a number of users have Ryzen CPUs that are stable even with SMT and the uOP cache enabled.
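Since SMT seems to matter here, it would help if reports stated whether SMT is active. One way to check, sketched below in Python, is to read the sibling list the kernel exposes in /sys/devices/system/cpu/cpu0/topology/thread_siblings_list: more than one CPU in that list means cpu0 shares its core with another hardware thread. The helper names are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-1' or '0,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, sep, high = part.partition("-")
        if sep:
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(low))
    return sorted(cpus)


def smt_active(siblings_text):
    """True if the sibling list names more than one logical CPU."""
    return len(parse_cpu_list(siblings_text)) > 1


if __name__ == "__main__":
    path = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"
    try:
        with open(path) as f:
            siblings = f.read()
    except OSError:
        print("thread_siblings_list not available on this system")
    else:
        print("siblings of cpu0:", parse_cpu_list(siblings))
        print("SMT active:", smt_active(siblings))
```

This only tells you whether SMT is currently enabled, not whether a given CPU is affected by the bug.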
