Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads


  • pjssilva Is that from reddit? Can you paste a link to the thread?



    • There are some discussions on Reddit:



      There is also an active bug report in FreeBSD and DragonFlyBSD, with developers looking for a workaround. On the AMD forum there are already cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I have not managed to convince people here to test their systems: not a single report so far. Come on, people, try the kill-ryzen.sh script for a few hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.

      I am sorry for AMD: if this bug is widespread, even if hard to trigger, it could be a disaster for them. I hope they find a solution via microcode. But first they need to acknowledge the problem.
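
Judging only by the log output posted later in this thread ("Using 16 parallel processes", "[loop-N] ... start", "TIME TO FAIL"), the test seems to run several build loops in parallel and report how long each one survives. The following is a rough Python sketch of that idea, not the script itself: `run_until_failure`, `stress`, and the command you pass in are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_until_failure(cmd, loop_id, max_iterations):
    """Re-run cmd until it exits non-zero.

    Returns (loop_id, seconds until the first failure), or
    (loop_id, None) if every iteration succeeded.
    """
    start = time.monotonic()
    for _ in range(max_iterations):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return loop_id, time.monotonic() - start
    return loop_id, None


def stress(cmd, workers, max_iterations):
    """Run `workers` independent loops of cmd in parallel.

    Returns a dict mapping loop id to time-to-fail in seconds (or None).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_until_failure, cmd, i, max_iterations)
            for i in range(workers)
        ]
        return dict(f.result() for f in futures)
```

For a real test the command would be a heavy build (for example a `make` invocation in a scratch directory per loop), and a healthy machine should return `None` for every loop.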



      • Originally posted by pjssilva
        There are some discussions on Reddit:



        There is also an active bug report in FreeBSD and DragonFlyBSD, with developers looking for a workaround. On the AMD forum there are already cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I have not managed to convince people here to test their systems: not a single report so far. Come on, people, try the kill-ryzen.sh script for a few hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.

        I am sorry for AMD: if this bug is widespread, even if hard to trigger, it could be a disaster for them. I hope they find a solution via microcode. But first they need to acknowledge the problem.
        I am on it; I started it just now.
        Asus B350M-A
        1700 @ 3.8 GHz
        Corsair LPX 2666, 32 GB (2x16 GB)

        Kernel 4.11.11-041111-generic on Ubuntu.



        • Quick!
          I'd edit my own post, but it seems I haven't made enough of them yet :-)

          2017 x86_64 x86_64 x86_64 GNU/Linux
          cat /proc/sys/kernel/randomize_va_space
          2
          Using 16 parallel processes
          [KERN] -- Logs begin at on. 2017-08-02 00:48:25 CEST. --
          [KERN] aug. 02 00:50:41 oleUbuntu kernel: userif-3: sent link up event.
          [KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link down event.
          [KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link up event.
          [KERN] aug. 02 00:50:52 oleUbuntu kernel: zram: Cannot change disksize for initialized device
          [KERN] aug. 02 00:52:49 oleUbuntu kernel: zram: Cannot change disksize for initialized device
          [KERN] aug. 02 00:53:27 oleUbuntu kernel: zram: Cannot change disksize for initialized device
          [KERN] aug. 02 00:55:03 oleUbuntu kernel: zram0: detected capacity change from 68719476736 to 0
          [KERN] aug. 02 00:56:08 oleUbuntu kernel: zram0: detected capacity change from 0 to 68719476736
          [KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
          [KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounted filesystem without journal. Opts: discard
          [loop-0] on. 02. aug. 00:57:02 +0200 2017 start 0
          [loop-1] on. 02. aug. 00:57:03 +0200 2017 start 0
          [loop-2] on. 02. aug. 00:57:04 +0200 2017 start 0
          [loop-3] on. 02. aug. 00:57:05 +0200 2017 start 0
          [loop-4] on. 02. aug. 00:57:06 +0200 2017 start 0
          [loop-5] on. 02. aug. 00:57:07 +0200 2017 start 0
          [loop-6] on. 02. aug. 00:57:08 +0200 2017 start 0
          [loop-7] on. 02. aug. 00:57:09 +0200 2017 start 0
          [loop-8] on. 02. aug. 00:57:10 +0200 2017 start 0
          [loop-9] on. 02. aug. 00:57:11 +0200 2017 start 0
          [loop-10] on. 02. aug. 00:57:12 +0200 2017 start 0
          [loop-11] on. 02. aug. 00:57:13 +0200 2017 start 0
          [loop-12] on. 02. aug. 00:57:14 +0200 2017 start 0
          [loop-13] on. 02. aug. 00:57:15 +0200 2017 start 0
          [loop-14] on. 02. aug. 00:57:16 +0200 2017 start 0
          [loop-15] on. 02. aug. 00:57:17 +0200 2017 start 0
          [loop-2] on. 02. aug. 00:57:49 +0200 2017 build failed
          [loop-2] TIME TO FAIL: 47 s
          [loop-13] on. 02. aug. 00:58:00 +0200 2017 build failed
          [loop-13] TIME TO FAIL: 58 s
          [KERN] aug. 02 00:58:00 oleUbuntu kernel: bash[23093]: segfault at 7fff2c45f69c ip 00007fff2c45f69c sp 00007fff2c45f4f8 error 15



          • Originally posted by scorpio810

            Thank you for the tip. ;-)
            I just tried it, and it now runs fine without a segfault when I build my cross environment with
            "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" on my Debian Sid.
            Before, I saw a lot of "segfault at 10 ip 0000000000000010 sp 00007ffcdbc8df58 error 14 in cc1plus".
            That is very, very strange, or black magic!

            I can rebuild my entire cross environment with "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" without a crash and with only a minor warning in the log:
            Code:
            perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
            But sometimes something crashes again, bash for example, yet the job finishes and builds fine afterwards...
            Code:
            [ 2621.739360] bash[10208]: segfault at 8 ip 00000000004321ac sp 00007fffffffbb20 error 6 in bash[400000+100000]
            Or cc1plus segfaults again and again, and I need to finish the compile with make --jobs=8 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5
            Code:
            [ 6326.849438] cc1plus[16495]: segfault at 10 ip 0000000000000010 sp 00007fffffffc2d8 error 14 in cc1plus[100000000+1606000]
            [ 6330.352677] cc1plus[16441]: segfault at 10 ip 0000000000000010 sp 00007fffffffc598 error 14 in cc1plus[100000000+1606000]
            [21533.790423] cc1plus[20285]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
            [21640.952023] cc1plus[23381]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
            Adding norandmaps to the GRUB kernel command line no longer helps on my custom kernels 4.12.3 and 4.12.4 (kernel.org).
            Debian unstable on a 1700X (made in Malaysia ...), Dark Rock Pro 3, MSI B350 Tomahawk, BIOS 1.71 beta (AGESA 1.0.0.6a), XMP2 profile but with 2T command rate and RAM at 1.35 V.
            Core C6, Cool'n'Quiet, and Core Boost disabled; Vcore set to 1.27 V, Vsoc set to 1.10 V.
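
For readers unfamiliar with it: as far as I know, the norandmaps boot parameter disables address space layout randomization, which is the same state that `cat /proc/sys/kernel/randomize_va_space` reports as 0 in the outputs posted in this thread. A small Python sketch for checking that value follows; the function names are made up for illustration, and the descriptions paraphrase the kernel sysctl documentation.

```python
from pathlib import Path

# Meaning of each value, paraphrased from the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled (what the norandmaps boot parameter sets)",
    1: "randomize mmap base, stack and VDSO addresses",
    2: "as 1, plus heap (brk) randomization",
}


def describe_aslr(raw):
    """Translate the raw contents of randomize_va_space into text."""
    return ASLR_MODES.get(int(raw.strip()), "unknown value")


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting from procfs (Linux only)."""
    return describe_aslr(Path(path).read_text())
```

On a Linux machine, `print(read_aslr())` shows the current mode, so you can confirm whether the boot parameter actually took effect before running a long test.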

            I saw a lot of cc1plus segfaults with kernel 4.12.4, using the same config file as with kernel 4.12.3.

            EDIT: another thing today:


            Code:
            sudo dmesg | tail
            
            [ 5948.611360] [ 9748]  1000  9748    10452     4240      22       3        0             0 moc
            [ 5948.611361] [ 9749]  1000  9749     3945     1090      11       3        0             0 i686-w64-mingw3
            [ 5948.611363] [ 9751]  1000  9751     5662      890      16       3        0             0 moc
            [ 5948.611364] [ 9754]  1000  9754     2585       59       9       3        0             0 i686-w64-mingw3
            [ 5948.611365] [ 9755]  1000  9755    10247     1496      23       3        0             0 cc1plus
            [ 5948.611366] [ 9756]  1000  9756     3945     1090      12       3        0             0 i686-w64-mingw3
            [ 5948.611367] Out of memory: Kill process 8865 (cc1plus) score 25 or sacrifice child
            [ 5948.611372] Killed process 8865 (cc1plus) total-vm:474364kB, anon-rss:412300kB, file-rss:0kB, shmem-rss:0kB
            [ 5951.892679] cc1plus[9010]: segfault at 10 ip 0000000000000010 sp 00007fffffffcb18 error 14 in cc1plus[100000000+15b7000]
            [ 5975.224183] cc1plus[10188]: segfault at 10 ip 0000000000000010 sp 00007fffffffca98 error 14 in cc1plus[100000000+15b7000]
            
            
            [  540.648393] traps: ld[7488] general protection ip:7f815e02bc72 sp:7fffffffdd00 error:0 in libbfd-2.28-system.so[7f815dfa6000+129000]
            [ 2030.912805] cc1plus[12796]: segfault at 10 ip 0000000000000010 sp 00007fffffffc618 error 14 in cc1plus[100000000+1606000]
            Last edited by scorpio810; 04 August 2017, 07:49 AM.
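
The "segfault at ... error N" lines quoted in this thread carry more information than they seem to. To my knowledge the kernel prints the x86 page-fault error code in hexadecimal, with bit 0 meaning protection violation (otherwise page not present), bit 1 write, bit 2 user mode, bit 3 reserved-bit violation and bit 4 instruction fetch. Below is a hedged Python sketch that decodes such a line; `parse_segfault` and `decode_error` are illustrative names, not part of any tool mentioned here.

```python
import re

# Matches the "comm[pid]: segfault at ADDR ip IP sp SP error CODE" part.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the x86 page-fault error code bits (assumed layout, see text)."""
    if code & 0x10:
        access = "instruction fetch"
    elif code & 0x2:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": access,
        "mode": "user" if code & 0x4 else "kernel",
        "reserved_bit": bool(code & 0x8),
    }


def parse_segfault(line):
    """Return decoded details for a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = decode_error(int(m["err"], 16))
    info.update(
        comm=m["comm"],
        pid=int(m["pid"]),
        addr=int(m["addr"], 16),
        ip=int(m["ip"], 16),
    )
    return info
```

Read this way, the repeated "segfault at 10 ip 0000000000000010 ... error 14" lines decode as a user-mode instruction fetch from the unmapped address 0x10: the process jumped to a bogus address rather than reading bad data.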



            • oleyska, thanks for the report. scorpio810, I got a little confused by your post: did you try the kill-ryzen.sh test I suggested? It is very reliable at spotting systems with problems. Just let it run for a few hours.

              I have also come across an interesting post on the Gentoo Wiki. Their Ryzen page has a troubleshooting section that discusses this compilation problem (https://wiki.gentoo.org/wiki/Ryzen#Troubleshooting). There you can find a link to a datasheet with the results of a questionnaire about Ryzen answered by more than 60 Gentoo users. From what I can see, more than 50% report stability problems (there is a column for that). I think that is huge!

              It would be very nice if Phoronix could run the test on its systems and report the results. We need to get serious attention on this, and I believe a Phoronix article is possibly the best way.



              • @pjssilva I can't! kill-ryzen.sh needs more than 16 GB of RAM and/or swap; I have no swap here and only 16 GB of RAM.
                I went back to a custom-compiled 4.11.12 kernel, and it built my Qt 5 cross environment fine (I have not tried more than once)!

                Build log of the Qt 5 cross-environment build on Ryzen: https://pastebin.com/raw/YdjGY1M2

                Code:
                System Information
                       Manufacturer: Micro-Star International Co., Ltd
                       Product Name: MS-7A34
                       Version: 1.0
                
                BIOS Information
                       Vendor: American Megatrends Inc.
                       Version: 1.71
                       Release Date: 07/06/2017
                       Address: 0xF0000
                       Runtime Size: 64 kB
                       ROM Size: 16 MB
                
                ~$ sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
                
                       Speed: 2400 MT/s
                       Speed: 2400 MT/s
                       Part Number: F4-2400C15-8GVR
                       Rank: 1
                       Configured Clock Speed: 1200 MT/s
                       Speed: 2400 MT/s
                       Speed: 2400 MT/s
                       Part Number: F4-2400C15-8GVR
                       Rank: 1
                       Configured Clock Speed: 1200 MT/s
                
                ~$ uname -a
                Linux debian 4.11.12-vanilla #1 SMP Wed Aug 2 16:33:20 CEST 2017 x86_64 GNU/Linux
                
                $ cat /proc/sys/kernel/randomize_va_space
                0
                
                ~$ cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                model name      : AMD Ryzen 7 1700X Eight-Core Processor
                microcode       : 0x8001126
                Last edited by scorpio810; 03 August 2017, 09:55 AM.
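
The grep over /proc/cpuinfo above prints one pair of lines per logical CPU, which makes it hard to see at a glance whether every core reports the same microcode. A short Python sketch that collapses that output into counts is shown below; `summarize_cpuinfo` is a hypothetical helper, and it assumes each "microcode" line follows the "model name" line of the same CPU, as in the output above.

```python
from collections import Counter


def summarize_cpuinfo(text):
    """Count (model name, microcode) pairs in /proc/cpuinfo-style text."""
    counts = Counter()
    model = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key, value = key.strip(), value.strip()
        if key == "model name":
            model = value
        elif key == "microcode" and model is not None:
            counts[(model, value)] += 1
            model = None
    return counts
```

Feeding it the contents of /proc/cpuinfo (or the grep output) should give a single entry when all logical CPUs agree; more than one entry would mean mixed microcode revisions.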



                • I've compiled the kernel with the segfault workaround by Satoru Takeuchi, but I am still facing the bug.
                  Code:
                  ./kill-ryzen.sh
                  Download GCC sources
                  --2017-08-02 21:57:33-- ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/gcc-7.1.0/gcc-7.1.0.tar.bz2
                  => 'gcc-7.1.0.tar.bz2.2'
                  Resolving ftp.fu-berlin.de (ftp.fu-berlin.de)... 130.133.3.130
                  Connecting to ftp.fu-berlin.de (ftp.fu-berlin.de)|130.133.3.130|:21... connected.
                  Logging in as anonymous ... Logged in!
                  ==> SYST ... done. ==> PWD ... done.
                  ==> TYPE I ... done. ==> CWD (1) /unix/languages/gcc/releases/gcc-7.1.0 ... done.
                  ==> SIZE gcc-7.1.0.tar.bz2 ... 84303533
                  ==> PASV ... done. ==> RETR gcc-7.1.0.tar.bz2 ... done.
                  Length: 84303533 (80M) (unauthoritative)
                  
                  100%[=========================================================================================================================================>] 84,303,533 974KB/s in 79s
                  
                  2017-08-02 21:58:56 (1.02 MB/s) - 'gcc-7.1.0.tar.bz2.2' saved [84303533]
                  
                  Extract GCC sources
                  Download prerequisites
                  gmp-6.1.0.tar.bz2: OK
                  mpfr-3.1.4.tar.bz2: OK
                  mpc-1.0.3.tar.gz: OK
                  isl-0.16.1.tar.bz2: OK
                  All prerequisites downloaded successfully.
                  cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  model name : AMD Ryzen 5 1600 Six-Core Processor
                  microcode : 0x8001126
                  sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
                  Speed: 2400 MHz
                  Part Number: 9905678-012.A00G
                  Rank: 1
                  Configured Clock Speed: 2400 MHz
                  uname -a
                  Linux linux-x5uw 4.12.4 #1 SMP PREEMPT Sun Jul 30 17:20:38 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
                  cat /proc/sys/kernel/randomize_va_space
                  0
                  Using 12 parallel processes
                  [loop-0] Wed Aug 2 22:02:38 -03 2017 start 0
                  [KERN] -- Logs begin at Sat 2017-07-29 02:31:30 -03. --
                  [KERN] Aug 02 22:02:02 linux-x5uw kernel: usb 1-6: new low-speed USB device number 4 using xhci_hcd
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: new low-speed USB device number 5 using xhci_hcd
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device found, idVendor=1c4f, idProduct=0002
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Product: USB Keyboard
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Manufacturer: SIGMACHIP
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.0/0003:1C4F:0002.0005/input/input17
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0005: input,hidraw2: USB HID v1.10 Keyboard [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input0
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.1/0003:1C4F:0002.0006/input/input18
                  [KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0006: input,hidraw3: USB HID v1.10 Device [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input1
                  [loop-1] Wed Aug 2 22:02:39 -03 2017 start 0
                  [loop-2] Wed Aug 2 22:02:40 -03 2017 start 0
                  [loop-3] Wed Aug 2 22:02:41 -03 2017 start 0
                  [loop-4] Wed Aug 2 22:02:42 -03 2017 start 0
                  [loop-5] Wed Aug 2 22:02:43 -03 2017 start 0
                  [loop-6] Wed Aug 2 22:02:44 -03 2017 start 0
                  [loop-7] Wed Aug 2 22:02:45 -03 2017 start 0
                  [loop-8] Wed Aug 2 22:02:46 -03 2017 start 0
                  [loop-9] Wed Aug 2 22:02:47 -03 2017 start 0
                  [loop-10] Wed Aug 2 22:02:48 -03 2017 start 0
                  [loop-11] Wed Aug 2 22:02:49 -03 2017 start 0
                  [KERN] Aug 02 22:04:20 linux-x5uw kernel: sh[17778]: segfault at ffffffff894c0017 ip 00000000004712b3 sp 00003fffffff9620 error 7 in bash[400000+a6000]
                  [loop-6] Wed Aug 2 22:04:20 -03 2017 build failed
                  [loop-6] TIME TO FAIL: 102 s
                  [loop-9] Wed Aug 2 22:04:49 -03 2017 build failed
                  [loop-9] TIME TO FAIL: 131 s
                  [KERN] Aug 02 22:04:49 linux-x5uw kernel: sh[24675]: segfault at 3f699950e8a8 ip 00003f69992822a0 sp 00003fffffffb408 error 4 in libc-2.22.so[3f6999203000+199000]
                  Last edited by Kayote; 02 August 2017, 09:27 PM.
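
When comparing runs like the ones posted in this thread, it helps to pull out just the "TIME TO FAIL" lines. A minimal Python sketch is given below; it assumes the line format seen in the logs above, and `times_to_fail` is simply an illustrative name.

```python
import re

# Matches lines such as "[loop-6] TIME TO FAIL: 102 s".
TTF_RE = re.compile(r"\[loop-(\d+)\] TIME TO FAIL: (\d+) s")


def times_to_fail(log_text):
    """Map loop number to seconds until failure for every failed loop."""
    return {int(m.group(1)): int(m.group(2)) for m in TTF_RE.finditer(log_text)}
```

Calling `min(times_to_fail(log).values())` on a saved log gives the time to the first failure, which is probably the most useful single number to quote when reporting a result.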



                  • Am I wrong, or did they leave the CPU model out of the Gentoo questionnaire, and therefore out of the datasheet? Why? Is it not relevant? I mean, are all the models "equally" affected?
                    Last edited by donbastiano; 04 August 2017, 07:59 AM.



                    • Originally posted by donbastiano
                      Am I wrong, or did they leave the CPU model out of the Gentoo questionnaire, and therefore out of the datasheet? Why? Is it not relevant? I mean, are all the models "equally" affected?
                      ???
                      Which CPU model are you writing about?

