Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads

    Kayote
    Senior Member

  • Kayote
    replied
    I've compiled the kernel with the segv workaround by Satoru Takeuchi, but I'm still hitting the bug.
    Code:
    ./kill-ryzen.sh
    Download GCC sources
    --2017-08-02 21:57:33-- ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/gcc-7.1.0/gcc-7.1.0.tar.bz2
    => 'gcc-7.1.0.tar.bz2.2'
    Resolving ftp.fu-berlin.de (ftp.fu-berlin.de)... 130.133.3.130
    Connecting to ftp.fu-berlin.de (ftp.fu-berlin.de)|130.133.3.130|:21... connected.
    Logging in as anonymous ... Logged in!
    ==> SYST ... done. ==> PWD ... done.
    ==> TYPE I ... done. ==> CWD (1) /unix/languages/gcc/releases/gcc-7.1.0 ... done.
    ==> SIZE gcc-7.1.0.tar.bz2 ... 84303533
    ==> PASV ... done. ==> RETR gcc-7.1.0.tar.bz2 ... done.
    Length: 84303533 (80M) (unauthoritative)
    
    100%[================================================== ================================================== =====================================>] 84,303,533 974KB/s in 79s
    
    2017-08-02 21:58:56 (1.02 MB/s) - 'gcc-7.1.0.tar.bz2.2' saved [84303533]
    
    Extract GCC sources
    Download prerequisites
    gmp-6.1.0.tar.bz2: OK
    mpfr-3.1.4.tar.bz2: OK
    mpc-1.0.3.tar.gz: OK
    isl-0.16.1.tar.bz2: OK
    All prerequisites downloaded successfully.
    cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    model name : AMD Ryzen 5 1600 Six-Core Processor
    microcode : 0x8001126
    sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
    Speed: 2400 MHz
    Part Number: 9905678-012.A00G
    Rank: 1
    Configured Clock Speed: 2400 MHz
    uname -a
    Linux linux-x5uw 4.12.4 #1 SMP PREEMPT Sun Jul 30 17:20:38 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
    cat /proc/sys/kernel/randomize_va_space
    0
    Using 12 parallel processes
    [loop-0] Wed Aug 2 22:02:38 -03 2017 start 0
    [KERN] -- Logs begin at Sat 2017-07-29 02:31:30 -03. --
    [KERN] Aug 02 22:02:02 linux-x5uw kernel: usb 1-6: new low-speed USB device number 4 using xhci_hcd
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: new low-speed USB device number 5 using xhci_hcd
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device found, idVendor=1c4f, idProduct=0002
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Product: USB Keyboard
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Manufacturer: SIGMACHIP
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.0/0003:1C4F:0002.0005/input/input17
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0005: input,hidraw2: USB HID v1.10 Keyboard [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input0
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.1/0003:1C4F:0002.0006/input/input18
    [KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0006: input,hidraw3: USB HID v1.10 Device [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input1
    [loop-1] Wed Aug 2 22:02:39 -03 2017 start 0
    [loop-2] Wed Aug 2 22:02:40 -03 2017 start 0
    [loop-3] Wed Aug 2 22:02:41 -03 2017 start 0
    [loop-4] Wed Aug 2 22:02:42 -03 2017 start 0
    [loop-5] Wed Aug 2 22:02:43 -03 2017 start 0
    [loop-6] Wed Aug 2 22:02:44 -03 2017 start 0
    [loop-7] Wed Aug 2 22:02:45 -03 2017 start 0
    [loop-8] Wed Aug 2 22:02:46 -03 2017 start 0
    [loop-9] Wed Aug 2 22:02:47 -03 2017 start 0
    [loop-10] Wed Aug 2 22:02:48 -03 2017 start 0
    [loop-11] Wed Aug 2 22:02:49 -03 2017 start 0
    [KERN] Aug 02 22:04:20 linux-x5uw kernel: sh[17778]: segfault at ffffffff894c0017 ip 00000000004712b3 sp 00003fffffff9620 error 7 in bash[400000+a6000]
    [loop-6] Wed Aug 2 22:04:20 -03 2017 build failed
    [loop-6] TIME TO FAIL: 102 s
    [loop-9] Wed Aug 2 22:04:49 -03 2017 build failed
    [loop-9] TIME TO FAIL: 131 s
    [KERN] Aug 02 22:04:49 linux-x5uw kernel: sh[24675]: segfault at 3f699950e8a8 ip 00003f69992822a0 sp 00003fffffffb408 error 4 in libc-2.22.so[3f6999203000+199000]
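For anyone who wants to compare logs like the one above across machines, here is a minimal Python sketch that extracts the fields from the kernel's `segfault at ...` lines. The names `SEGFAULT_RE` and `parse_segfaults` are hypothetical helpers made up for illustration; they are not part of kill-ryzen.sh, and the pattern is only an assumption based on the line shapes quoted in this thread.

```python
import re

# Hypothetical helper, not part of kill-ryzen.sh. The pattern is an
# assumption derived from the "segfault at" lines quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<image>[^\[\s]+)\[)?"
)


def parse_segfaults(lines):
    """Return one dict per segfault line; lines that do not match are skipped."""
    records = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match is None:
            continue
        record = match.groupdict()
        record["pid"] = int(record["pid"])
        # "image" is None when the kernel did not report a mapped file.
        records.append(record)
    return records
```

Grouping the resulting records by `comm` or `error` makes it easier to see whether the crashes cluster in one binary or are spread around.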
    Last edited by Kayote; 02 August 2017, 09:27 PM.


  • scorpio810
    Junior Member

  • scorpio810
    replied
    @pjssilva I can't! kill-ryzen.sh needs more than 16 GB of RAM and/or swap; I have no swap here and only 16 GB of RAM installed.
    I went back to a custom-compiled 4.11.12 kernel, and it built my Qt 5 cross environment fine (I haven't tried more than once)!
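
A quick way to check how much RAM plus swap a machine has before starting the test is to read `/proc/meminfo`. The sketch below is only an illustration: the function name `ram_plus_swap_gib` is hypothetical, and the "more than 16 GB" figure comes from the post above, not from the script's documentation.

```python
def ram_plus_swap_gib(meminfo_text):
    """Sum MemTotal and SwapTotal from /proc/meminfo text, returned in GiB."""
    totals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "SwapTotal"):
            # /proc/meminfo reports these values in kB (1024-byte units).
            totals[key] = int(rest.split()[0])
    kib = totals.get("MemTotal", 0) + totals.get("SwapTotal", 0)
    return kib / (1024 * 1024)
```

On a Linux box it could be used as `ram_plus_swap_gib(open("/proc/meminfo").read())`.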

    Build log for the Qt 5 cross-environment build on Ryzen: https://pastebin.com/raw/YdjGY1M2

    Code:
    System Information
           Manufacturer: Micro-Star International Co., Ltd
           Product Name: MS-7A34
           Version: 1.0
    
    BIOS Information
           Vendor: American Megatrends Inc.
           Version: 1.71
           Release Date: 07/06/2017
           Address: 0xF0000
           Runtime Size: 64 kB
           ROM Size: 16 MB
    
    ~$ sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
    
           Speed: 2400 MT/s
           Speed: 2400 MT/s
           Part Number: F4-2400C15-8GVR
           Rank: 1
           Configured Clock Speed: 1200 MT/s
           Speed: 2400 MT/s
           Speed: 2400 MT/s
           Part Number: F4-2400C15-8GVR
           Rank: 1
           Configured Clock Speed: 1200 MT/s
    
    ~$ uname -a
    Linux debian 4.11.12-vanilla #1 SMP Wed Aug 2 16:33:20 CEST 2017 x86_64 GNU/Linux
    
    $ cat /proc/sys/kernel/randomize_va_space
    0
    
    ~$ cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    model name      : AMD Ryzen 7 1700X Eight-Core Processor
    microcode       : 0x8001126
    Last edited by scorpio810; 03 August 2017, 09:55 AM.


  • pjssilva
    Junior Member

  • pjssilva
    replied
    oleyska: Thanks for the report.
    scorpio810: I got a little confused by your post. Did you try the kill-ryzen.sh test I suggested? It is very reliable at spotting systems with problems. Just let it run for a few hours.

    I have also come across an interesting page on the Gentoo Wiki. Their Ryzen page has a troubleshooting section that discusses this compilation problem (https://wiki.gentoo.org/wiki/Ryzen#Troubleshooting). There you can find a link to a spreadsheet with the results of a questionnaire about Ryzen answered by more than 60 Gentoo users. From what I can see, more than 50% report stability problems (there is a column for that). I think that is huge!

    It would be very nice if phoronix could try the test on his systems and report back. We need to get serious attention on this, and I believe that a Phoronix article is possibly the best way.


  • scorpio810
    Junior Member

  • scorpio810
    replied
    Originally posted by scorpio810

    Thank you for the tip. ;-)
    I just tried it, and it now runs fine without segfaults when I build my cross environment with
    "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" on my Debian Sid.
    Before, I saw a lot of "segfault at 10 ip 0000000000000010 sp 00007ffcdbc8df58 error 14 in cc1plus".
    That is very, very strange, or black magic!

    I can rebuild my entire cross environment with "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" without a crash, and with only a small warning in the log:
    Code:
    perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
    But sometimes something crashes again, such as bash, yet the job still finishes and the build is fine afterwards...
    Code:
    [ 2621.739360] bash[10208]: segfault at 8 ip 00000000004321ac sp 00007fffffffbb20 error 6 in bash[400000+100000]
    Or cc1plus segfaults again and again, and I have to finish the compile with make --jobs=8 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5
    Code:
    [ 6326.849438] cc1plus[16495]: segfault at 10 ip 0000000000000010 sp 00007fffffffc2d8 error 14 in cc1plus[100000000+1606000]
    [ 6330.352677] cc1plus[16441]: segfault at 10 ip 0000000000000010 sp 00007fffffffc598 error 14 in cc1plus[100000000+1606000]
    [21533.790423] cc1plus[20285]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
    [21640.952023] cc1plus[23381]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
    Adding norandmaps in GRUB no longer helps on my custom 4.12.3 and 4.12.4 kernels (kernel.org).
    Debian unstable on a 1700X (made in Malaysia ...), Dark Rock Pro 3, MSI B350 Tomahawk, BIOS 1.71 beta (AGESA 1.0.0.6a), XMP2 profile but with 2T command rate and RAM at 1.35 V.
    Core C6, Cool'n'Quiet, and Core Boost disabled; Vcore set to 1.27 V, Vsoc set to 1.10 V.

    I saw a lot of cc1plus segfaults with kernel 4.12.4, built with the same config file as kernel 4.12.3.

    EDIT: another thing today:


    Code:
    sudo dmesg | tail
    
    [ 5948.611360] [ 9748]  1000  9748    10452     4240      22       3        0             0 moc
    [ 5948.611361] [ 9749]  1000  9749     3945     1090      11       3        0             0 i686-w64-mingw3
    [ 5948.611363] [ 9751]  1000  9751     5662      890      16       3        0             0 moc
    [ 5948.611364] [ 9754]  1000  9754     2585       59       9       3        0             0 i686-w64-mingw3
    [ 5948.611365] [ 9755]  1000  9755    10247     1496      23       3        0             0 cc1plus
    [ 5948.611366] [ 9756]  1000  9756     3945     1090      12       3        0             0 i686-w64-mingw3
    [ 5948.611367] Out of memory: Kill process 8865 (cc1plus) score 25 or sacrifice child
    [ 5948.611372] Killed process 8865 (cc1plus) total-vm:474364kB, anon-rss:412300kB, file-rss:0kB, shmem-rss:0kB
    [ 5951.892679] cc1plus[9010]: segfault at 10 ip 0000000000000010 sp 00007fffffffcb18 error 14 in cc1plus[100000000+15b7000]
    [ 5975.224183] cc1plus[10188]: segfault at 10 ip 0000000000000010 sp 00007fffffffca98 error 14 in cc1plus[100000000+15b7000]
    
    
    [  540.648393] traps: ld[7488] general protection ip:7f815e02bc72 sp:7fffffffdd00 error:0 in libbfd-2.28-system.so[7f815dfa6000+129000]
    [ 2030.912805] cc1plus[12796]: segfault at 10 ip 0000000000000010 sp 00007fffffffc618 error 14 in cc1plus[100000000+1606000]
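
The `error N` field in these lines is the x86 page-fault error code, which, as far as I know, the kernel prints in hexadecimal, so `error 14` is 0x14. Below is a minimal sketch for decoding it; the function name `decode_pf_error` is hypothetical, and only the five low bits are covered. Read this way, the repeated `error 14` with `ip 0000000000000010` looks like a user-mode instruction fetch from an unmapped page, i.e. a jump to address 0x10.

```python
def decode_pf_error(code_hex):
    """Decode the low bits of an x86 page-fault error code given as a hex string."""
    code = int(code_hex, 16)
    if code & 0x10:        # bit 4: fault happened on an instruction fetch
        access = "instruction fetch"
    elif code & 0x02:      # bit 1: write access
        access = "write"
    else:
        access = "read"
    return {
        # bit 0: set = protection violation, clear = page not present
        "cause": "protection violation" if code & 0x01 else "page not present",
        "access": access,
        # bit 2: set = fault raised while in user mode
        "mode": "user" if code & 0x04 else "kernel",
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit": bool(code & 0x08),
    }
```

For example, `decode_pf_error("6")` describes the bash crash above as a user-mode write to a page that was not present.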
    Last edited by scorpio810; 04 August 2017, 07:49 AM.


  • oleyska
    Senior Member

  • oleyska
    replied
    Quick!
    I'd edit my own post but I haven't made enough of em it seems :-)

    2017 x86_64 x86_64 x86_64 GNU/Linux
    cat /proc/sys/kernel/randomize_va_space
    2
    Using 16 parallel processes
    [KERN] -- Logs begin at on. 2017-08-02 00:48:25 CEST. --
    [KERN] aug. 02 00:50:41 oleUbuntu kernel: userif-3: sent link up event.
    [KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link down event.
    [KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link up event.
    [KERN] aug. 02 00:50:52 oleUbuntu kernel: zram: Cannot change disksize for initialized device
    [KERN] aug. 02 00:52:49 oleUbuntu kernel: zram: Cannot change disksize for initialized device
    [KERN] aug. 02 00:53:27 oleUbuntu kernel: zram: Cannot change disksize for initialized device
    [KERN] aug. 02 00:55:03 oleUbuntu kernel: zram0: detected capacity change from 68719476736 to 0
    [KERN] aug. 02 00:56:08 oleUbuntu kernel: zram0: detected capacity change from 0 to 68719476736
    [KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
    [KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounted filesystem without journal. Opts: discard
    [loop-0] on. 02. aug. 00:57:02 +0200 2017 start 0
    [loop-1] on. 02. aug. 00:57:03 +0200 2017 start 0
    [loop-2] on. 02. aug. 00:57:04 +0200 2017 start 0
    [loop-3] on. 02. aug. 00:57:05 +0200 2017 start 0
    [loop-4] on. 02. aug. 00:57:06 +0200 2017 start 0
    [loop-5] on. 02. aug. 00:57:07 +0200 2017 start 0
    [loop-6] on. 02. aug. 00:57:08 +0200 2017 start 0
    [loop-7] on. 02. aug. 00:57:09 +0200 2017 start 0
    [loop-8] on. 02. aug. 00:57:10 +0200 2017 start 0
    [loop-9] on. 02. aug. 00:57:11 +0200 2017 start 0
    [loop-10] on. 02. aug. 00:57:12 +0200 2017 start 0
    [loop-11] on. 02. aug. 00:57:13 +0200 2017 start 0
    [loop-12] on. 02. aug. 00:57:14 +0200 2017 start 0
    [loop-13] on. 02. aug. 00:57:15 +0200 2017 start 0
    [loop-14] on. 02. aug. 00:57:16 +0200 2017 start 0
    [loop-15] on. 02. aug. 00:57:17 +0200 2017 start 0
    [loop-2] on. 02. aug. 00:57:49 +0200 2017 build failed
    [loop-2] TIME TO FAIL: 47 s
    [loop-13] on. 02. aug. 00:58:00 +0200 2017 build failed
    [loop-13] TIME TO FAIL: 58 s
    [KERN] aug. 02 00:58:00 oleUbuntu kernel: bash[23093]: segfault at 7fff2c45f69c ip 00007fff2c45f69c sp 00007fff2c45f4f8 error 15
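
To make reports like this one easier to compare, the `TIME TO FAIL` lines can be collected per loop. This is a minimal sketch under the assumption that the script always prints them in the `[loop-N] TIME TO FAIL: S s` form seen in this thread; `TTF_RE` and `times_to_fail` are hypothetical names, not part of the test script.

```python
import re

# Assumed line shape, taken from the logs posted in this thread.
TTF_RE = re.compile(r"\[loop-(\d+)\] TIME TO FAIL: (\d+) s")


def times_to_fail(lines):
    """Map loop number -> seconds until that loop's build failed."""
    result = {}
    for line in lines:
        match = TTF_RE.search(line)
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result
```

Taking `min()` over the returned values gives the time until the first failure, which is probably the most useful single number to post.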


  • oleyska
    Senior Member

  • oleyska
    replied
    Originally posted by pjssilva
    There are some discussions on reddit:

    https://www.reddit.com/r/programming...ausing_random/

    There are also active bug reports in FreeBSD and DragonFlyBSD, with developers looking for a workaround. In the AMD forum we already have some cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I could not convince people here to test their systems. Not a single report. Come on people, try the kill-ryzen.sh script for a few hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.

    I feel sorry for AMD: if this bug is widespread, even if it is hard to trigger, it could be a disaster for them. I hope they find a fix through microcode. But first they need to acknowledge the problem.
    I am on it, started it now.
    Asus B350M-A
    1700 @ 3.8 GHz
    Corsair LPX 2666 32 GB (2x16 GB)

    Kernel 4.11.11-041111-generic on Ubuntu.


  • pjssilva
    Junior Member

  • pjssilva
    replied
    There are some discussions on reddit:

    https://www.reddit.com/r/programming...ausing_random/

    There are also active bug reports in FreeBSD and DragonFlyBSD, with developers looking for a workaround. In the AMD forum we already have some cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I could not convince people here to test their systems. Not a single report. Come on people, try the kill-ryzen.sh script for a few hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.

    I feel sorry for AMD: if this bug is widespread, even if it is hard to trigger, it could be a disaster for them. I hope they find a fix through microcode. But first they need to acknowledge the problem.


  • Zucca
    Senior Member

  • Zucca
    replied
    pjssilva: Is that from reddit? Can you paste a link to the thread?


  • pjssilva
    Junior Member

  • pjssilva
    replied
    stevea: Actually, a few days ago an AMD representative posted in the thread saying "we are still reading every post AFAIK and this is definitely being looked at. Please continue to file customer tickets as amdmatt suggested." So the current situation is that AMD is looking into it but is not sharing any internal information. We do not know, for example, whether they are able to replicate our problems on their side. Many of us have opened personal technical support requests, myself included. Some have exchanged the CPU through an RMA. In the thread, some people say that the new CPU solved their problem, but others say that the problem remained the same. These last cases scared many of us, since we started to think about how unlikely it would be to get a second faulty CPU. This may suggest that the problem is more common than we initially thought.

    This was the main reason that led me to write the message here. We would like people who do not think they are affected to test their systems. If we get many responses saying that the systems are OK, then our best option is to ask for an RMA. If many people discover problems in their systems, then the bug might be widespread. So, once again, if some fellow readers could test their systems and post a follow-up here (just saying whether everything is OK or not), we would appreciate it.


  • stevea
    Junior Member

  • stevea
    replied
    It is very disturbing that there has been no official AMD communication on the issue for a very long time. There was a "we're working on it" message on the AMD community forum many weeks ago. Do they intend to just sweep this under the rug? This sort of problem makes my intended use (a 24x7 Linux server) infeasible, and leaves a very bad impression of AMD as a CPU vendor.
