pjssilva, is that from reddit? Can you paste a link to the thread?
Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads
There are some discussions on reddit:
There is also an active bug report in FreeBSD and DragonFlyBSD, with developers looking for a workaround. In the AMD forum we already have some cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I could not convince people here to test their systems. Not a single report. Come on people, try the kill-ryzen.sh script for some hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.
I am sorry for AMD: if this bug is widespread, even if hard to trigger, it could be a disaster for them. I hope they find a solution using microcode. But first they need to acknowledge the problem.
Originally posted by pjssilva:
There are some discussions on reddit:
There is also an active bug report in FreeBSD and DragonFlyBSD, with developers looking for a workaround. In the AMD forum we already have some cases of people with multiple affected machines. I am more and more convinced that this is a real and common bug. Unfortunately, I could not convince people here to test their systems. Not a single report. Come on people, try the kill-ryzen.sh script for some hours (let it run overnight). It would be great to get independent confirmation from people outside the AMD thread.
I am sorry for AMD: if this bug is widespread, even if hard to trigger, it could be a disaster for them. I hope they find a solution using microcode. But first they need to acknowledge the problem.
Asus B350M-A
1700 @ 3.8 GHz
Corsair LPX 2666, 32 GB (2x16 GB)
Kernel 4.11.11-041111-generic on Ubuntu.
Quick!
I'd edit my own post but I haven't made enough of em it seems :-)
2017 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/sys/kernel/randomize_va_space
2
Using 16 parallel processes
[KERN] -- Logs begin at on. 2017-08-02 00:48:25 CEST. --
[KERN] aug. 02 00:50:41 oleUbuntu kernel: userif-3: sent link up event.
[KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link down event.
[KERN] aug. 02 00:50:44 oleUbuntu kernel: userif-3: sent link up event.
[KERN] aug. 02 00:50:52 oleUbuntu kernel: zram: Cannot change disksize for initialized device
[KERN] aug. 02 00:52:49 oleUbuntu kernel: zram: Cannot change disksize for initialized device
[KERN] aug. 02 00:53:27 oleUbuntu kernel: zram: Cannot change disksize for initialized device
[KERN] aug. 02 00:55:03 oleUbuntu kernel: zram0: detected capacity change from 68719476736 to 0
[KERN] aug. 02 00:56:08 oleUbuntu kernel: zram0: detected capacity change from 0 to 68719476736
[KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
[KERN] aug. 02 00:56:10 oleUbuntu kernel: EXT4-fs (zram0): mounted filesystem without journal. Opts: discard
[loop-0] on. 02. aug. 00:57:02 +0200 2017 start 0
[loop-1] on. 02. aug. 00:57:03 +0200 2017 start 0
[loop-2] on. 02. aug. 00:57:04 +0200 2017 start 0
[loop-3] on. 02. aug. 00:57:05 +0200 2017 start 0
[loop-4] on. 02. aug. 00:57:06 +0200 2017 start 0
[loop-5] on. 02. aug. 00:57:07 +0200 2017 start 0
[loop-6] on. 02. aug. 00:57:08 +0200 2017 start 0
[loop-7] on. 02. aug. 00:57:09 +0200 2017 start 0
[loop-8] on. 02. aug. 00:57:10 +0200 2017 start 0
[loop-9] on. 02. aug. 00:57:11 +0200 2017 start 0
[loop-10] on. 02. aug. 00:57:12 +0200 2017 start 0
[loop-11] on. 02. aug. 00:57:13 +0200 2017 start 0
[loop-12] on. 02. aug. 00:57:14 +0200 2017 start 0
[loop-13] on. 02. aug. 00:57:15 +0200 2017 start 0
[loop-14] on. 02. aug. 00:57:16 +0200 2017 start 0
[loop-15] on. 02. aug. 00:57:17 +0200 2017 start 0
[loop-2] on. 02. aug. 00:57:49 +0200 2017 build failed
[loop-2] TIME TO FAIL: 47 s
[loop-13] on. 02. aug. 00:58:00 +0200 2017 build failed
[loop-13] TIME TO FAIL: 58 s
[KERN] aug. 02 00:58:00 oleUbuntu kernel: bash[23093]: segfault at 7fff2c45f69c ip 00007fff2c45f69c sp 00007fff2c45f4f8 error 15
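For anyone comparing runs, the "TIME TO FAIL" lines in a log like the one above can be collected with a few lines of Python. This is only a minimal sketch: the function name `time_to_fail` is made up for illustration, and the line format is assumed from the output pasted in this thread, not taken from the script itself.

```python
import re

# Matches lines such as "[loop-2] TIME TO FAIL: 47 s" (format assumed from the log above).
FAIL_RE = re.compile(r"\[loop-(\d+)\]\s+TIME TO FAIL:\s+(\d+)\s*s")


def time_to_fail(log_text):
    """Return {loop_number: seconds} for every loop that reported a failure."""
    return {int(m.group(1)): int(m.group(2)) for m in FAIL_RE.finditer(log_text)}


if __name__ == "__main__":
    sample = (
        "[loop-2] on. 02. aug. 00:57:49 +0200 2017 build failed\n"
        "[loop-2] TIME TO FAIL: 47 s\n"
        "[loop-13] on. 02. aug. 00:58:00 +0200 2017 build failed\n"
        "[loop-13] TIME TO FAIL: 58 s\n"
    )
    failures = time_to_fail(sample)
    for loop, seconds in sorted(failures.items()):
        print(f"loop-{loop}: failed after {seconds} s")
    print(f"first failure after {min(failures.values())} s")
```

Feeding it the saved output of a run gives one number per failed loop, which makes reports from different machines easier to compare.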
Originally posted by scorpio810:
Thank you for the tip. ;-)
Just tried it, and it runs fine now, without segfaults, when I build my cross environment with
"make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" on my Debian Sid.
Before, I saw a lot of "segfault at 10 ip 0000000000000010 sp 00007ffcdbc8df58 error 14 in cc1plus"
I can entirely rebuild my cross environment with "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" without a crash, with only a minor warning in the log:
Code:
perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Code:
[ 2621.739360] bash[10208]: segfault at 8 ip 00000000004321ac sp 00007fffffffbb20 error 6 in bash[400000+100000]
Code:
[ 6326.849438] cc1plus[16495]: segfault at 10 ip 0000000000000010 sp 00007fffffffc2d8 error 14 in cc1plus[100000000+1606000]
[ 6330.352677] cc1plus[16441]: segfault at 10 ip 0000000000000010 sp 00007fffffffc598 error 14 in cc1plus[100000000+1606000]
[21533.790423] cc1plus[20285]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
[21640.952023] cc1plus[23381]: segfault at 10 ip 0000000000000010 sp 00007fffffffbd18 error 14 in cc1plus[100000000+1606000]
Debian unstable on 1700X (made in Malaysia ... ), Dark Rock Pro 3, MSI B350 Tomahawk, BIOS 1.71 beta (AGESA 1.0.0.6a), XMP2 profile but with 2T command rate and RAM at 1.35 V
Core C6, Cool'n'Quiet, and core boost disabled, Vcore set to 1.27 V, Vsoc set to 1.10 V
I saw a lot of cc1plus segfaults with kernel 4.12.4 with the same config file...with kernel 4.12.3
EDIT: another thing today:
Code:
sudo dmesg | tail
[ 5948.611360] [ 9748] 1000 9748 10452 4240 22 3 0 0 moc
[ 5948.611361] [ 9749] 1000 9749 3945 1090 11 3 0 0 i686-w64-mingw3
[ 5948.611363] [ 9751] 1000 9751 5662 890 16 3 0 0 moc
[ 5948.611364] [ 9754] 1000 9754 2585 59 9 3 0 0 i686-w64-mingw3
[ 5948.611365] [ 9755] 1000 9755 10247 1496 23 3 0 0 cc1plus
[ 5948.611366] [ 9756] 1000 9756 3945 1090 12 3 0 0 i686-w64-mingw3
[ 5948.611367] Out of memory: Kill process 8865 (cc1plus) score 25 or sacrifice child
[ 5948.611372] Killed process 8865 (cc1plus) total-vm:474364kB, anon-rss:412300kB, file-rss:0kB, shmem-rss:0kB
[ 5951.892679] cc1plus[9010]: segfault at 10 ip 0000000000000010 sp 00007fffffffcb18 error 14 in cc1plus[100000000+15b7000]
[ 5975.224183] cc1plus[10188]: segfault at 10 ip 0000000000000010 sp 00007fffffffca98 error 14 in cc1plus[100000000+15b7000]
[ 540.648393] traps: ld[7488] general protection ip:7f815e02bc72 sp:7fffffffdd00 error:0 in libbfd-2.28-system.so[7f815dfa6000+129000]
[ 2030.912805] cc1plus[12796]: segfault at 10 ip 0000000000000010 sp 00007fffffffc618 error 14 in cc1plus[100000000+1606000]
Last edited by scorpio810; 04 August 2017, 07:49 AM.
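A note on reading the "error NN" field in the segfault lines above: the kernel prints the x86 page-fault error code in hexadecimal, and each bit has a fixed meaning. The sketch below decodes it; the helper name `decode_pf_error` and the wording of the descriptions are made up for illustration, and only the five low bits are handled.

```python
# x86 page-fault error-code bits, as used by the Linux kernel.
PF_PROT = 0x01   # 0: page not present, 1: protection violation
PF_WRITE = 0x02  # 0: read access, 1: write access
PF_USER = 0x04   # 0: kernel mode, 1: user mode
PF_RSVD = 0x08   # reserved bit was set in a page-table entry
PF_INSTR = 0x10  # the fault was caused by an instruction fetch


def decode_pf_error(hex_code):
    """Describe the 'error' field of a kernel segfault line (given as a hex string)."""
    code = int(hex_code, 16)
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    parts = [
        "protection violation" if code & PF_PROT else "page not present",
        access,
        "user mode" if code & PF_USER else "kernel mode",
    ]
    if code & PF_RSVD:
        parts.append("reserved bit set")
    return ", ".join(parts)


if __name__ == "__main__":
    for code in ("14", "6", "15"):
        print(f"error {code}: {decode_pf_error(code)}")
```

For the cc1plus lines, error 14 decodes to an instruction fetch from a page that is not present, which fits the fault address and ip both being 0x10. The bash line with error 6 is a user-mode write to a page that is not present.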
oleyska, thanks for the report. scorpio810, I got a little confused by your post: did you try the kill-ryzen.sh test I suggested? It is very reliable at spotting systems with problems. Just let it run for some hours.
I have also come across an interesting post on the Gentoo Wiki. If you go to their Ryzen page, there is a troubleshooting section that comments on this compilation problem (https://wiki.gentoo.org/wiki/Ryzen#Troubleshooting). There you can find a link to a spreadsheet with the results of a questionnaire answered by more than 60 Gentoo users about Ryzen. From what I can see, more than 50% report stability problems (there is a column for that). I think that is huge!
It would be very nice if Phoronix tried the test on his systems and reported the results. We need to get serious attention on this, and I believe that a Phoronix article is possibly the best way.
@pjssilva I can't! kill-ryzen.sh needs more than 16 GB of RAM and/or swap; I have no swap here and only 16 GB of RAM.
I went back to a custom-compiled 4.11.12 kernel, and it built my Qt 5 cross environment fine (I have not tried it more than once)!
Build log of the Qt 5 cross-environment build on Ryzen: https://pastebin.com/raw/YdjGY1M2
Code:
System Information
Manufacturer: Micro-Star International Co., Ltd
Product Name: MS-7A34
Version: 1.0
BIOS Information
Vendor: American Megatrends Inc.
Version: 1.71
Release Date: 07/06/2017
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16 MB
~$ sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
Speed: 2400 MT/s
Speed: 2400 MT/s
Part Number: F4-2400C15-8GVR
Rank: 1
Configured Clock Speed: 1200 MT/s
Speed: 2400 MT/s
Speed: 2400 MT/s
Part Number: F4-2400C15-8GVR
Rank: 1
Configured Clock Speed: 1200 MT/s
~$ uname -a
Linux debian 4.11.12-vanilla #1 SMP Wed Aug 2 16:33:20 CEST 2017 x86_64 GNU/Linux
$ cat /proc/sys/kernel/randomize_va_space
0
~$ cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 7 1700X Eight-Core Processor
microcode : 0x8001126
Last edited by scorpio810; 03 August 2017, 09:55 AM.
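Since the cpuinfo grep above prints one pair of lines per logical CPU, a report can be shortened by counting identical pairs instead. A minimal sketch, assuming the usual "key : value" layout of /proc/cpuinfo on x86 with "model name" appearing before "microcode" in each block; the function name `summarize_cpuinfo` is hypothetical.

```python
from collections import Counter


def summarize_cpuinfo(cpuinfo_text):
    """Count logical CPUs per (model name, microcode) pair."""
    counts = Counter()
    model = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU blocks
        key, value = key.strip(), value.strip()
        if key == "model name":
            model = value
        elif key == "microcode" and model is not None:
            counts[(model, value)] += 1
            model = None
    return counts


if __name__ == "__main__":
    # Sample text in the same shape as the output above, repeated for 16 logical CPUs.
    sample = (
        "model name : AMD Ryzen 7 1700X Eight-Core Processor\n"
        "microcode : 0x8001126\n"
    ) * 16
    for (model, microcode), n in summarize_cpuinfo(sample).items():
        print(f"{n} x {model} (microcode {microcode})")
```

On a real system the text would come from reading /proc/cpuinfo; a mismatch in microcode between cores would show up as more than one line of output.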
I've compiled the kernel with the segv workaround by Satoru Takeuchi and am still facing the bug.
Code:
./kill-ryzen.sh
Download GCC sources
--2017-08-02 21:57:33-- ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/gcc-7.1.0/gcc-7.1.0.tar.bz2
=> 'gcc-7.1.0.tar.bz2.2'
Resolving ftp.fu-berlin.de (ftp.fu-berlin.de)... 130.133.3.130
Connecting to ftp.fu-berlin.de (ftp.fu-berlin.de)|130.133.3.130|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /unix/languages/gcc/releases/gcc-7.1.0 ... done.
==> SIZE gcc-7.1.0.tar.bz2 ... 84303533
==> PASV ... done. ==> RETR gcc-7.1.0.tar.bz2 ... done.
Length: 84303533 (80M) (unauthoritative)
100%[=========================================================================================================================================>] 84,303,533 974KB/s in 79s
2017-08-02 21:58:56 (1.02 MB/s) - 'gcc-7.1.0.tar.bz2.2' saved [84303533]
Extract GCC sources
Download prerequisites
gmp-6.1.0.tar.bz2: OK
mpfr-3.1.4.tar.bz2: OK
mpc-1.0.3.tar.gz: OK
isl-0.16.1.tar.bz2: OK
All prerequisites downloaded successfully.
cat /proc/cpuinfo | grep -i -E "(model name|microcode)"
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
model name : AMD Ryzen 5 1600 Six-Core Processor
microcode : 0x8001126
sudo dmidecode -t memory | grep -i -E "(rank|speed|part)" | grep -v -i unknown
Speed: 2400 MHz
Part Number: 9905678-012.A00G
Rank: 1
Configured Clock Speed: 2400 MHz
uname -a
Linux linux-x5uw 4.12.4 #1 SMP PREEMPT Sun Jul 30 17:20:38 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/sys/kernel/randomize_va_space
0
Using 12 parallel processes
[loop-0] Wed Aug 2 22:02:38 -03 2017 start 0
[KERN] -- Logs begin at Sat 2017-07-29 02:31:30 -03. --
[KERN] Aug 02 22:02:02 linux-x5uw kernel: usb 1-6: new low-speed USB device number 4 using xhci_hcd
[KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: new low-speed USB device number 5 using xhci_hcd
[KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device found, idVendor=1c4f, idProduct=0002
[KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Product: USB Keyboard
[KERN] Aug 02 22:02:04 linux-x5uw kernel: usb 1-6: Manufacturer: SIGMACHIP
[KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.0/0003:1C4F:0002.0005/input/input17
[KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0005: input,hidraw2: USB HID v1.10 Keyboard [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input0
[KERN] Aug 02 22:02:04 linux-x5uw kernel: input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-6/1-6:1.1/0003:1C4F:0002.0006/input/input18
[KERN] Aug 02 22:02:04 linux-x5uw kernel: hid-generic 0003:1C4F:0002.0006: input,hidraw3: USB HID v1.10 Device [SIGMACHIP USB Keyboard] on usb-0000:03:00.0-6/input1
[loop-1] Wed Aug 2 22:02:39 -03 2017 start 0
[loop-2] Wed Aug 2 22:02:40 -03 2017 start 0
[loop-3] Wed Aug 2 22:02:41 -03 2017 start 0
[loop-4] Wed Aug 2 22:02:42 -03 2017 start 0
[loop-5] Wed Aug 2 22:02:43 -03 2017 start 0
[loop-6] Wed Aug 2 22:02:44 -03 2017 start 0
[loop-7] Wed Aug 2 22:02:45 -03 2017 start 0
[loop-8] Wed Aug 2 22:02:46 -03 2017 start 0
[loop-9] Wed Aug 2 22:02:47 -03 2017 start 0
[loop-10] Wed Aug 2 22:02:48 -03 2017 start 0
[loop-11] Wed Aug 2 22:02:49 -03 2017 start 0
[KERN] Aug 02 22:04:20 linux-x5uw kernel: sh[17778]: segfault at ffffffff894c0017 ip 00000000004712b3 sp 00003fffffff9620 error 7 in bash[400000+a6000]
[loop-6] Wed Aug 2 22:04:20 -03 2017 build failed
[loop-6] TIME TO FAIL: 102 s
[loop-9] Wed Aug 2 22:04:49 -03 2017 build failed
[loop-9] TIME TO FAIL: 131 s
[KERN] Aug 02 22:04:49 linux-x5uw kernel: sh[24675]: segfault at 3f699950e8a8 ip 00003f69992822a0 sp 00003fffffffb408 error 4 in libc-2.22.so[3f6999203000+199000]
Last edited by Kayote; 02 August 2017, 09:27 PM.
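To see at a glance which programs are crashing in a long journal or dmesg dump like the one above, the kernel segfault lines can be pulled out and tallied. This is a rough sketch only: the regular expression is written against the line format seen in this thread, and the names `parse_segfaults` and `tally` are invented for the example.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   sh[17778]: segfault at ffffffff894c0017 ip 00000000004712b3 sp 00003fffffff9620 error 7 in bash[400000+a6000]
# The trailing " in <object>[...]" part is optional because some lines lack it.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>\S+)\[)?"
)


def parse_segfaults(log_text):
    """Return one dict per segfault line found in the log text."""
    return [m.groupdict() for m in SEGV_RE.finditer(log_text)]


def tally(records):
    """Count segfaults per (process name, mapped object) pair."""
    return Counter((r["comm"], r["object"]) for r in records)


if __name__ == "__main__":
    sample = (
        "[KERN] Aug 02 22:04:20 linux-x5uw kernel: sh[17778]: segfault at ffffffff894c0017 "
        "ip 00000000004712b3 sp 00003fffffff9620 error 7 in bash[400000+a6000]\n"
        "[KERN] Aug 02 22:04:49 linux-x5uw kernel: sh[24675]: segfault at 3f699950e8a8 "
        "ip 00003f69992822a0 sp 00003fffffffb408 error 4 in libc-2.22.so[3f6999203000+199000]\n"
    )
    for (comm, obj), n in tally(parse_segfaults(sample)).items():
        print(f"{n} segfault(s) in {comm} (object: {obj})")
```

The same function works on plain dmesg output, since it only looks at the part of the line from the process name onward.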
Am I wrong, or did they leave the CPU model out of the Gentoo questionnaire, and therefore out of the spreadsheet? Why? Is it not relevant? I mean, are all the models "equally" affected?
Last edited by donbastiano; 04 August 2017, 07:59 AM.