AMD Confirms Linux Performance Marginality Problem Affecting Some, Doesn't Affect Epyc / TR
Thanks
-
I think my pre-ordered 1700X, bought on 2 April 2017, segfaults only on CPU core 5? The script is continuing without errors for the moment ...
Edit: I spoke too soon ... krkrkr
Code:
[août12 12:35] logitech-hidpp-device 0003:046D:400A.0007: HID++ 2.0 device connected.
[août12 13:38] bash[27367]: segfault at 7f814d3857e8 ip 00007f814d0a1330 sp 00007ffeb781b898 error 4 in libc-2.24.so[7f814cf78000+193000]
[août12 15:44] bash[10807]: segfault at 7fba26df87e8 ip 00007fba26b14330 sp 00007ffd63026c28 error 4 in libc-2.24.so[7fba269eb000+193000]
[août12 15:45] bash[30145]: segfault at 12 ip 0000000000435d7e sp 00007ffcdc40fc30 error 6 in bash[400000+100000]
Code:
[loop-0] Sat Aug 12 12:48:36 CEST 2017 start 0
[loop-1] Sat Aug 12 12:48:37 CEST 2017 start 0
[loop-2] Sat Aug 12 12:48:38 CEST 2017 start 0
[loop-3] Sat Aug 12 12:48:39 CEST 2017 start 0
[loop-4] Sat Aug 12 12:48:40 CEST 2017 start 0
[loop-5] Sat Aug 12 12:48:41 CEST 2017 start 0
[loop-6] Sat Aug 12 12:48:42 CEST 2017 start 0
[loop-7] Sat Aug 12 12:48:43 CEST 2017 start 0
[loop-8] Sat Aug 12 12:48:44 CEST 2017 start 0
[loop-9] Sat Aug 12 12:48:45 CEST 2017 start 0
[loop-10] Sat Aug 12 12:48:46 CEST 2017 start 0
[loop-11] Sat Aug 12 12:48:47 CEST 2017 start 0
[loop-12] Sat Aug 12 12:48:48 CEST 2017 start 0
[loop-13] Sat Aug 12 12:48:49 CEST 2017 start 0
[loop-14] Sat Aug 12 12:48:50 CEST 2017 start 0
[loop-15] Sat Aug 12 12:48:51 CEST 2017 start 0
[loop-2] Sat Aug 12 13:17:40 CEST 2017 build failed
[loop-2] TIME TO FAIL: 1744 s
[loop-4] Sat Aug 12 13:38:28 CEST 2017 build failed
[loop-4] TIME TO FAIL: 2992 s
[loop-10] Sat Aug 12 15:44:59 CEST 2017 build failed
[loop-10] TIME TO FAIL: 10583 s
[loop-1] Sat Aug 12 15:45:08 CEST 2017 build failed
[loop-1] TIME TO FAIL: 10592 s
Last edited by scorpio810; 12 August 2017, 09:50 AM.
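For anyone comparing their own kernel logs with the ones in this post, here is a minimal Python sketch (not part of the original post) that parses segfault lines of this shape and subtracts the mapping base from the instruction pointer, giving the fault offset inside the mapped object. The regular expression and the name `fault_offsets` are assumptions made for illustration, not an official description of the kernel log format; the log lines are copied from the post.

```python
import re

# Illustrative pattern for kernel "segfault at ..." lines as quoted in the post.
# This is an assumption based on those lines, not an official log grammar.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offsets(log_text):
    """Return (object name, hex offset of ip within that mapping) per segfault line."""
    results = []
    for m in SEGFAULT_RE.finditer(log_text):
        # Offset of the faulting instruction relative to the start of the mapping.
        offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
        results.append((m.group("obj"), hex(offset)))
    return results


LOG = """\
[août12 13:38] bash[27367]: segfault at 7f814d3857e8 ip 00007f814d0a1330 sp 00007ffeb781b898 error 4 in libc-2.24.so[7f814cf78000+193000]
[août12 15:44] bash[10807]: segfault at 7fba26df87e8 ip 00007fba26b14330 sp 00007ffd63026c28 error 4 in libc-2.24.so[7fba269eb000+193000]
[août12 15:45] bash[30145]: segfault at 12 ip 0000000000435d7e sp 00007ffcdc40fc30 error 6 in bash[400000+100000]
"""

for obj, offset in fault_offsets(LOG):
    print(obj, offset)
```

Run on the three lines above, both libc faults come out at the same offset (0x129330) even though libc was loaded at different addresses, and the bash fault is at 0x35d7e. That only shows where the faults landed; on its own it says nothing about the cause.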
-
Originally posted by drSeehas View Post
This is not a fix. It is a workaround. Only AMD can fix this bug. But AMD says it is a Linux-only bug ...
Ryzen needs a full guard page at the top rather than just a guard region, and so the BSD devs have updated their code accordingly. Linux and Windows "got lucky" in this case because the guard page added for errata in previous CPUs also worked for Ryzen.
We are checking to make sure that the combination of address space randomization and transparent huge page migration will never replace a collection of 4K pages at the top of user memory with a single 2M page (which would effectively remove the guard page). We don't think it will happen because of the unused guard page and the OS-managed write protection of the vsyscall page but need to be sure.
Last edited by bridgman; 12 August 2017, 02:25 PM.
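To make the 4K versus 2M concern concrete, here is a small arithmetic sketch in Python. It is not kernel code and not from the post above. It assumes a 47-bit user address space on x86-64 (user addresses below 0x0000800000000000), 4 KiB base pages, 2 MiB huge pages, and that the guard page is the topmost 4 KiB page of that range; the constant and function names are hypothetical.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024

# Assumption: first address past the user half of a 48-bit x86-64 address space.
USER_END = 1 << 47


def containing_page(addr, page_size):
    """Return (start, end) of the page_size-aligned page that contains addr."""
    start = addr - (addr % page_size)
    return start, start + page_size


top_byte = USER_END - 1

# Assumed guard page: the last 4 KiB page below USER_END.
guard_start, guard_end = containing_page(top_byte, PAGE_4K)

# The 2 MiB-aligned region that a single huge page at the top would occupy.
huge_start, huge_end = containing_page(top_byte, PAGE_2M)

small_pages_per_huge = PAGE_2M // PAGE_4K
huge_covers_guard = huge_start <= guard_start and guard_end <= huge_end

print("guard page :", hex(guard_start), "-", hex(guard_end))
print("huge page  :", hex(huge_start), "-", hex(huge_end))
print("4K pages per 2M page:", small_pages_per_huge)
print("2M page would cover the guard page:", huge_covers_guard)
```

Under these assumptions the topmost 2M-aligned region consists of 512 4K pages, the last of which is the guard page, so a single 2M mapping placed there would span the guard page's address range. That is the situation the check described above is meant to rule out.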
- Likes 2
-
Originally posted by bridgman View Post
... Ryzen needs a full guard page at the top rather than just a guard region, ...
... and so the BSD devs have updated their code accordingly. Linux and Windows "got lucky" in this case because the guard page added for errata in previous CPUs also worked for Ryzen. ...
- Likes 1
-
Originally posted by drSeehas View Post
Was this ever documented/communicated, and when?
Originally posted by drSeehas View Post
So there are two "bugs", but only one of them shows up in Linux and Windows?
I do not expect this specific one to cause problems on Linux or Windows but as I said we are checking to make sure that transparent huge page logic (THP) in Linux does not need an additional tweak.
Last edited by bridgman; 12 August 2017, 03:37 PM.
- Likes 1
-
Originally posted by Khudsa View Post
Same for me, when I disable the opcache option it works without errors. ...
-
Originally posted by kaseki View Post
I can confirm that disabling opcache eliminates segfault reports generated by kill-ryzen when opcache is set at Auto (at least for the period of 2 hours before I stopped testing). Impact of the opcache disable varies. Unigine Valley and Superposition scores and average frames per second are effectively unchanged. Latency and read rate per Intel MLC are effectively unchanged. Blender rendering, however, does take a few percent longer for the Ryzen logo and the Blender home page 'Classroom' renders. My measurements are not statistically significant, however, so YMMV.
-
Originally posted by Khudsa View Post
Same for me: when I disable the opcache option it works without errors. I have never done an RMA. How does the process work? Do I first contact AMD (already contacted, waiting for an answer) and then contact the store (pccomponentes, an official store in Spain for the Ryzen release) with AMD's reply?
Thanks
-
Originally posted by bridgman View Post
There is an errata document but I'm not sure if it has been published yet. I asked about status last week.
IIRC a typical modern CPU has somewhere between 10 and 100 errata (check the revision guides for any recent CPU).
I do not expect this specific one to cause problems on Linux or Windows but as I said we are checking to make sure that transparent huge page logic (THP) in Linux does not need an additional tweak.
It sounds to me like there's no microcode fix (if Threadripper had one, I'm pretty sure Ryzen would have it by now as well). From a die perspective, there's no new stepping (TR is supposedly using the same B1 Zeppelin dies used in Ryzen). Sounds like a lot of voodoo BS coming out of AMD: how are they able to claim that TR is unaffected if Linux is still being looked at?
Will the announced Ryzen Pro desktop processors have the same level of QA-binning as TR? Will the consumer line as well? Or will we play ongoing SEGV Silicon Lottery going forward on the consumer line?
AMD needs to give us a better explanation and a fast-track RMA process for people affected, instead of making us take pictures of our BIOS settings and our case. (I'm a bit pissed, as I have two systems affected and AMD support takes several days to respond in an ongoing support conversation.)
A fast-track path, IMO, would be: have us set the BIOS to defaults (to rule out overclocking), provide us with a testing tool (something we can execute), and if it fails, generate an RMA number. The way it's happening now, it takes over a week to get an RMA number because they want pictures of the BIOS and pictures of the case, and AMD support doesn't respond quickly enough. Take shipping into account and the process ends up taking a couple of weeks (crossing one's fingers that the first CPU sent back will be a good copy).
Last edited by Funks; 13 August 2017, 03:30 AM.
- Likes 2