Announcement

**PuckPoltergeist** · 07 August 2017, 04:09 PM

Originally posted by k1e0x

I'm not a programmer but I know that when you compile a compiler it does some funky switching of code in and out of ram to compile itself. That is showing the bug. The bug is actually a problem with memory management or memory prefetch of some sort.

Please read my post in context with the post from Holograph. I'm not speculating about what really happens; I was pointing out that recompiling for Ryzen can't be the solution when such code shows the most problems.

**k1e0x** · 07 August 2017, 04:18 PM

Originally posted by PuckPoltergeist

Please read my post in context with the post from Holograph. I'm not speculating about what really happens; I was pointing out that recompiling for Ryzen can't be the solution when such code shows the most problems.

Yeah,

Someone is going to have to help me here, but I think it's showing up in the execution of code from otherwise non-executable memory pages via GCC trampolines. I had thought OSes had stopped allowing this because it's kinda a large x86 exploit surface.

**Marc Driftmeyer** · 07 August 2017, 04:23 PM

Let the person waste thousands more on Intel. I don't give a crap. I'm glad the problem is acknowledged and being handled in a professional manner. Furthermore, I expect Michael to use private communications with AMD in the future, now that he clearly has a professional working relationship to maintain.

If not, he deserves zero respect in the industry. You forge alliances and you work out relationships, but you only ever get one shot at screaming that the house is on fire.

**PuckPoltergeist** · 07 August 2017, 04:30 PM

Originally posted by k1e0x

Someone is going to have to help me here, but I think it's showing up in the execution of code from otherwise non-executable memory pages via GCC trampolines.

Is clang/llvm doing the same?

**Veerappan** · 07 August 2017, 04:30 PM

Here's to hoping it's a microcode fix or can be worked around at the kernel level. I'd rather avoid replacing my R7 1700 if possible.

Note: I am one of the users affected here. Compiling mesa with GCC and 'make -j' does segfault on me about one in every ~3 runs (after a make clean). I can usually just re-start the make process and it'll run to completion, but it is annoying and undermines confidence in the built result somewhat.

Edit: And for those who insisted that overclocking was the culprit, I'm running stock clocks on the CPU and default speeds for the RAM as well.
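For anyone who wants a number rather than "about one in three", here is a minimal sketch of a harness that repeats a clean build and counts how many attempts fail. The function name `count_failed_runs` and its parameters are made up for illustration, and it only looks at the exit status, so it cannot tell a compiler segfault apart from an ordinary build error.

```python
import subprocess


def count_failed_runs(build_cmd, runs, clean_cmd=None, cwd=None):
    """Run build_cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        if clean_cmd is not None:
            # Start each attempt from a clean tree, as with 'make clean'.
            subprocess.run(clean_cmd, cwd=cwd, check=False)
        result = subprocess.run(build_cmd, cwd=cwd, check=False)
        # Any non-zero status counts as a failed run. This includes the case
        # where make reports that a compiler process crashed, but also plain
        # build errors, so check the log before blaming the CPU.
        if result.returncode != 0:
            failures += 1
    return failures
```

Something like `count_failed_runs(["make", "-j"], runs=10, clean_cmd=["make", "clean"])`, run from the top of an already configured tree, would then give the failure count over ten builds.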

**juanrga** · 07 August 2017, 04:32 PM

Congrats to all the people who actively participated in identifying and communicating this problem. Over the last few months, I have seen lots of deniers attacking people whose only aim was to post the facts.

**k1e0x** · 07 August 2017, 04:40 PM

Originally posted by PuckPoltergeist

Is clang/llvm doing the same?

Yes it is, tho only very rarely and for certain things. Itself and mesa appear to be affected. Why, I don't exactly know... maybe you need mesa to build mesa.

As far as I understand, nothing (legit) outside of development would use that method of execution. It isn't like any code anywhere could blow up, tho, or that everything you'd run on Ryzen has a chance to crash. That's not the case; it's very specific and even then intermittent.

**RussianNeuroMancer** · 07 August 2017, 04:56 PM

Originally posted by Veerappan

Here's to hoping it's a microcode fix or can be worked around at the kernel level. I'd rather avoid replacing my R7 1700 if possible.

Yep, hopefully a solution other than an RMA will be found, and hopefully it will be rolled out to users as soon as possible.

**chuckula** · 07 August 2017, 05:02 PM

You got picked up on the techreport: http://techreport.com/news/32362/amd...oblem-on-ryzen

**ermo** · 07 August 2017, 05:06 PM

Originally posted by rstrube

I just discovered that I can easily reproduce a hard crash by running:

PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 ./phoronix-test-suite stress-run build-linux-kernel build-apache build-imagemagick

As Michael suggested in his prior articles.

I'm running a Ryzen 1700 that's overclocked from 3.0 to 3.7 GHz on stock voltage. I didn't have any problems running prime95 (or at least the linux equivalent) overnight, so this has caught me a bit off guard. Currently my system just hard locks in about 1 minute of running the aforementioned command. Disabling SMT so far has appeared to solve the problem, but it might just occur much later. I must admit I'm a bit disappointed by this issue and I hope that AMD can release some sort of fix that does not require an RMA.

Out of curiosity: Have you been able to reproduce the crash when running at stock settings?

Being able to reproduce the crash at stock settings is probably a better canary for whether you are affected by a real bug or whether your system is simply configured such that it is only marginally stable (no judgement implied).

P.S. I run an OCed i7-3770k. It is stable at a higher frequency + lower voltage when stress testing the floating point / SIMD units (using intel burntest), but needs higher voltage + lower frequency for stability when I'm actually using it for daily tasks.

AMD Confirms Linux Performance Marginality Problem Affecting Some, Doesn't Affect Epyc / TR