**Hugh** · 14 November 2019, 05:06 PM

Originally posted by blackshard

Nope, technically this is part of the workaround. If I understand correctly this phrase:

if you align the jump instructions to avoid 32-byte boundaries (which is the thing the GCC patch does), you don't get the issue.
So this GCC patch aligns jump instructions that cross a 32-byte boundary to the next 32-byte boundary, avoiding both the JCC erratum and the performance penalty from the microcode update.

Example: if you compile your software with a patched GCC and run it on a defective processor (i.e. without the updated microcode), you don't get the "unpredictable behaviour" and you don't get better performance. How could this be an optimization?

No, this is not "part of the workaround". There is a fix. It is the microcode update. It is supposedly a complete fix, in and of itself.

The assembler change is meant to reduce a performance impact from the microcode change. It has no effect on correctness. It is an optimization.

This transformation is clearly optional. The microcode fix has been rushed out to Linux distros. The transformation has not. And even if the GAS change is shipped, recompiled packages have not been (except, I take it, Clear Linux). And much code never goes through GAS: JIT stuff (JavaScript, Java, various graphics pipeline things, QEMU, ...), LLVM stuff (Go, Rust, Swift, clang, ...).

There is a way you could view the code transformation as a fix, albeit an impractical one. If you didn't update the microcode but did apply this transformation to all the code on the system (very, very hard to do), that would ensure that you never hit this bug. At least that's what I infer -- Intel's disclosure isn't sufficient to be sure.
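
For anyone who wants the boundary condition spelled out, here is a rough Python sketch. It assumes, going by Intel's public description of the erratum, that the affected case is a jump that crosses or ends on a 32-byte boundary. The names `touches_boundary` and `padding_needed` are made up for illustration, and the real assembler patch may pick its padding differently, so treat this as a toy model rather than what GAS actually does.

```python
BOUNDARY = 32  # bytes; the boundary size discussed in this thread


def touches_boundary(addr: int, length: int, boundary: int = BOUNDARY) -> bool:
    """True if an instruction occupying [addr, addr + length) crosses or
    ends on a boundary (assumed to be the condition the patch avoids)."""
    if not 0 < length < boundary:
        raise ValueError("length must be between 1 and boundary - 1")
    last_byte = addr + length - 1
    crosses = addr // boundary != last_byte // boundary
    ends_on = (addr + length) % boundary == 0
    return crosses or ends_on


def padding_needed(addr: int, length: int, boundary: int = BOUNDARY) -> int:
    """Padding bytes to insert before the instruction so that it starts at
    the next boundary; 0 if the instruction is already placed safely."""
    if not touches_boundary(addr, length, boundary):
        return 0
    return (-addr) % boundary


# Hypothetical 6-byte jump at three different offsets.
for addr in (0, 26, 30):
    print(addr, touches_boundary(addr, 6), padding_needed(addr, 6))
```

The padding is pure overhead on a CPU that does not have the erratum, which is why the question of whether it slows down AMD came up at all.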

**archsway** · 14 November 2019, 06:49 PM

The only reason this only has a 0.3% performance loss is because modern CPUs are so good at executing NOP instructions.

I wonder what the impact would be like if modern CPUs were like some of the 40-year-old CPUs such as the Z80, on which executing a NOP was the same speed as an 8-bit register logic op (e.g. add).

**SystemCrasher** · 15 November 2019, 12:56 AM

Originally posted by tildearrow

Proof that AMD has already beaten Intel.

AMD always did more engineering and less marketing. Actually, that was the reason I stuck with AMD: with AMD you usually get more than advertised, so the surprises are pleasant. With Intel you get loud marketing that greatly exaggerates the actual HW, so all the surprises are unpleasant. You learn that Intel computed TDP in a way you would never use your system, put some nasty crap here and there, cheated in marketing materials, cut some corners on engineering... and at the end of the day you get bugged, backdoored crap. Far worse than advertised. It's a shame AMD learned that folly from Intel and started to backdoor their HW as well, along with doing debatable marketing. So uh, well, I guess I won't buy new AMD HW anymore, because I don't see how they're better than Intel at this point in space and time.

**log0** · 15 November 2019, 06:56 AM

Originally posted by archsway

The only reason this only has a 0.3% performance loss is because modern CPUs are so good at executing NOP instructions.

I wonder what the impact would be like if modern CPUs were like some of the 40-year-old CPUs such as the Z80, on which executing a NOP was the same speed as an 8-bit register logic op (e.g. add).

The nop in this case is not executed. It is a branch prediction prefix used by Intel Netburst arch only (P4, Pentium D) afaik.

**pyler** · 15 November 2019, 10:05 AM

You can clearly see that the compiler patch will affect generic codegen. :/ RIP perf.

D70157: Align branches within 32-Byte boundary (NOP padding)

https://reviews.llvm.org/D70157#1747428

**fintux** · 15 November 2019, 04:40 PM

Originally posted by Hugh

1.003 translates to 0.3%

Oops, you're right. Then I'm not sure if the value is a "dummy" value that doesn't take into account whether a smaller or a bigger number is better for each benchmark. If it does take that into account, then the difference is within the margin of error. If not, then I'm not sure...
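
To spell out the arithmetic, a small Python sketch follows. The helper `percent_change` and its `higher_is_better` flag are hypothetical; how the article's aggregate number was actually computed is not stated in this thread, so this only shows why the direction matters.

```python
def percent_change(ratio: float, higher_is_better: bool) -> float:
    """Convert a result ratio (new / old) into a signed percentage,
    where a positive value means the new result is better."""
    delta = round((ratio - 1.0) * 100.0, 3)
    return delta if higher_is_better else -delta


# The same ratio reads as a gain or a loss depending on the benchmark type.
print(percent_change(1.003, higher_is_better=True))
print(percent_change(1.003, higher_is_better=False))
```

Either way the magnitude is 0.3%; only the sign depends on whether the benchmark reports something like throughput or something like run time.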

**coder** · 16 November 2019, 01:25 AM

Originally posted by phoronix

Phoronix: Intel's Assembler Changes For JCC Erratum Are Not Hurting AMD

there was some concern expressed by readers that it might hurt AMD performance. That does not appear to be the case...

Please try kernel compilation and any text stream (or XML) processing benchmarks you might have.

**coder** · 16 November 2019, 01:35 AM

Originally posted by archsway

The only reason this only has a 0.3% performance loss is because modern CPUs are so good at executing NOP instructions.

I wouldn't say "the only reason". Michael's chosen set of benchmarks included some that were probably I/O-bound, and others which probably had low branch-density.

**archsway** · 16 November 2019, 02:12 AM

Originally posted by coder

I wouldn't say "the only reason". Michael's chosen set of benchmarks included some that were probably I/O-bound, and others which probably had low branch-density.

Do you think I actually bothered reading the article to see what benchmarks were used?

Intel's Assembler Changes For JCC Erratum Are Not Hurting AMD