Announcement

**jabl** · 27 November 2019, 03:22 AM

Originally posted by HadrienG View Post

Not sure if it's a compiler bug, from a look at the code it could also be a missing compiler optimization barrier in the kernel. Cohabitation of optimizing compilers that cache the contents of memory in registers, with low-level constructs like context switches that randomly change the contents of memory without warning the compiler about it, is fundamentally hard...

Indeed, from the bug report it seems that GCC 9 just optimizes better, managing to keep a variable in a register instead of reloading it from memory, which causes the bug. Seems like a bug in the kernel, missing compiler barrier like you write, or lack of READ_ONCE() macro use to force reloading the variable, or such.

**mlau** · 27 November 2019, 03:53 AM

I ran the reproducer a few times on 5.3.13 built with gcc-9.2.1, but it didn't trigger anything. Maybe it's already fixed in gcc or the latest -stable kernels, or requires a special kernel config option I didn't set.
EDIT: it requires CONFIG_RETPOLINE=y to trigger.

**pomac** · 27 November 2019, 04:48 AM

With:
gcc (Gentoo 9.2.0-r2 p3) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

On 5.4.0

---
./a.out
input = bb cb 00 00 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
output = 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
mismatch
child process failed
---

**FireBurn** · 27 November 2019, 06:18 AM

Originally posted by pomac View Post

With:
gcc (Gentoo 9.2.0-r2 p3) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

On 5.4.0

---
./a.out
input = bb cb 00 00 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
output = 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
mismatch
child process failed
---

Bugger

**carewolf** · 27 November 2019, 08:07 AM

Originally posted by KrissN View Post

Not necessarily. The existing code may have been relying on undefined behaviour. The compiler may have the full right to optimize this and just by the fact that GCC 8 did not do it, doesn't mean that GCC 9 is buggy.

Compilers are under constant pressure of improving their optimizations and the consequence is that they more strictly follow the C/C++ standard as to what they are allowed to optimize and what not. Buggy code that assumes the compiler will compile it in a certain way, while the standard doesn't give that guarantee often becomes a victim of newer compilers.

Well, relying on system specific behaviour that lies outside the C/C++ standard is sort of necessary for a kernel, but yes, it still could be a programming mistake.

**HadrienG** · 27 November 2019, 08:22 AM

Originally posted by carewolf View Post

Well, relying on system specific behaviour that lies outside the C/C++ standard is sort of necessary for a kernel, but yes, it still could be a programming mistake.

Note that C also provides you with semi-standard tools to defer to system-specific semantics, like assembly and volatile. If you're writing a kernel, using those is definitely more sustainable than relying on undefined behavior being compiled in a certain way.

**carewolf** · 27 November 2019, 09:29 AM

Originally posted by HadrienG View Post

Note that C also provides you with semi-standard tools to defer to system-specific semantics, like assembly and volatile. If you're writing a kernel, using those is definitely more sustainable than relying on undefined behavior being compiled in a certain way.

True, but there are many types of CPU state that a compiler is technically allowed to change because it is within the calling convension that it may do so, but that a kernel would assume it won't change. For instance not changing any x87 state or SSE registers if the kernel specifically avoids using floating point operations. All kinds of code could break if the compiler started optimizing integer division with fp divisions that trashed some FPU registers the compiler was allowed to change, but the kernel didn't think it would.

**HadrienG** · 27 November 2019, 09:41 AM

Originally posted by carewolf View Post

True, but there are many types of CPU state that a compiler is technically allowed to change because it is within the calling convension that it may do so, but that a kernel would assume it won't change. For instance not changing any x87 state or SSE registers if the kernel specifically avoids using floating point operations. All kinds of code could break if the compiler started optimizing integer division with fp divisions that trashed some FPU registers the compiler was allowed to change, but the kernel didn't think it would.

IIRC, kernels handle this problem with compiler flags that disable generation of those specific instructions, as if the target hardware did not have them. That definitely moves us further away from the C standard, but is still well-defined in the eye of the compiler.

**carewolf** · 27 November 2019, 10:35 AM

Originally posted by HadrienG View Post

IIRC, kernels handle this problem with compiler flags that disable generation of those specific instructions, as if the target hardware did not have them. That definitely moves us further away from the C standard, but is still well-defined in the eye of the compiler.

Yeah, it is all compiler flags, architecture defined behavior or explicit agreements with the compiler maker. At least in theory.

A "funny" case I remember from almost a decade back was the kernel getting a random value on read of a int in one thread. It turned out to be caused by another threading reading a neighbour int with a 64-bit read, then changing his part, then writing the whole thing back overwriting any changes of the other part in another thread. That was all legal for the compiler at the time because the value changed wasn't declared volatile or atomic, but it caused 'random value out of nowhere'.

They ended up making that clearer in the next C standard, but there are still cases where it can't be fixed (when accessing types below the lowest read/write resolution of a specific architecture). So while that is fixed now, making it explicit when it is defined and when it is architecture dependent. I like to use it as an example of how much we can take "sane" behavior for granted, and that there almost always are implicit things we rely on we never know could be broken until they are.

**Weasel** · 27 November 2019, 12:14 PM

Just write the thread local variable accessor in inline asm and problem solved? How hard can it be?

Announcement

Linux 5.2+ Hit By AVX Register Corruption Bug - Affecting At Least Golang Programs

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment