Originally posted by HadrienG
View Post
Announcement
Collapse
No announcement yet.
Linux 5.2+ Hit By AVX Register Corruption Bug - Affecting At Least Golang Programs
Collapse
X
-
- Likes 1
-
I ran the reproducer a few times on 5.3.13 built with gcc-9.2.1, but it didn't trigger anything. Maybe it's already fixed in gcc or the latest -stable kernels, or requires a special kernel config option I didn't set.
EDIT: it requires CONFIG_RETPOLINE=y to trigger.Last edited by mlau; 28 November 2019, 03:27 AM.
Comment
-
With:
gcc (Gentoo 9.2.0-r2 p3) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
On 5.4.0
---
./a.out
input = bb cb 00 00 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
output = 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
mismatch
child process failed
---
Comment
-
Originally posted by pomac View PostWith:
gcc (Gentoo 9.2.0-r2 p3) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
On 5.4.0
---
./a.out
input = bb cb 00 00 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
output = 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
mismatch
child process failed
---
Comment
-
Originally posted by KrissN View Post
Not necessarily. The existing code may have been relying on undefined behaviour. The compiler may have the full right to optimize this and just by the fact that GCC 8 did not do it, doesn't mean that GCC 9 is buggy.
Compilers are under constant pressure of improving their optimizations and the consequence is that they more strictly follow the C/C++ standard as to what they are allowed to optimize and what not. Buggy code that assumes the compiler will compile it in a certain way, while the standard doesn't give that guarantee often becomes a victim of newer compilers.
Comment
-
Originally posted by carewolf View PostWell, relying on system specific behaviour that lies outside the C/C++ standard is sort of necessary for a kernel, but yes, it still could be a programming mistake.
- Likes 1
Comment
-
Originally posted by HadrienG View PostNote that C also provides you with semi-standard tools to defer to system-specific semantics, like assembly and volatile. If you're writing a kernel, using those is definitely more sustainable than relying on undefined behavior being compiled in a certain way.
- Likes 1
Comment
-
Originally posted by carewolf View PostTrue, but there are many types of CPU state that a compiler is technically allowed to change because it is within the calling convension that it may do so, but that a kernel would assume it won't change. For instance not changing any x87 state or SSE registers if the kernel specifically avoids using floating point operations. All kinds of code could break if the compiler started optimizing integer division with fp divisions that trashed some FPU registers the compiler was allowed to change, but the kernel didn't think it would.
Comment
-
Originally posted by HadrienG View PostIIRC, kernels handle this problem with compiler flags that disable generation of those specific instructions, as if the target hardware did not have them. That definitely moves us further away from the C standard, but is still well-defined in the eye of the compiler.
A "funny" case I remember from almost a decade back was the kernel getting a random value on read of a int in one thread. It turned out to be caused by another threading reading a neighbour int with a 64-bit read, then changing his part, then writing the whole thing back overwriting any changes of the other part in another thread. That was all legal for the compiler at the time because the value changed wasn't declared volatile or atomic, but it caused 'random value out of nowhere'.
They ended up making that clearer in the next C standard, but there are still cases where it can't be fixed (when accessing types below the lowest read/write resolution of a specific architecture). So while that is fixed now, making it explicit when it is defined and when it is architecture dependent. I like to use it as an example of how much we can take "sane" behavior for granted, and that there almost always are implicit things we rely on we never know could be broken until they are.
- Likes 1
Comment
Comment