
Linux 5.2+ Hit By AVX Register Corruption Bug - Affecting At Least Golang Programs


  • Linux 5.2+ Hit By AVX Register Corruption Bug - Affecting At Least Golang Programs

    Phoronix: Linux 5.2+ Hit By AVX Register Corruption Bug - Affecting At Least Golang Programs

    The Linux 5.2 kernel and newer appears to be suffering from an AVX register corruption bug stemming from signal delivery. This register corruption issue is manifesting itself at least for Golang programs leading to a variety of bug reports when running on Linux 5.2 through at least the newly-minted Linux 5.4...

    http://www.phoronix.com/scan.php?pag...gister-Corrupt

  • down1
    replied
    I hope this fixes the weird system hangs I've been having.


  • jabl
    replied
    Originally posted by carewolf View Post
    They ended up making that clearer in the next C standard, but there are still cases where it can't be fixed (when accessing types below the lowest read/write resolution of a specific architecture).
    Due to this, in practice, as of C11 byte addressing is a required architectural feature (as long as the C compiler claims support for threads/atomics and thus the concurrency memory model, via lack of __STDC_NO_THREADS__ and __STDC_NO_ATOMICS__ macros, that is). The alternative of faking it via RMW atomics or an LL/SC loop is too slow to be usable in practice.

    Even outside concurrency, the main reason why the original DEC Alpha sucked at string processing was lack of byte addressing, necessitating a lot of RMW code for otherwise simple operations. They fixed it in later iterations though.


  • knweiss
    replied
    The bug is fixed: https://bugzilla.kernel.org/show_bug.cgi?id=205663#c2

    BTW: The long version of this debugging story can be found in this Go issue: https://github.com/golang/go/issues/35326
    Last edited by knweiss; 11-27-2019, 03:03 PM. Reason: Link the Go issue too


  • discordian
    replied
    Debian testing provided the Linux 5.3 kernel yesterday, compiled with GCC 9 (previously GCC 8 was used). I am back on 5.2 because the graphics were horribly garbled after resuming from standby, and I was unable to log back in.
    Of course this could be caused by a change in amdgpu (or elsewhere), or we may be in for more pesky bugs like this showing up in the future...


  • Weasel
    replied
    Just write the thread local variable accessor in inline asm and problem solved? How hard can it be?


  • carewolf
    replied
    Originally posted by HadrienG View Post
    IIRC, kernels handle this problem with compiler flags that disable generation of those specific instructions, as if the target hardware did not have them. That definitely moves us further away from the C standard, but is still well-defined in the eye of the compiler.
    Yeah, it is all compiler flags, architecture-defined behavior or explicit agreements with the compiler maker. At least in theory.

    A "funny" case I remember from almost a decade back was the kernel getting a random value when reading an int in one thread. It turned out to be caused by another thread reading a neighbouring int with a 64-bit read, changing its part, then writing the whole thing back, overwriting any change a third thread had made to the other part. That was all legal for the compiler at the time because the value being changed wasn't declared volatile or atomic, but it caused 'random values out of nowhere'.

    They ended up making that clearer in the next C standard, but there are still cases where it can't be fixed (when accessing types below the lowest read/write resolution of a specific architecture). So that is fixed now: the standard makes it explicit when the behavior is defined and when it is architecture-dependent. I like to use it as an example of how much we take "sane" behavior for granted, and of how there are almost always implicit things we rely on that we never know could break until they do.


  • HadrienG
    replied
    Originally posted by carewolf View Post
    True, but there are many types of CPU state that a compiler is technically allowed to change, because the calling convention permits it, but that a kernel would assume won't change. For instance, not touching any x87 state or SSE registers if the kernel specifically avoids using floating-point operations. All kinds of code could break if the compiler started optimizing integer division into FP divisions that trashed some FPU registers the compiler was allowed to change, but the kernel didn't expect it to.
    IIRC, kernels handle this problem with compiler flags that disable generation of those specific instructions, as if the target hardware did not have them. That definitely moves us further away from the C standard, but is still well-defined in the eye of the compiler.


  • carewolf
    replied
    Originally posted by HadrienG View Post
    Note that C also provides you with semi-standard tools to defer to system-specific semantics, like assembly and volatile. If you're writing a kernel, using those is definitely more sustainable than relying on undefined behavior being compiled in a certain way.
    True, but there are many types of CPU state that a compiler is technically allowed to change, because the calling convention permits it, but that a kernel would assume won't change. For instance, not touching any x87 state or SSE registers if the kernel specifically avoids using floating-point operations. All kinds of code could break if the compiler started optimizing integer division into FP divisions that trashed some FPU registers the compiler was allowed to change, but the kernel didn't expect it to.


  • HadrienG
    replied
    Originally posted by carewolf View Post
    Well, relying on system specific behaviour that lies outside the C/C++ standard is sort of necessary for a kernel, but yes, it still could be a programming mistake.
    Note that C also provides you with semi-standard tools to defer to system-specific semantics, like assembly and volatile. If you're writing a kernel, using those is definitely more sustainable than relying on undefined behavior being compiled in a certain way.
