Linus Torvalds' Latest Commentary Against -O3'ing The Linux Kernel

  • Originally posted by coder View Post
    If it has local storage and a bounded scope that cannot be visible to another thread or device, then where's the hazard?
    Agreed, but that's a very large "if", and it's simply not something the compiler *does* reliably infer, regardless of whether it theoretically *can* or not. To take a trivial example, assume you have a device/lib/whatever that provides random numbers several billion times per second:
    Code:
    static volatile int x;
    int a = 42;
    init_hw( &x );
    for (int i = 0; i < 10; ++i) {
      a *= x;
    }
    release_hw( &x );
    return a;
    The compiler only has two ways to not screw that up: honor the volatile correctly, or understand that since we took the address of x it needs to leave it TF alone. You're arguing that it "intelligently" going with Option B "should" work - and I do agree it "should" - but the reality, time and time again, to the point of "more often than not" every time GCC's notes say "rewrote a bunch of the optimizer", is that it doesn't.
    Writing a decent optimizing compiler is hard enough as it is, but it becomes infinitely harder to do so correctly when the compiler (a) ignores the directives that it's given because this month's rewrite of the optimization pass failed to consider them properly, and (b) the constant churn *of* that optimization pass makes the reintroduction of such bugs a seemingly frequent event.

    C is a pretty stable language, and even C++ "mostly" is. Nearly all compiler work is nothing *but* rewriting the O pass, and it's fairly evident from GCC's changelogs and errata that well over half the bugs in any given release are directly related to that rather than the basic codegen. Honestly, I think it's more realistic to be impressed that even "just" -O2 generally works much much more often than not - but ultimately you're playing a game of probabilities. To bring this slightly back on track, -O3 is really just saying you're okay with going from e.g. a 99.9% chance of the compiler not messing up to a 99% one. As with gambling, viruses, contraception, and everything else though, the more you keep rolling the dice the more likely it is that at some point your luck will run out.

    When a given project is not just multi-platform, but also spans multiple GNU/Linux stacks because it's embedded across a whole bunch of different SKUs, you will invariably end up having to build with what is really at least half a dozen different compilers. Every single one of them needs to "work", and IME for any nontrivial project the chances of at least *one* of them having a bug in the optimization pass is very close to 1. (IME it's actually exactly 1, but maybe I'm just cursed. :P). You can have 15 dependencies plus all of your own code that all works on 6 platforms, but the 7th one is an ARM box with no fp support, and the compiler there generates bad code for a specific function in one of the libs you're using if it's built at the library author's default -O2.
    Again, this is "things that actually happened" rather than theoretical scenarios. Validating the toolchain for that device has been the work of at best a different department and often a different company entirely (generally in China, to add language barrier and timezone headaches into the mix) for several months, and changing it is simply not possible.

    > I'm not arguing what the compiler should do - just trying to imagine a case where it could safely ignore volatile.

    I understand *why* you're thinking down that road, because yes, it's a fun puzzle in the abstract - but the answer, *really*, honestly, truly, is "there aren't any". Just, never.
    If you go back to simpler times, say, the DOS era, long before ASLR etc, you (that is, "the developer") could legitimately know that a program always loads at address x, and its DATA segment is always at address y, and you would be justified in (ab :P)using that knowledge to have e.g. a parent process reach in and modify some region of the child (say, the first static in the unit containing main()) as an IPC mechanism. No matter what kind of batshit-crazy scenario you can imagine, someone will absolutely have done it somewhere at some point.

    > That's weird.

    Yes, but also no. Ultimately, it's "just" a "simple" bug: something that "shouldn't have" happened, but did. The guy responsible for the code that broke it, who acked the bug (and did eventually fix it, two point releases later), couldn't provide any way to work around it other than the -O0 I'd already changed it to use - because the whole point of the bug was that the compiler thought it was "helping", when it shouldn't have been trying to.

    (Sidenote: Remember that the code is from memory and is an oversimplification of the actual bug that is unquestionably not the exact construct that caused the compiler to trip over its own feet. It is, however, "close enough" to emphasize the real point, which is that the compiler's failure to honor the volatile directive in the first place is what then gave the compiler the *opportunity* to screw up).

    > I wrote some moderate amount of C++ with inline asm in GCC 4.3 or thereabouts. Mostly MMX and SSE stuff, when I wasn't satisfied with the code generated by the intrinsics. They were all inside inline functions and compiled with -O3. They're included in regression tests which we've run multiple times per upgrade, even to this very day (currently on GCC 10.2). I don't recall ever having any problems like yours.

    Sure. This was *far* from the only asm in the project, and all of the rest of it worked fine. And all of it, including the piece that GCC broke, had worked fine for years before then, and I'm sure would have continued to work for years afterwards once the compiler was fixed. Some bugs are like that, and I think that's especially true for a compiler: cf. the piece about probabilities earlier on.

    > A bug is a bug. But, where the compiler can deliver the same behavior by doing something equivalent to the spec... that's what people actually want. I get that volatile is supposed to be no-go, but otherwise an optimizer's life is spent trying to do something equivalent to what you said.

    Absolutely. Instruction reordering happens all the time, nearly always works, and a substantial fraction of developers have no idea it even exists. The thing is, while the compiler is absolutely free to disregard "register" etc (to the point where most modern compilers simply discard the keyword entirely), going in the opposite direction is absolutely forbidden, and the rather crucial "but otherwise" part of your sentence simply doesn't apply. The italicized parts are, by definition, not permitted.



    • Originally posted by coder View Post
      I frequently debug using -O2 as the baseline image.
      Sure, that's fine if what you're looking for is bugs in the *source* code, i.e. logic errors etc. But it's awkward AF when what you're looking for is bugs in the *generated* code, where the relationship between the source and the asm can be significantly less direct than you would like.



      • Originally posted by DavidBrown View Post
        That would all be perfectly allowed, but not what the programmer had intended.
        A fair point to make, but not the case here. The bug was acked, tracked, and eventually fixed and documented in the point release notes. (Just, too late for us to not have to hack the code in stupid ways on that platform to be able to ship).

        My point wasn't "this exact code will cause GCC x.y to screw up", it was just "compiler bugs not only exist, but are quite common, and become more so as the O level increases". There are several hundred errata per major release of GCC, with the majority of them specifically being optimizer bugs, and people who imagine that that isn't the case are being extremely naive about it.
        Last edited by arQon; 28 June 2022, 07:24 PM.



        • The point of volatile is to tell the compiler to generate straightforward memory accesses instead of trying to optimize them or move them somewhere else.

          Maybe it would make sense for a static analyzer to issue warnings in cases where there does not seem to be a point in using volatile, but even such a warning should be optional. Besides, the volatile designation could just be part of a commonly used macro where the "volatile" part does not make sense in every use case.

          However, for a compiler to try to apply optimizations anyway seems worse than wasted time, and would make me question whether the compiler knows what it is doing.



          • Originally posted by arQon View Post
            My point wasn't "this exact code will cause GCC x.y to screw up", it was just "compiler bugs not only exist, but are quite common, and become more so as the O level increases". There are several hundred errata per major release of GCC, with the majority of them specifically being optimizer bugs, and people who imagine that that isn't the case are being extremely naive about it.
            My impression is that -O3 is starting to get used a lot, after so many years of existence, and I see that as a good thing. I don't know whether GCC should go slower in releasing new rewrites of the optimizer and faster in fixing bugs, or whether it needs more automated test suites. But either would be much better than not being able to rely on -O3 working.

            -O3 is meant to enable optimizations that are either expensive in terms of compilation time, or may increase code size. So I think for large software packages like the kernel, the optimum would be that core functions critical for performance are compiled in -O3, while the bulk is compiled in -O2 when that actually reduces code size. Roughly speaking. That is, at least for software where performance optimizations in source code are generally considered to be worth the effort, which surely includes the kernel.
            Last edited by indepe; 29 June 2022, 03:09 AM.



            • Originally posted by arQon View Post

              Agreed, but that's a very large "if", and it's simply not something the compiler *does* reliably infer, regardless of whether it theoretically *can* or not. To take a trivial example, assume you have a device/lib/whatever that provides random numbers several billion times per second:
              Code:
              static volatile int x;
              int a = 42;
              init_hw( &x );
              for (int i = 0; i < 10; ++i) {
                a *= x;
              }
              release_hw( &x );
              return a;
              The compiler only has two ways to not screw that up: honor the volatile correctly, or understand that since we took the address of x it needs to leave it TF alone. You're arguing that it "intelligently" going with Option B "should" work - and I do agree it "should" - but the reality, time and time again, to the point of "more often than not" every time GCC's notes say "rewrote a bunch of the optimizer", is that it doesn't.
              The compiler doesn't get to choose - honouring "volatile" is not optional. Even when it can figure out that a particular volatile variable is only handled locally and never escapes, it cannot change the volatile accesses (but it might learn some other information for other optimisations).

              I'd like to see actual code for the bug you are talking about, as well as details about the compiler version. If you don't know it, I can strongly recommend the site https://godbolt.org/ for the purpose. It is an online compiler. You can enter your code (in many languages, not just C), choose your compiler (there are dozens of versions of gcc, for many different targets, along with clang, msvc, sdcc, and others), and choose your options. It shows the generated assembly code. Once you have a combination here that demonstrates the issue, please use the "share" button on the site and post the link here so that we can go directly to the code in question.

              And if you have a GCC bugzilla report for the issue, that would also be very interesting.

              Writing a decent optimizing compiler is hard enough as it is, but it becomes infinitely harder to do so correctly when the compiler (a) ignores the directives that it's given because this month's rewrite of the optimization pass failed to consider them properly, and (b) the constant churn *of* that optimization pass makes the reintroduction of such bugs a seemingly frequent event.
              The compiler (and its writers) do not ignore "volatile", nor is there a constant churn of optimisation passes, nor is there regular reintroduction of any past bug. Bugs do happen - no developer or development process is perfect, and a project like GCC is big and complex. But they are rare, especially incorrect code generation bugs. They are extremely rare on common constructs, but more likely on unusual code or less commonly used targets or flag combinations. And once a serious bug has been identified and fixed, it is usually added to the regression tests so that reintroductions get spotted.

              C is a pretty stable language, and even C++ "mostly" is. Nearly all compiler work is nothing *but* rewriting the O pass, and it's fairly evident from GCC's changelogs and errata that well over half the bugs in any given release are directly related to that rather than the basic codegen. Honestly, I think it's more realistic to be impressed that even "just" -O2 generally works much much more often than not - but ultimately you're playing a game of probabilities. To bring this slightly back on track, -O3 is really just saying you're okay with going from e.g. a 99.9% chance of the compiler not messing up to a 99% one. As with gambling, viruses, contraception, and everything else though, the more you keep rolling the dice the more likely it is that at some point your luck will run out.
              Probabilities are meaningless without a scale. Are you talking about per programmer, per program, per line of code? The reality is that perhaps 99.9% of programmers will never encounter an incorrect-code compiler bug in their entire working lives, if they are using quality mainstream compilers like GCC for x86, MSVC, etc. However, 99.9% of programmers will encounter unexpected code due to errors in their own source.

              If you work with a range of less used targets, unusual flags, low-level code, complex and rare code patterns, etc., then your chances of hitting a bug go up significantly - though they are still low overall.

              In my 25 years or so experience with gcc on at least a dozen different targets, I have only come across perhaps two or three incorrect code generation issues. And I am unusually interested in compilers, and also help people in various mailing lists and other forums. (I've found several bugs in lesser compilers.) One that could be relevant here is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

              So I am not at all suggesting that gcc is perfect or bug-free, merely that it is not remotely as risky as you would have it. And it is nothing like as systematic or endemic as you suggest - the compiler honours "volatile" in all cases, and the developers do not spend their time messing around with optimisation passes and releasing untested compilers. Compiler bugs occur due to mistakes, not because the developers decide to change what the language means!

              > I wrote some moderate amount of C++ with inline asm in GCC 4.3 or thereabouts. Mostly MMX and SSE stuff, when I wasn't satisfied with the code generated by the intrinsics. They were all inside inline functions and compiled with -O3. They're included in regression tests which we've run multiple times per upgrade, even to this very day (currently on GCC 10.2). I don't recall ever having any problems like yours.

              Sure. This was *far* from the only asm in the project, and all of the rest of it worked fine. And all of it, including the piece that GCC broke, had worked fine for years before then, and I'm sure would have continued to work for years afterwards once the compiler was fixed. Some bugs are like that, and I think that's especially true for a compiler: cf. the piece about probabilities earlier on.

              > A bug is a bug. But, where the compiler can deliver the same behavior by doing something equivalent to the spec... that's what people actually want. I get that volatile is supposed to be no-go, but otherwise an optimizer's life is spent trying to do something equivalent to what you said.

              Absolutely. Instruction reordering happens all the time, nearly always works, and a substantial fraction of developers have no idea it even exists. The thing is, while the compiler is absolutely free to disregard "register" etc (to the point where most modern compilers simply discard the keyword entirely), going in the opposite direction is absolutely forbidden, and the rather crucial "but otherwise" part of your sentence simply doesn't apply. The italicized parts are, by definition, not permitted.
              Compilers are not free to disregard "register" - but there is no requirement to treat it as a code optimisation hint. GCC uses it for optimisation on -O0, and ignores the optimisation hint for -O1 and above. But it still has to check it in terms of the language (you are not allowed to take the address of a "register" variable, for example). Compilers are not free to disregard "volatile", though they are free to define "what it means to be a volatile access" and they are free to define how their own extensions (like inline assembly) work. GCC defines these in the way you would expect and the way it is documented in the manual, barring the very occasional compiler bug.



              • Originally posted by DavidBrown View Post
                Compilers are not free to disregard "volatile", though they are free to define "what it means to be a volatile access" and they are free to define how their own extensions (like inline assembly) work. GCC defines these in the way you would expect and the way it is documented in the manual, barring the very occasional compiler bug.
                What possible variation do you see in "what it means to be a volatile access" ?

                Is the following meaning not generally applicable?



                Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated by a sequence point from the volatile access.



                • Originally posted by indepe View Post

                  What possible variation do you see in "what it means to be a volatile access" ?

                  Is the following meaning not generally applicable?


                  Some volatile accesses are simple:
                  Code:
                      volatile int v;
                      int x;
                  
                      v = x;      // Simple write
                      x = v;      // Simple read
                  There is little to wonder about there, and compilers all do what you might expect (unless you expect "volatile" to mean more than it does - some people mistakenly think it means "indivisible", which would not be true on a processor which cannot access an "int" in one instruction, or that it forces an order on non-volatile accesses, or that it has semantics like C11 atomics).

                  But consider :
                  Code:
                      v++;
                      v *= 2;
                      v += v;
                      v = v = 1;
                  Here it is much harder to define. Should the compiler generate operations that use the memory target directly, if the processor has such addressing modes? Should it do a read, then a write, even if it could use a single instruction?

                  What about volatile bitfields? Should a write be of the smallest possible size (typically 8 bit), or the size for the type given in the bitfield, or the most efficient access size (which might be 32 bit) ?

                  Compilers vary hugely in how they treat these, and how consistent they are in such expressions. It is with good reason that C++ now deprecates them, and only recommends the simple accesses.

                  And what about:
                  Code:
                      *(volatile int*)(&x) = 1;
                  Is that a volatile access, even though "x" was not declared volatile? Until C18, the standard said nothing at all about the meaning of such expressions. With C18, it is now considered a volatile access according to the standard. (In practice, every known compiler had always treated it this way.)



                  • Originally posted by DavidBrown View Post
                    Here it is much harder to define. Should the compiler generate operations that use the memory target directly, if the processor has such addressing modes? Should it do a read, then a write, even if it could use a single instruction?
                    What is the difficulty? I'd think it could use a single instruction as long as it is an instruction that reads from memory and then writes to memory. That would seem to fulfill the "volatile" meaning above.

                    Originally posted by DavidBrown View Post
                    What about volatile bitfields? Should a write be of the smallest possible size (typically 8 bit), or the size for the type given in the bitfield, or the most efficient access size (which might be 32 bit) ?
                    How is the width of the bitfield access a different question than that for a non-volatile bitfield? (As "volatile" doesn't imply "atomic").

                    Originally posted by DavidBrown View Post
                    Compilers vary hugely in how they treat these, and how consistent they are in such expressions. It is with good reason that C++ now deprecates them, and only recommends the simple accesses.
                    Do you have a reference illustrating this?

                    Originally posted by DavidBrown View Post
                    And what about:
                    Code:
                    *(volatile int*)(&x) = 1;
                    Is that a volatile access, even though "x" was not declared volatile? Until C18, the standard said nothing at all about the meaning of such expressions. With C18, it is now considered a volatile access according to the standard. (In practice, every known compiler had always treated it this way.)
                    AFAIK, the kernel has also used it in this way for a long time. It seems any previous lack of clarity in the language definition is more about the question of whether it is a volatile access or not, rather than a question of what should happen if it is considered a volatile access.
                    Last edited by indepe; 29 June 2022, 08:05 AM.



                    • Originally posted by indepe View Post
                      What is the difficulty? I'd think it could use a single instruction as long as it is an instruction that reads from memory and then writes to memory. That would seem to fulfill the "volatile" meaning above.
                      No, it is not that simple. A single read/modify/write instruction for "v += 1;" is more "atomic" than three separate instructions, but can be very much slower than separate instructions on modern processors. Which is the "correct" choice? There's no right answer. What about "x = v + v;" ? A programmer might think it meant "v" should be read twice - in fact it is undefined behaviour.

                      How is the width of the bitfield access a different question than that for a non-volatile bitfield? (As "volatile" doesn't imply "atomic").
                      For non-volatile bitfields, the compiler is free to write to wider parts of the struct if that gives more efficient code. For volatile bitfields, the C standards give no guidance as to whether or not this is allowed. This is a very important issue for accessing hardware registers on microcontrollers, and led to gcc adding a flag to control behaviour here https://gcc.gnu.org/onlinedocs/gcc/C...tile-bitfields .

                      Do you have a reference illustrating this?
                      You can have a look at this https://www.cs.utah.edu/~regehr/pape...8-preprint.pdf if you like. It's a bit dated, and the author is well-known for having a very critical attitude to compilers (his arguments are often extreme to the point of being clearly wrong), but it is useful to think about these things. You might also like to look at the proposal for deprecating complex volatile expressions in C++: https://www.open-std.org/jtc1/sc22/w...8/p1152r0.html .

                      AFAIK, the kernel has also used it in this way for a long time. It seems any previous lack of clarity in the language definition is more about the question of whether it is a volatile access or not, rather than a question of what should happen if it is considered a volatile access.
                      That is correct - the C standards (until C18) only talked about "accessing a volatile object", while C18 changed it to "accessing an object through an lvalue of volatile type". What the volatile access actually means, and how it is implemented and what it should do, is still entirely up to the implementation. (I know of one compiler that uses specific instructions that bypass the cache on volatile reads and writes.)

