
Proposed GCC 12 Security Option Would Auto Initialize Automatic Variables

  • #41
    Originally posted by coder View Post
    Of course it can fail!

    There are many scenarios, but here's a common one:

    Code:
    int i;
    int status = ReadValue(file, &i);
    if (status != SUCCESS) return status;
    
    printf("Value is %d.\n", i);
    Yes, that is an example of where the compiler can't know everything, and the programmer has to help. If you are going to get the best from the compiler, either from its optimisations or from its static error checking, you sometimes have to help it. And you have to take responsibility for the code you write - C is not a checked language. So you have to know that "ReadValue" puts a value into "i" here. (The C standard has something interesting to say in cases like this. Whereas normally the use of an uninitialised local variable is undefined behaviour, if you have taken its address like this it becomes defined behaviour with "i" having an indeterminate value if ReadValue doesn't set it.)

    Warnings are often an indication that something might be wrong, not that something is wrong. For common cases, there are therefore idioms that can be used to get what you want from the tools. "int i = i;", or self-initialisation, tells the compiler that the variable should be treated as initialised for static checking, but does not generate any code. At other times, you might use "__builtin_unreachable()" to tell the compiler a code path cannot be reached, improving optimisation and error checking.


    Here's a far more basic example involving the dismal realities of static analysis. I'm still experiencing numerous issues with this one, even on GCC v9.

    https://stackoverflow.com/questions/...-in-this-funct
    In my testing, I had to go back to gcc 5.5 (from 2016) to get a false-positive warning with this code. But I didn't have such an old version of boost to test conveniently (I tested using https://godbolt.org/). I do appreciate that there are times when old compiler versions and old library versions are appropriate - for some project types, you don't change these things. But this particular issue was fixed long ago, in the compiler and/or the boost libraries.

    Comment


    • #42
      Originally posted by klapaucius View Post
      Wow, I've just realised it's even worse than you say; here's a very simple case in C++ where it doesn't raise any warning.

      Code:
      #include <iostream>
      #include <cmath>
      #include <cstdio>   // for printf
      int ReadValue(double* D)
      {
          return static_cast<int>(std::floor(*D));   // reads *D before it is ever set
      }
      
      int main()
      {
          double d;
          int status = ReadValue(&d);
          std::cout << d << '\n';
          if (status != 0) return status;
          printf("Value is %g.\n", d);
      }
      Try specifying -O2, as well. Historically, GCC's warnings are somewhat tied to the optimizer.

      Interestingly, based on which -O setting I use, gcc 9.3.1 produces one of three different behaviors:
      • no -O option or -O0 produces no warnings.
      • -O1, -Os, and -Og all report a warning on line 13.
      • -O2 and -O3 report a warning on line 6.
      Last edited by coder; 26 February 2021, 08:25 AM.

      Comment


      • #43
        Originally posted by reba View Post
        Nice example to show different approaches when writing code. I would write this snippet as follows:

        Code:
        int limit = ERROR_CODE;
        switch (some_enum)
        {
        case EnumVal1:
            limit = Limit1;
            break;
        
        case EnumVal2:
            limit = Limit2;
            break;
        
        case EnumVal3:
            limit = Limit3;
            break;
        }
        
        if (limit == ERROR_CODE) {
            log("ERROR: class.function: internal error: unexpectedly found no match when doing something using " + some_enum + ", aborting / returning errorcode X / etc.");
            // abort / throw / return an error code
        }
        Not initializing variables after declaration is an error in my book.
        I have two things to say about that:
        1. It was a toy example, but you could easily imagine a case where the value of limit is computed at runtime, and maybe no safe value for an ERROR_CODE constant is known. However, you can still insert a runtime check by adding a parallel variable to track whether limit was assigned. In C++, this becomes much easier & less error-prone, with containers like boost::optional (now std::optional, in C++17).
        2. By assigning a bogus initializer, you've now converted a type of error potentially detectable at compile-time into a runtime error that one can only find through rigorous testing.
        I prefer to exploit my compiler to the greatest degree possible, so I would sooner use a parallel flag variable (or optional<> container), than do something which defeats the potential for compile-time error detection via static analysis.

        Comment


        • #44
          Originally posted by coder View Post
          Well said.

          And that's why I'm in favor of the proposed change, along with some hint that valgrind can use to find reads of these formally-uninitialized variables. Please do use it in combination with -Wuninitialized, but because that's not 100%, we still need to do runtime analysis with tools like Valgrind. It's just that Valgrind has very weak support for stack-related bugs, making it dependent on the compiler for any help it can offer, here.
          Run-time analysers are useful tools, certainly, and can catch things that static analysers cannot. It is usually more efficient to use "sanitizers" than external tools like Valgrind, when there is support for them. From a quick check on the web, it looks like clang should be able to spot uninitialised memory with the MemorySanitizer, but gcc doesn't (yet) support that one. But note that for the 99.9% of processors I mentioned, you don't have Valgrind or such tools, and have only some sanitizer capabilities - you don't have files, or terminals, or screens, or a host OS. So spotting potential errors during compilation is a big benefit.

          Nonsense.
          Let me rather say that, as far as you reasonably can, you don't define your variables until you have a value to put in them. Yes, there are cases where that is not going to be possible, or at least not convenient. But the modern style of C and C++ programming, with most local variables initialised explicitly, is a world apart from ancient pre-C99 styles of C code with huge functions where all local variables are defined at the head of the function.

          Aside from cases like the one I just posted above, there's always:

          Code:
          int limit;
          switch (some_enum)
          {
          case EnumVal1:
              limit = Limit1;
              break;
          
          case EnumVal2:
              limit = Limit2;
              break;
          
          case EnumVal3:
              limit = Limit3;
              break;
          }
          As you've eloquently stated, I don't want a bogus default initializer, because that prevents the compiler telling me whether it's meaningfully initialized. Furthermore, I want the compiler to warn me if there's some case I've omitted, which is why I added no default.
          There are better ways to handle this. But as I said in another post (written after you wrote yours), you sometimes need to help the compiler if you want the best from it. C and C++ let you give the compiler certain kinds of information, but sometimes you can benefit from using extensions or idioms which give the compiler more information. Examples include using the "-Wno-init-self" flag and then "int limit = limit;", or adding:

          Code:
            default: __builtin_unreachable();
          to your switch. Use that with the "-Wswitch-enum" flag so that you still get a warning on missing cases. This solution has three benefits - it documents the situation to the reader, it lets you use extra warning flags without getting a false positive here, and it results in more optimised code. The first point here is perhaps the most important - if I had read your switch code here for a code review, I would have marked it up as an error because it is not clear that it sets "limit" correctly in all cases.

          (In practice, I recommend using a macro rather than gcc's "__builtin_unreachable()", since it is easier to adapt it to different compilers, or add a run-time error function during debugging.)

          However, what if the case statements are doing more than just initializing this one variable? And what if the values to which it's being initialized are computed on-the-fly, as a function of other local state? I potentially lose the opportunity to factor this out into a standalone function or use something like a std::map<EnumType, int>.

          You just can't practically avoid this sort of thing, 100% of the time.
          Agreed - you can't avoid it 100% of the time. But you can certainly avoid it most of the time - more often than many people might think.

          Comment


          • #45
            Originally posted by DavidBrown View Post
            Warnings are often an indication that something might be wrong, not that something is wrong.
            A warning that generates too many false-positives is not practically usable. A warning that generates too many false-negatives provides a false sense of security.

            Originally posted by DavidBrown View Post
            For common cases, there are therefore idioms that can be used to get what you want from the tools. "int i = i;", or self-initialisation, tells the compiler that the variable should be treated as initialised for static checking, but does not generate any code.
            Heh, there are other warnings which that will trigger! See -Winit-self:


            IMO, whenever you start writing such gibberish, you've fallen off a semantic cliff. It's too idiomatic and even fragile (see above).

            Not only that, but when I don't have a meaningful initial value, I want to give the compiler a chance to figure out if I'm using it uninitialized.

            Originally posted by DavidBrown View Post
            At other times, you might use "__builtin_unreachable()" to tell the compiler a code path cannot be reached, improving optimisation and error checking.
            That should be rare, but I agree that it's sometimes useful. For the sake of portability and readability, I wrap such constructs in macros.

            Originally posted by DavidBrown View Post
            In my testing, I had to go back to gcc 5.5 (from 2016) to get a false-positive warning with this code.
            ...
            But this particular issue was fixed long ago, in the compiler and/or the boost libraries.
            Perhaps in that specific case, but I still get some of those warnings in our codebase, using gcc 9.3.1. Believe me, I wish it weren't so!

            Comment


            • #46
              Originally posted by DavidBrown View Post
              you sometimes need to help the compiler if you want the best from it. C and C++ let you give the compiler certain kinds of information, but sometimes you can benefit from using extensions or idioms which give the compiler more information. Examples include ... adding:

              Code:
              default: __builtin_unreachable();
              to your switch. Use that with the "-Wswitch-enum" flag so that you still get a warning on missing cases. This solution has three benefits - it documents the situation to the reader, it lets you use extra warning flags without getting a false positive here, and it results in more optimised code. The first point here is perhaps the most important - if I had read your switch code here for a code review, I would have marked it up as an error because it is not clear that it sets "limit" correctly in all cases.
              Please explain this in detail. I want to be sure I fully understand what you're proposing.

              Originally posted by DavidBrown View Post
              Agreed - you can't avoid it 100% of the time. But you can certainly avoid it most of the time - more often than many people might think.
              Yes, I think we're basically in agreement. Your earlier post seemed to take a more absolutist stance, which I clearly reject.

              Comment


              • #47
                Originally posted by coder View Post
                A lambda is just an anonymous function. As such, it still has a prologue, epilogue, call-, and return- overhead. Now, it could get inlined, in this case. No guarantees, but, on a good day, when the weather is nice, and the compiler is in a good mood, and you haven't done anything to make it cranky -- like maybe passing in an object by-value, that has a non-inline copy-constructor (or something like that) -- then, I'm willing to accept that it could get inlined with no overhead.
                My experience is that simple lambdas that map clearly to equivalent local code invariably get inlined. The same applies to an internal linkage function (static function, anonymous namespace, inline function, etc.) that is only used once - the compiler knows there is no need for or benefit to making it a separate function. The exception is if your optimisation flags ask for something else - "-Og" will be much more conservative about inlining, as will "-O0". But if you are concerned about overhead and efficiency, you'll be using "-O2" (or better, or a custom combination of options) anyway. Very big lambdas might not be inlined, but the function call overhead is not a problem in such cases, and moving it out of line may improve code flow, cache friendliness, etc.

                In short, I consider lambdas to be "free" - just like separate functions, and extra local variables. If they are not actually needed as separate entities, the compiler will merge them or remove them.

                I have witnessed first-hand a case where 2-3 experienced C++11 programmers puzzled over just such a use of a lambda, trying to figure out why it was being used and whether there was something going on, necessitating its use, that they had missed. It's definitely a style, and one with certain potential to unsettle people not used to it.
                Yes, indeed. People sometimes think a person can say they "know C++" and then they can easily understand any C++ code. That is, of course, not true.


                Oh, and here's another example of where there's no valid default initializer:

                Code:
                double best_val;
                bool best_assigned = false;
                for (const double &val : ReadValues())
                {
                    if (!IsValid(val)) continue;
                
                    if (!best_assigned)
                    {
                        best_val = val;
                        best_assigned = true;
                    }
                    else if (IsBetterThan(val, best_val)) best_val = val;
                }
                
                printf("Best value: %f\n", best_val); // intentional bug - should be conditioned on best_assigned
                If you default-initialized best_val, then you'd defeat -Wmaybe-uninitialized. However, even if I had correctly checked best_assigned before printing best_val, I still might get a false positive from -Wmaybe-uninitialized -- at least, that's exactly what often happens when boost::optional is used to do the exact same thing.
                All I can tell you is that this is not likely to be an issue with newer versions of gcc and newer "optional" (like the real C++17 libraries, rather than boost, or at least newer boost versions). Boost libraries are "cutting edge" - they are often experimental, and often rely on very hairy implementations. A good deal of development of recent C++ standards has been done by looking at popular boost libraries and seeing how they could be put in the standard library, then looking at what language features need to be added to get good and efficient implementations of them. And newer versions of gcc go hand in hand with this process. So if you use a compiler and boost with a library that is relatively new to that version of boost, you are going to see sub-optimal code generation and sub-optimal static checking. That is the cost of the cutting edge. As time passes, the compiler and the libraries improve - if you can, you update (again - I fully appreciate this is not always possible).

                Comment


                • #48
                  Originally posted by coder View Post
                  A warning that generates too many false-positives is not practically usable. A warning that generates too many false-negatives provides a false sense of security.
                  Agreed. Warning flag choices need to match the kind of code you are writing, and the style you use. One size does not fit all.

                  IMO, whenever you start writing such gibberish, I think you've fallen off a semantic cliff. It's too idiomatic and even fragile (see above).

                  Not only that, but when I don't have a meaningful initial value, I want to give the compiler a chance to figure out if I'm using it uninitialized.
                  I can't say I have ever had use of self-initialisation myself - I merely present it as an option. For my own code, I am happy with "-Wuninitialized" and "-Wmaybe-uninitialized", and very rarely need to have bogus initialisers or other code to avoid such warnings.

                  That should be rare, but I agree that it's sometimes useful. For the sake of portability and readability, I wrap such constructs in macros.
                  Agreed. (I mentioned that in another post.)

                  Perhaps in that specific case, but I still get some of those warnings in our codebase, using gcc 9.3.1. Believe me, I wish it weren't so!
                  If programming were easy, and compilers did all the work, where would the fun (and pay) be for us? :-)

                  Comment


                  • #49
                    Originally posted by DavidBrown View Post
                    If programming were easy, and compilers did all the work, where would the fun (and pay) be for us? :-)
                    Unfortunately, I have co-workers who insist on turning up the warning level too high, deriving a false sense of security and forcing us to "fix" a whole bunch of non-errors. I've seen a vanishingly small number of real liabilities detected by such noisy warnings, but I think that says more about gcc (and my co-workers) than anything else. Clang's static analysis is supposed to be better.

                    Comment


                    • #50
                      Originally posted by DavidBrown View Post
                      if you use a compiler and boost with a library that is relatively new to that version of boost, you are going to see sub-optimal code generation and sub-optimal static checking. That is the cost of the cutting edge. As time passes, the compiler and the libraries improve - if you can, you update
                      Point taken, but Boost's optional<> was first released in v1.30, 18 years ago!

                      Comment
