Announcement

Collapse
No announcement yet.

Proposed GCC 12 Security Option Would Auto Initialize Automatic Variables

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Originally posted by DavidBrown View Post

    Objects of class types are initialised by recursively initialising subobjects, using constructors and initialisers. If you get down to POD data that has no initialisation (such as an "int" field in a class without a constructor), that is left uninitialised just like a plain "int".

    Having programmed a lot of C and C++, I know that having a default initialised value is a truly terrible idea. Programmers who know how these languages work do not expect them to be initialised to zero - and having inexperienced programmers working alone is also a terrible idea. (We all start out as inexperienced - but we need help to learn and people to check our code, not crutches from tools that encourage bad habits.) The problem with pointless initialisation is not one of performance (though some 99.9% of processors shipped are not superscaler). The problem is it stops the compiler telling you when you have made a mistake and read an uninitialised variable. Default initialisation does not make the broken code correct - it merely changes it from undefined behaviour wrong code to defined behaviour but still wrong code that is now harder to spot.

    And if you are structuring your code properly, rather than in pre-C99 style, the whole issue is usually moot - you don't define your local variables until you have a value for them, so default initialisation is of no use.

    Oh, and it does not quite make sense to say an uninitialised object has value "undefined". The value is "indeterminate" (basically meaning it can be any pattern of bits that fits within the storage space of the object). Attempting to use this indeterminate value is undefined behaviour (which is always bad). If the compiler guarantees (such as via an option) that all auto storage variables are initialised to zero, then it is not an indeterminate value, and their use is not undefined behaviour - the behaviour is very clearly defined. (But you are entirely correct that if this new switch "breaks" code, the code was broken from before.)
    Yes, but there are many programmers who _think_ they know the language, and still program the language anyway.

    Also, read this:
    https://accu.org/journals/overload/25/139/brand_2379/
    And tell me you know all these initialization rules by heart. And you think everyone who programs C++ does also.

    Yes, reading an uninitialiased variable it is still wrong. But by default initializing at least the behaviour is _deterministic_. Code crashes and the programmer runs it again. Next time it does not crash, because the variable is initialized to some sane value. Programmer: nah, was just a fluke .. No need to fix... You are a great programmer, for sure, never make a mistake. But I think you overestimate the capabilities of many programmers out there. I have seen programmers do this. Clearly it is why so many scripting languages are populair nowadays.




    Comment


    • #32
      Originally posted by coder View Post
      There are cases where you don't have a meaningful value, at the point of definition. So, you're faced with a dilemma of either zero-initializing it and preventing the compiler from warning you of possible use before it's been properly-assigned, or hoping to benefit from such warnings while incurring the risk that the compiler doesn't catch it and you end up with some unpredictable behavior at runtime. Since I try to write my code semantically, I'm somewhat allergic to using bogus initializers, just for the sake of initializing to something.

      C++11 lambdas enable a style where you can often avoid this situation, but I sometimes wonder if such lambdas could incur runtime overhead, and it can confuse people reading your code who aren't used to that style. It does feel like a slight abuse of the lambda mechanism, but still preferable to bogus initializers.
      Lambdas that have a simple and clear implementation and are called where they are declared, will have no overhead. No, it is not an abuse of lambdas - but it might confuse some people. A lot of programmers seem to consider lambdas to be an "advanced" or complicated feature of the language. (You can certainly use them to write complicated code, and a syntax that allows "[](){}();" as an empty statement could be called unintuitive.) Sometimes bogus initialisers are almost unavoidable, with alternatives being less clear, but I like to try hard to make my local variables "const" where possible.

      Comment


      • #33
        Originally posted by wkleunen View Post

        Yes, but there are many programmers who _think_ they know the language, and still program the language anyway.

        Also, read this:
        Uninitialised variables can cause problems. Simon Brand reminds us how complicated it can get.

        And tell me you know all these initialization rules by heart. And you think everyone who programs C++ does also.

        Yes, reading an uninitialiased variable it is still wrong. But by default initializing at least the behaviour is _deterministic_. Code crashes and the programmer runs it again. Next time it does not crash, because the variable is initialized to some sane value. Programmer: nah, was just a fluke .. No need to fix... You are a great programmer, for sure, never make a mistake. But I think you overestimate the capabilities of many programmers out there. I have seen programmers do this. Clearly it is why so many scripting languages are populair nowadays.
        Yes, there are some hairy complications in the initialisation rules for C++, and the link was thought-provoking. But you can come a very long way by sticking to two basic rules:

        1. If a class is supposed to be usable with a default state, give it a default constructor. And like any constructor, it should always put the object into a consistent and usable state. If the class can't have a sensible default state, don't have a default constructor (using "= delete" as needed).
        2. Define your variables when you have something to put in them.

        (gcc -Wuninitialized catches the errors in the sample code on that webpage.)

        My comments about knowing the language were really about local variables of simple types - "int x;" and the like.

        Comment


        • #34
          Like it ways in the article:

          Fortunately, there’s a simple solution. At the risk of repeating advice which has been given many times before, initialize your variables .

          Seriously.

          Do it.
          Indeed, it is not very difficult. And an experienced C++ developer almost never have this issue. In practice you work with objects many times, not POD. And if you use POD, you initialize it, always... It is not that hard. But sadly many programmers fail to follow even these basic rules.

          You are right a default value from normal working coding does not make says. It should haven been _initialized_. But the default value would be the last line of defense before a security issue arises. Not to help out the programmer.

          Comment


          • #35
            Originally posted by wkleunen View Post

            Indeed, it is not very difficult. And an experienced C++ developer almost never have this issue. In practice you work with objects many times, not POD. And if you use POD, you initialize it, always... It is not that hard. But sadly many programmers fail to follow even these basic rules.
            If your POD is very large, say an array of double, or even some more complicated data structure made of PODs, you might still prefer not to initialize it at the point of declaration. Of course in those cases you're usually going to fill it up after some calculation. Although then it would really be better to return the POD from some function that produces it.
            Last edited by klapaucius; 25 February 2021, 01:31 PM.

            Comment


            • #36
              Originally posted by klapaucius View Post
              However, this patch only applies to local variables which are usually declared immediately before their use, where -Wuninitialized shouldn't really fail.
              Of course it can fail!

              There are many scenarios, but here's a common one:

              Code:
              int i;
              int status = ReadValue(file, &i);
              if (status != SUCCESS) return status;
              
              printf("Value is %d.\n", i);
              Now, let's say the compiler lacks visibility into ReadValue(), which could be in another shared library (in which case even LTO couldn't be of any help). Let's consider the possibilities:
              1. Does the compiler make a pessimistic assumption about how ReadValue() uses i, and complain about an uninitialized i being passed into ReadValue(), even though it's clear to us that i is used as a destination? That would be a false-positive, if we know ReadValue() only writes i.
              2. Does the compiler make a pessimistic or optimistic assumption about whether ReadValue() writes to i? If it makes an optimistic assumption and there's some bug in ReadValue(), then you just got a false-negative and i will be used uninitialized.
              3. If it makes a pessimistic assumption and ReadValue() behaves correctly, then the compiler will issue a false-positive warning.
              If the compiler makes every pessimistic assumption, then such warnings would likely be so numerous that people would simply disable them. So, it can't realistically err on the side of caution and has to be more permissive than we'd like. As a half-way compromise, they added -Wmaybe-uninitialized, which still doesn't make every pessimistic assumption and yet manages to generate too many false-positives for my taste.

              Now, that simply covers the visibility issue. I don't have a ready example for computational complexity, but I'm sure many cases involving recursion can rapidly spiral beyond the practical limits of compilers' static analysis capabilities and the patience of their human users (not to mention RAM).

              Here's a far more basic example involving the dismal realities of static analysis. I'm still experiencing numerous issues with this one, even on GCC v9.

              I have code similar to the following: #include <boost/optional.hpp> ::boost::optional<int> getitem(); int go(int nr) { boost::optional<int> a = getitem(); boost::optional&l...

              Comment


              • #37
                Originally posted by DavidBrown View Post
                The problem with pointless initialisation is not one of performance (though some 99.9% of processors shipped are not superscaler). The problem is it stops the compiler telling you when you have made a mistake and read an uninitialised variable. Default initialisation does not make the broken code correct - it merely changes it from undefined behaviour wrong code to defined behaviour but still wrong code that is now harder to spot.
                Well said.

                And that's why I'm in favor of the proposed change, along with some hint that valgrind can use to find reads of these formally-uninitialized variables. Please do use it in combination with -Wuninitialized, but because that's not 100%, we still need to do runtime analysis with tools like Valgrind. It's just that Valgrind has very weak support for stack-related bugs, making it dependent on the compiler for any help it can offer, here.

                Originally posted by DavidBrown View Post
                you don't define your local variables until you have a value for them, so default initialisation is of no use.
                Nonsense. Aside from cases like the one I just posted above, there's always:

                Code:
                int limit;
                switch (some_enum)
                {
                case EnumVal1:
                    limit = Limit1;
                    break;
                
                case EnumVal2:
                    limit = Limit2;
                    break;
                
                case EnumVal3;
                    limit = Limit3;
                    break;
                }
                As you've eloquently stated, I don't want a bogus default initializer, because that prevents the compiler telling me whether it's meaningfully initialized. Furthermore, I want the compiler to warn me if there's some case I've omitted, which is why I added no default.

                However, what if the case statements are doing more than just initializing this one variable? And what if the values to which it's being initialized are computed on-the-fly, as a function of other local state? I potentially lose the opportunity to factor this out into a standalone function or use something like a std::map<EnumType, int>.

                You just can't practically avoid this sort of thing, 100% of the time.

                Comment


                • #38
                  Originally posted by DavidBrown View Post
                  Lambdas that have a simple and clear implementation and are called where they are declared, will have no overhead.
                  A lambda is just an anonymous function. As such, it still has a prologue, epilogue, call-, and return- overhead. Now, it could get inlined, in this case. No guarantees, but, on a good day, when the weather is nice, and the compiler is in a good mood, and you haven't done anything to make it cranky -- like maybe passing in an object by-value, that has a non-inline copy-constructor (or something like that) -- then, I'm willing to accept that it could get inlined with no overhead.

                  Originally posted by DavidBrown View Post
                  it might confuse some people. A lot of programmers seem to consider lambdas to be an "advanced" or complicated feature of the language.
                  I have witnessed, first-hand a case where I observed 2-3 experienced C++11 programmers puzzling over just such a use of lambda, trying to figure out why lambda was being used and if there was something going on necessitating the use of lambda that they missed. It's definitely a style, and one with certain potential to unsettle people not used to it.

                  Originally posted by DavidBrown View Post
                  Sometimes bogus initialisers are almost unavoidable, with alternatives being less clear, but I like to try hard to make my local variables "const" where possible.
                  As I said, I'm allergic to bogus initializers. If I can't restructure my code to avoid uninitialized variables, I just leave them uninitialized so that at least the compiler has a chance of warning me if I fail to initialize them, meaningfully.

                  Oh, and here's another example of where there's no valid default initializer:

                  Code:
                  double best_val;
                  bool best_assigned = false;
                  for (const double &val: ReadValues())
                  {
                      if (!IsValid(val)) continue;
                  
                      if (!best_assigned)
                      {
                          best_val = val;
                          best_assigned = true;
                      }
                      else if (IsBetterThan(val, best_val))  best_val = val;
                  }
                  
                  printf("Best value: %f\n", best_val);   // intentional bug - should be conditioned on best_assigned
                  If you default-initialized best_val, then you'd break -Wmaybe-uninitialized. However, even if I had correctly checked best_assigned, before printing best_val, I still might get a false-positive from -Wmaybe-uninitialized -- at least, that's exactly what often happens when boost:: optional is used to do the exact same thing.

                  Comment


                  • #39
                    Originally posted by coder View Post
                    Of course it can fail!

                    There are many scenarios, but here's a common one:

                    Code:
                    int i;
                    int status = ReadValue(file, &i);
                    if (status != SUCCESS) return status;
                    
                    printf("Value is %d.\n", i);
                    Now, let's say the compiler lacks visibility into ReadValue(), which could be in another shared library (in which case even LTO couldn't be of any help). Let's consider the possibilities:
                    1. Does the compiler make a pessimistic assumption about how ReadValue() uses i, and complain about an uninitialized i being passed into ReadValue(), even though it's clear to us that i is used as a destination? That would be a false-positive, if we know ReadValue() only writes i.
                    2. Does the compiler make a pessimistic or optimistic assumption about whether ReadValue() writes to i? If it makes an optimistic assumption and there's some bug in ReadValue(), then you just got a false-negative and i will be used uninitialized.
                    3. If it makes a pessimistic assumption and ReadValue() behaves correctly, then the compiler will issue a false-positive warning.
                    If the compiler makes every pessimistic assumption, then such warnings would likely be so numerous that people would simply disable them. So, it can't realistically err on the side of caution and has to be more permissive than we'd like. As a half-way compromise, they added -Wmaybe-uninitialized, which still doesn't make every pessimistic assumption and yet manages to generate too many false-positives for my taste.
                    Wow, I've just realised it's even worse than you say, here's a very simple case in C++ where it doesn't raise any warning.

                    Code:
                    #include<iostream>
                    #include<cmath>
                    
                    int ReadValue(double* D)
                    {
                        return static_cast<int>(std::floor(*D));
                    }
                    
                    int main()
                    {
                        double d;
                        int status = ReadValue(&d);
                        std::cout << d <<'\n';
                        if (status !=0) return status;
                        printf("Value is %g.\n", d);
                    }

                    Comment


                    • #40
                      Originally posted by coder View Post
                      Nonsense. Aside from cases like the one I just posted above, there's always:

                      Code:
                      int limit;
                      switch (some_enum)
                      {
                      case EnumVal1:
                      limit = Limit1;
                      break;
                      
                      case EnumVal2:
                      limit = Limit2;
                      break;
                      
                      case EnumVal3;
                      limit = Limit3;
                      break;
                      }
                      You just can't practically avoid this sort of thing, 100% of the time.
                      Nice example to show different appraches when writing code. I would write this snippet as following:


                      Code:
                      int limit = ERROR_CODE;
                      switch (some_enum)
                      {
                      case EnumVal1:
                      limit = Limit1;
                      break;
                      
                      case EnumVal2:
                      limit = Limit2;
                      break;
                      
                      case EnumVal3;
                      limit = Limit3;
                      break;
                      }
                      
                      if (limit == ERROR_CODE) {
                      log("ERROR: class.function: internal error: unexpectedy found no match when doing something using " + some_enum + ", aborting / returning errorcode X / etc.");
                      abort / throw / return;
                      }
                      Not initializing variables after declaration is an error in my book.

                      Comment

                      Working...
                      X