
Linux Kernel Moving Ahead With Going From C89 To C11 Code


  • #41
    Originally posted by DavidBrown View Post
    You avoid stack overflow in your VLAs by making sure you don't make them bigger than appropriate - it's no different from what you should be doing before allocating dynamic arrays on the heap. Basically, if you think VLAs are a risk then you should be thinking long and hard about whether C is the right language for you, or if you have really understood the situation.
    Not that I totally disagree, but I can see the argument against them. Basically, the issue is that you have to be more diligent about checking their size than you would with heap-allocated arrays. It's common not to even think about stack overflows, other than in cases where you're using recursion.

    The benefit of arrays on the stack is that not only can you avoid the overhead of dynamic allocation, but the address range is also likely to be very warm in the cache hierarchy. I've taken advantage of this, to very good effect. The best example was to dynamically build a small image processing pipeline on the stack, have the leaf-node function pull data through it, and then tear down the whole thing as the setup functions for each stage returned. The stack-allocated data structures held at most a couple of scanlines of image data, each.

    Obviously, another argument against them would be the risk of security exploits via payloads that can trigger out-of-bounds accesses. Shadow stacks should do a lot to mitigate that scenario, though I'm generally much more reluctant to use stack for things like string-processing.



    • #42
      Originally posted by coder View Post
      They keep introducing new language features that seem to chip away at the use cases for templates. For instance, C++17 now has constexpr if statements. A co-worker once complained to me that he wanted to use one, but couldn't due to our build still using C++14. I wrote him a good ol' C++03-compatible template solution that accomplished the same goal.

      Functions with auto parameters are another example that comes to mind at the moment. Technically, they are templates, but it's a way to write a function template without using traditional template syntax. I'm sure I've run across others.
      The C++ core language often progresses by looking at the code people write, and trying to find ways to write it in a simpler and clearer fashion. A better example than "constexpr if" would be "constexpr" functions in C++11 - they were introduced to allow compile-time calculations. Before their introduction, people were using templates for compile-time calculations, which was a lot harder and less flexible and efficient. But that does not mean template programming became rarer just because there was now a better alternative for that use-case!

      And functions with "auto" parameters (or concept parameters) are, as you say, templates. The syntax is a little neater, but they are still templates. The same applies to templated lambdas.



      • #43
        Originally posted by DavidBrown View Post
        The C++ core language often progresses by looking at the code people write, and trying to find ways to write it in a simpler and clearer fashion.
        Then we should have standard algorithms you can directly pass a container into. I recently optimized some code where an idiot was using one to iterate over a std::set, instead of using std::set::find(), and he was proud of himself for using a standard algorithm. If std::find() merely accepted the entire container, it could be specialized to use a container's find() method, for those that have one.

        Not only that, but standard algorithms' iterator-based interface is clumsy, verbose, and has multiple unnecessary hazards for cases where you're just using a container's begin() and end().
        Last edited by coder; 01 March 2022, 01:00 PM.



        • #44
          Originally posted by coder View Post
          Then we should have standard algorithms you can directly pass a container into. I recently optimized some code where an idiot was using one to iterate over a std::set, instead of using std::set::find(), and he was proud of himself for using a standard algorithm. If std::find() merely accepted the entire container, it could be specialized to use a container's find() method, for those that have one.

          Not only that, but standard algorithms' iterator-based interface is clumsy, verbose, and has multiple unnecessary hazards for cases where you're just using a container's begin() and end().
          Code:
          #include <iostream>
          #include <set>
          #include <algorithm>
          
          int main() {
              std::set<int> xs = { 10, 20, 30 };
          
              std::cout << "xs.find " <<
                      (xs.find(20) != xs.end())
                      << "\n";
              std::cout << "std::find " <<
                      (std::find(xs.begin(), xs.end(), 20) != xs.end())
                      << "\n";
              std::cout << "std::ranges::find " <<
                      (std::ranges::find(xs, 20) != xs.end())
                      << "\n";
          }
          std::ranges is from C++20. You are not the first person to think this! (std::ranges has many other features as well.)



          • #45
            Originally posted by coder View Post
            Not that I totally disagree, but I can see the argument against them. Basically, the issue is that you have to be more diligent about checking their size than you would with heap-allocated arrays. It's common not to even think about stack overflows, other than in cases where you're using recursion.

            The benefit of arrays on the stack is that not only can you avoid the overhead of dynamic allocation, but the address range is also likely to be very warm in the cache hierarchy. I've taken advantage of this, to very good effect. The best example was to dynamically build a small image processing pipeline on the stack, have the leaf-node function pull data through it, and then tear down the whole thing as the setup functions for each stage returned. The stack-allocated data structures held at most a couple of scanlines of image data, each.

            Obviously, another argument against them would be the risk of security exploits via payloads that can trigger out-of-bounds accesses. Shadow stacks should do a lot to mitigate that scenario, though I'm generally much more reluctant to use stack for things like string-processing.
            I appreciate all your arguments here, and you are far from the only one who thinks as you do here. My attitude is that if any of these is a relevant issue, then you need to deal with those issues in the right place - and then there are no concerns left for VLAs. If you've got data coming in from outside, then it is clearly unsafe to use it for the size of a VLA without checking. But it is also unsafe to use it for the size of a heap allocation or any other purpose without checking it.

            If you are not thinking about the validity of your sizes before allocating space on the heap, then you have an attitude problem in your programming. If you train yourself to check your data before using it (or otherwise ensure that it is safe), then you get it right for heaps and VLAs. If you are lazy about this for VLAs, you'll be lazy about it elsewhere.

            My viewpoint here is that a bug is a bug. An unwarranted assumption is a bug. Avoid bugs as best you can, and use the best tools for the job. Changing a badly written VLA into a heap allocation will not fix the problem - at best, it will hide it so you don't see the problem until after the code has left the developer's lab.



            • #46
              Some people prefer to separate variable declaration and use, and think non-trivial initialisations and mixing code and data declarations jumbles code and encourages larger functions.
              I still prefer that method: variables are declared at the start of each function. I feel it makes for clearer and cleaner code. To each their own, though. I've been using C since '85 or so.



              • #47
                Originally posted by DavidBrown View Post
                If you are not thinking about the validity of your sizes before allocating space on the heap, then you have an attitude problem in your programming. If you train yourself to check your data before using it (or otherwise ensure that it is safe), then you get it right for heaps and VLAs. If you are lazy about this for VLAs, you'll be lazy about it elsewhere.
                You're not wrong, but neither is Steffo, about malloc() returning NULL being a more obvious problem to check for and friendlier to debug. When most programmers use arrays, they're probably accustomed to the size being compile-time static and therefore not needing to be checked. So, there's a realistic danger that this habit won't be ingrained in them the way checking for malloc() failure is.

                And, by the way, when malloc() fails, it's most often with an absurdly large value, in my own limited experience. So much so that the system isn't even going to try allocating that much memory. If I had any expectation that something might be too big, then you're right that I would likely range-check the size, but the typical scenario where malloc() or new fails is one where the size is computed and there's some arithmetic or other bug yielding a nonsensical size. This is something that's tricky to check for, because it puts the programmer in the position of trying to set some arbitrary limit for the code. Those kinds of policy decisions (i.e. how much memory it's allowed to use) are much better made at the user level, because the code actually doing the allocation often lacks the necessary context.

                Originally posted by DavidBrown View Post
                My viewpoint here is that a bug is a bug. An unwarranted assumption is a bug. Avoid bugs as best you can, and use the best tools for the job.
                100% agree.

                Originally posted by DavidBrown View Post
                Changing a badly written VLA into a heap allocation will not fix the problem - at best, it will hide it so you don't see the problem until after the code has left the developer's lab.
                Fairly recent facilities, like GCC's -fstack-protector and sanitizers, should certainly help in catching stack bugs. However, I think there's some conventional wisdom that heap bugs are easier to solve (e.g. using tools like valgrind) than stack bugs. And it's not entirely wrong.
                Last edited by coder; 01 March 2022, 05:09 PM.



                • #48
                  Originally posted by rclark View Post
                  I still prefer that method: variables are declared at the start of each function. I feel it makes for clearer and cleaner code. To each their own, though. I've been using C since '85 or so.
                  I remember when I switched from C to C++. It takes a little getting used to, but I quickly grew to prefer the intermingled style.

                  I have a hilarious story about this (at least, we got a good laugh out of it). A fairly senior developer decided he'd adopt the intermingled style, but was concerned he would overlook a variable declaration in the body of some code. He'd also noticed the heretofore unused auto keyword (as this was pre-2011, it still had the legacy behavior inherited from C) and liked how his editor highlighted it. So, he adopted it to use in his intermingled variable declarations, just so they would stand out in his editor! Needless to say, before we could start building the codebase in C++11 mode, we had to purge it of all these extraneous auto's ...and this guy was one of the more prolific contributors! Pretty simple search-and-replace/remove, but still funny.

                  Getting back to my own journey...
                  a fairly big turning point was when I learned about the static single-assignment (SSA) programming style that seems to be gaining favor. I briefly used another language which only supported single assignment, and after finding it wasn't nearly as limiting as I'd expected, I started writing my C++ in that way (within reason -- I don't forgo loops in favor of tail-recursion, or even go out of my way to avoid updating variables in loops).

                  I'm a big believer in having well-defined semantics that are clearly-reflected in variable and function names. I think single-assignment style supports that objective, and ultimately makes the code much more maintainable.
                  Last edited by coder; 01 March 2022, 05:55 PM.



                  • #49
                    Originally posted by coder View Post
                    I remember when I switched from C to C++. It takes a little getting used to, but I quickly grew to prefer the intermingled style.

                    Getting back to my own journey...
                    a fairly big turning point was when I learned about the static single-assignment (SSA) programming style that seems to be gaining favor. I briefly used another language which only supported single assignment, and after finding it wasn't nearly as limiting as I'd expected, I started writing my C++ in that way (within reason -- I don't forgo loops in favor of tail-recursion, or even go out of my way to avoid updating variables in loops).

                    I'm a big believer in having well-defined semantics that are clearly-reflected in variable and function names. I think single-assignment style supports that objective, and ultimately makes the code much more maintainable.
                    I fully agree. Single-assignment form (except for things like loop variables) makes the code a lot clearer and simpler to follow. I don't like to go overboard with it - I've seen some code that makes heavy use of the comma operator and nested conditional operators in order to force things into an expression form for SSA. But where practical, SSA means that you only ever need to look in one place to see what a variable is. I'm a fan of making variables "const" - then you know the variable won't change, and the compiler helps spot your mistakes. Many modern languages make "const" variables the default, and require an extra keyword for mutable variables - I think that's the right idea.

                    What was the SSA language you used? At university I learned functional programming (with a language much like Haskell), and while I don't do much functional programming now, it had a significant influence on my style.



                    • #50
                      Originally posted by coder View Post
                      You're not wrong, but neither is Steffo, about malloc() returning NULL being a more obvious problem to check for and friendlier to debug. When most programmers use arrays, they're probably accustomed to the size being compile-time static and therefore not needing to be checked. So, there's a realistic danger that this habit won't be ingrained in them the way checking for malloc() failure is.

                      Fairly recent facilities, like GCC's -fstack-protector and sanitizers, should certainly help in catching stack bugs. However, I think there's some conventional wisdom that heap bugs are easier to solve (e.g. using tools like valgrind) than stack bugs. And it's not entirely wrong.
                      The big problem I see with relying on "malloc returns 0 on failure" or "new throws on failure" is that programmers handle those failures badly. The handling can often look good enough locally, but it rarely considers the big picture and any potential knock-on effects. And it is never tested. (The kind of programming tasks that justify full testing of all possible paths will not allow any use of dynamic memory.)

                      In typical PC programming, there are perhaps four possibilities for the result of a heap allocation. Most of the time, you get the memory and everyone is happy. Occasionally (and I'll agree it's rare in practice) you'll bring the system to its knees trying to give you what you asked for. Or you have a bug, have asked for too much, and get a null pointer back. This has two sub-cases - one is that your code handles it well - perhaps giving a message to aid debugging - and everything else works as normal. The other, more realistic IME, is that it gets handled badly, leading to unexpected strange effects and errors later, with no one understanding why.

                      If you make a stack allocation, there are three possibilities. One is that it works and everything is fine. Or your allocation is too big, and the program crashes immediately with a stack overflow error (it's broken, but the developers can see where and why). Or you are unlucky enough to get the memory but take so much of the stack that the program crashes later in the call tree, which makes it harder to identify the problem.

                      I am not at all convinced by the argument that heap allocation errors are easier to debug, and while in theory they are easier to identify, in practice this is rarely done well. I fully support the use of sanitizers and similar tools for debugging and testing (except that if you take an existing "working" program and compile with "-fsanitize=undefined", you'll probably get very depressed at the results!). But for heap debugging, you need to go out of your way - you need to make use of sanitizers, valgrind, or similar. For stack overflow debugging, you get the crash for free :-)

