Richard Stallman Announces GNU C Language Reference Manual

  • DavidBrown
    replied
    Originally posted by coder
    If you think it's better to have dangling locks and leak file descriptors (which also means likely unflushed data, as you mentioned) than to safeguard their release via garbage collection, then you're operating under a very different world view than I am. I didn't say there's no benefit to having explicit unlocks or file closures -- just that I'd want garbage collection as a backstop.
    Perhaps I am viewing things differently. But, no, I do not want a "backstop" or "safeguard" in my code for this kind of thing. I want safeguards at the OS level - an OS (here I mean general-purpose OSes running unknown programs, rather than RTOSes or other specialised systems) should assume that programs are full of bugs and leak. It must clear up all resources a program might have taken, once the program ends, dies, or is forcibly killed.

    I want the programmer to use the correct language constructs for the task in hand. Then you do not need backstops - they add nothing but confusion and totally untestable code. You cannot rely on garbage collection handling the release of critical resources at an appropriate time. Therefore, you must use something else - something that you can rely on. And once you have that, garbage collection is no longer relevant.

    You're failing to distinguish between what the language actually implements vs. the prescribed best practices. If we take the example of a Python with statement, that simply builds atop the object's existing semantics. It closes the file or releases the lock because that's what the object's destructor does. So, you're wrong to say that Python doesn't use garbage collection for those things, but what it does is provide a better alternative mechanism.
    Locks (and file handles, and other non-memory resources) are not handled by destructors in correctly written Python. They are handled by "with" statements - the release is deterministic (to the extent that Python is deterministic) and synchronous, as the "__exit__" method is called at the end of the "with" statement. Before "with" statements were introduced to Python, try/finally blocks were used.
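    A minimal sketch of that mechanism follows. It assumes a hypothetical `Resource` class that only records events. `__exit__` runs as soon as the `with` block ends, even when the block raises. It also runs while the object is still referenced, so the object's destructor is not what does the work. The second function is a simplified hand-written `try`/`finally` equivalent; the real statement also passes the exception details to `__exit__`.

```python
class Resource:
    """Hypothetical resource that logs when it is acquired and released."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("acquire")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Called synchronously when the with block ends, exception or not.
        self.log.append("release")
        return False  # do not suppress the exception


def use_with(log):
    res = Resource(log)
    try:
        with res:
            log.append("work")
            raise ValueError("boom")
    except ValueError:
        log.append("handled")
    # res is still referenced here, so no destructor has run, yet
    # "release" was logged before the exception reached the handler.
    return res


def use_try_finally(log):
    # Simplified pre-"with" idiom: release in a finally clause.
    res = Resource(log)
    res.__enter__()
    try:
        log.append("work")
    finally:
        res.__exit__(None, None, None)
    return res


log = []
use_with(log)
print(log)  # ['acquire', 'work', 'release', 'handled']
```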

    It is possible that you are mixing up deterministic destructors and the term "garbage collection". That would explain a lot of our disagreement here.

    Garbage collection is asynchronous and non-deterministic, and used to clear up memory from old objects once there are no longer any live references to them. There are various garbage collection algorithms (with different advantages and disadvantages), but they typically involve running through the working memory looking for object references. By working in the background (often in their own thread), they save the main working thread(s) a little effort. But they don't always catch everything, and may need to lock memory areas.

    Destructors in a language like C++ run at precisely specified points in the code - they run when the object's lifetime ends. For local objects, that matches the end of their scope. It is absolutely fine to use such synchronous destructors for releasing resources - it is the preferred method, since it is automatic and you are guaranteed that the destructor will run regardless of exceptions. Python does not have such synchronous cleanup for its normal "__del__" destructor methods. But it does have them in the "__exit__" methods of objects that can be used in "with" statements. Thus in Python, you use "with" statements for resources, and do not rely on garbage collection. In C++, where destructors are synchronous, you use RAII ("Resource Acquisition Is Initialisation") for all resources, including memory. (There are techniques available if you want asynchronous resource release.)
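    A CPython-specific sketch shows why `__del__` is not a reliable place to release resources. It uses a hypothetical `Node` class. Once objects are reachable only through a reference cycle, deleting the last external names does not run their finalizers; they wait for the cyclic collector. Automatic collection is disabled here only to make the timing reproducible. In a normal program the collection happens at some unspecified later point. (CPython triggers this collector from allocation thresholds rather than from a background thread. Implementations without reference counting, such as PyPy, may delay even non-cyclic finalizers.)

```python
import gc


class Node:
    """Hypothetical object whose finalizer records when it actually runs."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.other = None

    def __del__(self):
        self.log.append(f"__del__ {self.name}")


def demo():
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # make the collection point explicit for this demo
    try:
        a = Node("a", log)
        b = Node("b", log)
        a.other, b.other = b, a  # reference cycle
        del a, b                 # no external references remain...
        log.append("after del")  # ...but neither finalizer has run yet
        gc.collect()             # the cyclic collector finalizes them now
        log.append("after collect")
    finally:
        if was_enabled:
            gc.enable()
    return log


print(demo())
```

The relative order of the two `__del__` calls within the cycle is not specified.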

    The question you need to ask is: what if you didn't catch it? For an arbitrary mutex, would it be better to have a dangling-lock bug that you might not even hit in testing, before the software is in the hands of customers? Or would you rather the lock get freed eventually? I know some people prefer the more catastrophic failure, but that presumes very good test coverage, which often isn't the case.
    Would I rather have one bad and unpredictable bug happen, or a different bad and unpredictable bug? Really, it's a silly question. Obviously I don't want either - and equally obviously, I know that no development process and no programmer is infallible, and there is always a non-zero risk of bugs slipping through in a final product. But I also know that extra code that should never be run, and is totally untestable, is always a liability. If your code correctly frees your locks (or other resources) at the right time, you will never see this garbage collection fallback. You will never know what other effects it has, what bugs it has, how it interacts with other parts of the code. And if your code does not correctly free the locks, what makes you think that freeing them by garbage collection will help? By the time that happens, all sorts of other things could have failed as a result of the lock being unavailable.

    No, the way to handle locks and critical resources is to have clearly defined methods of handling them. If you are programming in Python, you know all lock acquisition must take place within a "with" statement. If you are using Java, use a "try/finally" block. For C++, use RAII. Use the standard, clear and safe methods available for the language in question. Write the code in a clear manner - such as keeping the containing block small so that it is obvious it is correct.

    While debugging, you might make use of hooks in your garbage collection that check for lost locks - but they should not release the lock. They should yell at the developer, with any information they can give to help find the bug.
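    A rough sketch of such a debugging hook in Python follows. `LockGuard` is a hypothetical class, not a standard API. It acquires a `threading.Lock` and registers a `weakref.finalize` callback. A proper `release()` cancels that callback with `detach()`. If the guard is garbage-collected without being released, the callback issues a `RuntimeWarning` and deliberately leaves the lock held, so the bug is reported rather than papered over.

```python
import threading
import warnings
import weakref


class LockGuard:
    """Hypothetical debug-build guard for a held threading.Lock.

    If the guard is collected without release(), warn loudly and leave
    the lock held: report the bug instead of silently "fixing" it.
    """

    def __init__(self, lock, name):
        lock.acquire()
        self._lock = lock
        # The callback gets only plain data, never self: a reference to
        # self would keep the guard alive and the check could never fire.
        self._finalizer = weakref.finalize(self, LockGuard._report_leak, name)

    @staticmethod
    def _report_leak(name):
        warnings.warn(f"lock guard {name!r} collected while still held",
                      RuntimeWarning)

    def release(self):
        self._finalizer.detach()  # released properly: cancel the leak report
        self._lock.release()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

In correct code the guard is only ever used in a `with` statement, and the check never fires.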

    What about file descriptors, where the program leaks more and more fds, the longer it runs, until operations utilizing file descriptors just randomly start failing? Is that a preferable outcome?

    I think you're being too idealistic.
    I am perhaps idealistic, but I don't think I am too idealistic. I want to write code using sensible techniques that are known to avoid certain classes of errors. I want other people to do that too. I want programs to be written in a responsible manner. Is that really too much to ask?

  • rmoog
    replied
    Originally posted by Developer12
    I wouldn't expect him to ever include it, but I still laughed at the exclusion of Rust.
    On the one hand, it can do any job C can but doesn't allow those mistakes. It even has pointers (of a sort).
    At the same time, GNU will stick to what it has until the project is dead. They'll never re-implement things.
    Never mind ideological differences like the propensity towards permissive licences in the Rust community.

  • coder
    replied
    Originally posted by DavidBrown
    There's no advantage to garbage collection for that kind of thing, and clear and obvious disadvantages
    If you think it's better to have dangling locks and leak file descriptors (which also means likely unflushed data, as you mentioned) than to safeguard their release via garbage collection, then you're operating under a very different world view than I am. I didn't say there's no benefit to having explicit unlocks or file closures -- just that I'd want garbage collection as a backstop.

    Originally posted by DavidBrown
    That's why with modern garbage collected languages, you generally do not use garbage collection for handling resources other than memory - you use try/finally blocks in Java, "with" statements in Python, "using" statements in C#, etc.
    You're failing to distinguish between what the language actually implements vs. the prescribed best practices. If we take the example of a Python with statement, that simply builds atop the object's existing semantics. It closes the file or releases the lock because that's what the object's destructor does. So, you're wrong to say that Python doesn't use garbage collection for those things, but what it does is provide a better alternative mechanism.

    Originally posted by DavidBrown
    If I saw in a code review that someone was releasing a mutex in garbage collection "as a fallback", I'd reject the code.
    The question you need to ask is: what if you didn't catch it? For an arbitrary mutex, would it be better to have a dangling-lock bug that you might not even hit in testing, before the software is in the hands of customers? Or would you rather the lock get freed eventually? I know some people prefer the more catastrophic failure, but that presumes very good test coverage, which often isn't the case.

    What about file descriptors, where the program leaks more and more fds, the longer it runs, until operations utilizing file descriptors just randomly start failing? Is that a preferable outcome?
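    For comparison, here is a sketch of how a program can keep file descriptors from accumulating without relying on the collector. It uses a hypothetical `pipe_fds()` helper built with `contextlib.contextmanager` around `os.pipe()`. The helper's `finally` clause closes both ends when the `with` block exits, whether normally or through an exception. `is_open()` is also a hypothetical helper; it probes a descriptor with `os.fstat()`.

```python
import contextlib
import os


@contextlib.contextmanager
def pipe_fds():
    """Hypothetical helper: a pipe whose two fds are closed when the block exits."""
    r, w = os.pipe()
    try:
        yield r, w
    finally:
        # Runs on normal exit and on exceptions, so descriptors cannot
        # pile up however long the program keeps running.
        os.close(r)
        os.close(w)


def is_open(fd):
    """Demo-only probe: True if fd currently refers to an open descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


with pipe_fds() as (r, w):
    os.write(w, b"ping")
    print(os.read(r, 4))        # b'ping'
print(is_open(r), is_open(w))   # False False
```

The `os.fstat()` probe is only a demo check. In a multithreaded program, a closed fd number can be reused at any moment.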

    I think you're being too idealistic.

  • DavidBrown
    replied
    Originally posted by coder
    First, that's a little bit of cherry-picking from what you originally said. Your original list was rather open-ended, and now you're just singling out locks. For instance, there are many cases where file handles are merely used to manage a resource, rather than mapping to an actual file that you want to close promptly because someone else might open it. Take, for instance, a poll fd or eventfd.

    Yes, you will typically want to make explicit unlock calls, to reduce latency. I would still want GC to release a mutex as a fallback, in case I forget.

    Third, unlock order doesn't matter the same way that locking order does. If you lock in the wrong order, a deadlock can occur. However, unlocking in the wrong order won't cause a deadlock, as long as all of the locks are eventually released.
    Sure, some resources can certainly be released at some unspecified time later in the future. Basically, if the resource is available in quantity, or is not going to be used again while the program is running (as may well be the case for some files), management by garbage collection is fine. If the resource might be contested, or there might be other use for it, then it is not fine. When considering the correctness of a program, you usually consider that garbage collection never actually happens, or happens far in the future, since that's the worst case. If you are using synchronisation mechanisms between threads or processes, that is simply unacceptable. For memory, it is rarely an issue. For something like files, it might well be - it could be the closure of the file handle that leads to the data being committed to the disk, and users might not be happy to press "save" only to find the file is not saved until some indeterminate time in the future.
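    A small CPython sketch of the "save" point uses hypothetical `save()`, `save_and_forget()` and `read()` functions. Data written through a buffered file object may not reach the OS until the file is flushed or closed. Closing it deterministically with `with` is what makes the write visible once the function returns. Note that `close()` only hands the data to the operating system; getting it onto the physical disk also needs something like `os.fsync()`.

```python
import os
import tempfile


def save(path, text):
    # Leaving the with block closes the file; close() flushes Python's
    # buffers, so the data has reached the OS by the time save() returns.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def save_and_forget(path, text):
    # No close(): a short write can sit in the file object's buffer until
    # someone flushes it or the object is finalized, whenever that is.
    f = open(path, "w", encoding="utf-8")
    f.write(text)
    return f


def read(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "saved.txt")
    save(p, "hello")
    print(repr(read(p)))    # 'hello'

    q = os.path.join(d, "pending.txt")
    pending = save_and_forget(q, "hello")
    print(repr(read(q)))    # '' (still buffered in the file object)
    pending.close()
    print(repr(read(q)))    # 'hello'
```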

    There's no advantage to garbage collection for that kind of thing, and clear and obvious disadvantages (unlike memory, for which garbage collection can be a definite efficiency win as well as being very convenient). That's why with modern garbage collected languages, you generally do not use garbage collection for handling resources other than memory - you use try/finally blocks in Java, "with" statements in Python, "using" statements in C#, etc.

    If I saw in a code review that someone was releasing a mutex in garbage collection "as a fallback", I'd reject the code. There is no way to test that synchronisation directives are correct in your code - it could all work by coincidence each time you test it, and testing is non-deterministic. You have to get it right in the code - and be absolutely sure that it is correct. Such a "fallback" says you are not sure - so go back and re-write or re-structure the code until you are sure, and it's obvious that you are sure because it is obvious that the synchronisations are correct. There are situations where "defensive" programming is good, or where it is useful to "minimise" the damage that might result from bugs - this is not one of them.

    (You are, of course, correct that the order of release of locks rarely matters.)

  • coder
    replied
    Originally posted by DavidBrown
    you want your locks to be taken when you ask for them, in the order you ask for them, and to be released when you ask to release them, in the order you ask (which is almost always the reverse order from acquisition). You don't say "release this lock some time, whenever it suits and you have nothing else to do".
    First, that's a little bit of cherry-picking from what you originally said. Your original list was rather open-ended, and now you're just singling out locks. For instance, there are many cases where file handles are merely used to manage a resource, rather than mapping to an actual file that you want to close promptly because someone else might open it. Take, for instance, a poll fd or eventfd.

    Yes, you will typically want to make explicit unlock calls, to reduce latency. I would still want GC to release a mutex as a fallback, in case I forget.

    Third, unlock order doesn't matter the same way that locking order does. If you lock in the wrong order, a deadlock can occur. However, unlocking in the wrong order won't cause a deadlock, as long as all of the locks are eventually released.
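    A sketch of the distinction assumes a hypothetical `acquire_in_order()` helper. Deadlock comes from two threads taking the same locks in opposite orders. Imposing one global acquisition order (here, simply sorting by `id()`) prevents it, and the release order plays no part. In the example, two threads request the locks in opposite orders and both still finish.

```python
import threading


def acquire_in_order(*locks):
    """Hypothetical helper: take locks in one global order (sorted by id()).

    Deadlock needs two threads taking the same locks in opposite orders;
    a single fixed order rules that out. Release order is irrelevant here.
    """
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered


def release_all(held):
    # Reverse order by convention only; any order would avoid deadlock.
    for lk in reversed(held):
        lk.release()


lock_a, lock_b = threading.Lock(), threading.Lock()
counter = [0]


def worker(first, second, n):
    for _ in range(n):
        held = acquire_in_order(first, second)
        try:
            counter[0] += 1
        finally:
            release_all(held)


# The two threads *request* the locks in opposite orders.
threads = [
    threading.Thread(target=worker, args=(lock_a, lock_b, 10_000)),
    threading.Thread(target=worker, args=(lock_b, lock_a, 10_000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])  # 20000
```

Sorting by `id()` is just a convenient total order for objects that stay alive; any fixed ranking of the locks works.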

  • DavidBrown
    replied
    Originally posted by coder
    Uh, but the GC languages I know all use objects to manage those other resources, as well. So, garbage collection saves you in those areas, unless you keep an explicit reference to an object, longer than you should.
    You absolutely do not want to manage your synchronisation primitives and locks via asynchronous garbage collection! Your PC has lots of memory - it usually doesn't really matter when it gets returned to the free pool. But you want your locks to be taken when you ask for them, in the order you ask for them, and to be released when you ask to release them, in the order you ask (which is almost always the reverse order from acquisition). You don't say "release this lock some time, whenever it suits and you have nothing else to do".
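    In Python, nested `with` statements or `contextlib.ExitStack` give exactly this behaviour. Each lock is taken when requested, in the order requested, and released synchronously in reverse order. A sketch with a hypothetical `NamedLock` wrapper that logs what happens:

```python
import contextlib
import threading


class NamedLock:
    """Hypothetical wrapper around threading.Lock that logs acquire/release."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        self.log.append(f"acquire {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(f"release {self.name}")
        self._lock.release()
        return False


log = []
locks = [NamedLock(n, log) for n in ("outer", "middle", "inner")]
with contextlib.ExitStack() as stack:
    for lk in locks:
        stack.enter_context(lk)  # acquired immediately, in list order
    log.append("critical section")
# ExitStack unwinds last-in, first-out: the reverse of acquisition.
print(log)
```

The log ends with `release inner`, `release middle`, `release outer`, all before the statement after the `with` block runs.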

    There might be an object for managing the lock, and the memory for that can be garbage collected. But not the lock.

    So when you use a garbage collected language like Python, you use a "with" statement to control the lock - you don't rely on garbage collection.
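    A minimal sketch of that pattern with `threading.Lock` follows; `balance_lock`, `balance` and `withdraw()` are hypothetical names. The lock is acquired on entry to the `with` block and released on exit, including when an exception propagates out of it.

```python
import threading

balance_lock = threading.Lock()   # hypothetical shared lock
balance = {"value": 100}          # hypothetical shared state


def withdraw(amount):
    with balance_lock:            # acquired here
        if amount > balance["value"]:
            raise ValueError("insufficient funds")
        balance["value"] -= amount
    # Released here, both on the normal path and when ValueError propagates.


try:
    withdraw(500)
except ValueError:
    pass
print(balance_lock.locked(), balance["value"])  # False 100
withdraw(30)
print(balance_lock.locked(), balance["value"])  # False 70
```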

    (Note that C++ style RAII is entirely different. There you have precise semantics about the order and time when objects are destructed, and therefore when the lock gets released.)

  • coder
    replied
    Originally posted by DavidBrown
    I have seen many people who have programmed in C for a long time, who have no clue as to what happens underneath. You can easily write C code that works correctly, without learning how to get efficient results - especially if you ever work with different classes of processor. I have seen people write "x * 0.5" to divide an integer by 2, and then wonder why their program is so slow on an 8-bit microcontroller.
    Well, that's not the fault of using C. You can learn about how CPUs work, and still just use C to program them.

    One of the first tricks I learned to make my MS Quick BASIC programs go faster was to dimension my variables as ints.

    Originally posted by DavidBrown
    Equally, I have seen people who do have some understanding about what is going on underneath, but have no understanding of compilers, write "(x << 2) + x" and claim it is faster than "x * 5". (If that happens to be the fastest way to implement multiply by 5 on the particular target, it's the compiler's job to generate that.)
    Obviously, learning assembly language isn't going to teach you that. What's needed is to learn about optimizing compilers. Or simply compiler optimizations.

    Originally posted by DavidBrown
    A major criticism of ARM is that it doesn't have that many general-purpose registers - only 12 (for 32-bit ARM).
    AArch64 extends the GPRs to 31, plus a register number that serves as either the stack pointer or the zero register, depending on the instruction.

    Originally posted by DavidBrown
    Keeping track of what data is in registers,
    I would generally use macros to map variables to registers. I'd define them at the point of "allocation" (i.e. first use) and undefine them at the point of "deallocation" (i.e. last use).

    Originally posted by DavidBrown
    Using intrinsics can have its advantages and disadvantages. You need to be very careful about how the compiler can re-arrange code - this can lead to better pipelining and scheduling, but can also lead to trouble if the programmer expects the resulting assembly to match the source code directly. Getting such code right, efficient, and suitable for different processors in the same family is a fine art!
    I only went as far as making sure the compiler didn't generate significantly more instructions than I expected. Fortunately, I didn't need to worry about whether the code was optimized to within the very last %. Just doing a reasonably efficient vectorization + ensuring decent cache utilization was enough to hit my performance targets.

    Oh, and restricted pointers. Anyone doing code optimization in C or C++ ought to understand them and when to use them. Although C++ doesn't officially support them, every C++ compiler supports them as a nonstandard extension, because they're that important.

    I think the reason the C++ standards committee doesn't like them is that they will break your code, if used improperly. Worse, the breakage is likely dependent on compiler optimization level, which makes usage errors even harder to debug.
    Last edited by coder; 08 September 2022, 10:52 AM.

  • coder
    replied
    Originally posted by DavidBrown
    Leak a semaphore, mutex or a file handle, or other kinds of resource, and you could be in far more trouble. So when learning programming, learn to take care of your resources, and then memory management is peanuts
    Uh, but the GC languages I know all use objects to manage those other resources, as well. So, garbage collection saves you in those areas, unless you keep an explicit reference to an object, longer than you should.

  • coder
    replied
    Originally posted by kylew77
    throwing assignment like x = 5; and conditions, and basic scanf and printf functions and preprocessor directives all into week 1 is way too much in my opinion!
    I wouldn't touch the preprocessor, other than to say "Put these #include's at the top of your program. You'll understand why, later."

    Originally posted by kylew77
    What haunts me to this day is we had a woman take the class 3 times and try her hardest and not pass each time and she had to have the class for her engineering degree. She was in mechanical or civil or something like that, someone unlikely to ever need to code in C, but the university insisted that she learn to code in C.
    That's grim. A small part of me does think "gee, if someone has such a hard time grasping these concepts, can they really be such a good engineer?", but I'm sure that's my cognitive bias speaking and I don't know how much stress she was under from other classes or responsibilities. Mathematicians would probably have similar thoughts about some of the areas that give me trouble.

    You'd wish that someone could find additional resources before taking a class a second or third time, but again I don't know her circumstances. If nothing else, there have got to be some good YouTube tutorials about this stuff.

    Originally posted by kylew77
    This is all ancient history now, 2016 and 2017.
    I know some colleges and universities were quick to jump on the Java bandwagon, way back in the late '90s.

  • DavidBrown
    replied
    Originally posted by coder
    Not sure I agree that assembly is necessary for learning "efficient programming". C is good enough, for that.
    I have seen many people who have programmed in C for a long time, who have no clue as to what happens underneath. You can easily write C code that works correctly, without learning how to get efficient results - especially if you ever work with different classes of processor. I have seen people write "x * 0.5" to divide an integer by 2, and then wonder why their program is so slow on an 8-bit microcontroller. Equally, I have seen people who do have some understanding about what is going on underneath, but have no understanding of compilers, write "(x << 2) + x" and claim it is faster than "x * 5". (If that happens to be the fastest way to implement multiply by 5 on the particular target, it's the compiler's job to generate that.)

    Yeah, the main thing assembly language programming teaches you is how much even a C compiler does for you. Register allocation is a huge headache, if you have to do it by hand. Something with lots of general purpose registers, like ARM, should be good for dabbling. That way, you shouldn't have to redo register allocation every time you make a little code change.
    A major criticism of ARM is that it doesn't have that many general-purpose registers - only 12 (for 32-bit ARM). That's enough to play with, but can quickly be limiting in real code. Keeping track of what data is in registers, what is in stack slots, and how they move between them is where compilers shine, and human programmers have a lot more difficulty.

    When I was writing SSE code, I used the C intrinsics but I'd check the compiler output, to make sure it wasn't generating lots of extraneous instructions. I did find a few surprises, that way! In the end, I got reasonably close to the efficiency of hand-coded assembly, if not better, and with a lot fewer headaches.
    Using intrinsics can have its advantages and disadvantages. You need to be very careful about how the compiler can re-arrange code - this can lead to better pipelining and scheduling, but can also lead to trouble if the programmer expects the resulting assembly to match the source code directly. Getting such code right, efficient, and suitable for different processors in the same family is a fine art!
