C++20 Being Wrapped Up, C++23 In Planning


  • #51
    Originally posted by plonoma View Post

    So Rust and Dlang do not have technical deficits like the other languages have, interesting.
    What do you think of Rust's borrow checker?
    Your argument technique is putting words in people's mouths.

    If you have something to say on your own account, say it.



    • #52
      Originally posted by plonoma View Post

      But that's a very roundabout way of doing things.
      Surely being able to do the operation on a signed numeric variable in one step would be faster while providing more readable and elegant code.
      If you need wrapping signed overflow, casting to unsigned types, performing the operation, and casting back will give you exactly the same results (except for division with negative numbers). The compiler will use basically the same instructions, and it will be just as efficient. You lose other optimisation opportunities, as you always do with wrapping overflow instead of undefined overflow. Sure, it is a little cumbersome - but that is a good thing for an operation that is almost certainly not what you should be using. Wrapping on signed overflow is almost invariably /wrong/. It is a mistake in your logic in all but a few specific use-cases.
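
      For concreteness, a minimal sketch of the idiom (the function name is mine, and I'm assuming a 32-bit int):
      Code:
      #include <cstdint>

      // Wrapping (two's complement) addition: the arithmetic is done on
      // uint32_t, where overflow wraps modulo 2^32, and the bit pattern is
      // then converted back.  (The final conversion is implementation-defined
      // before C++20, and defined as modular from C++20 on.)
      int32_t wrapping_add(int32_t a, int32_t b) {
          return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
      }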



      • #53
        Originally posted by carewolf View Post

        No, doing it in unsigned is actually faster. C casting does no computation, but only controls how instructions are selected. So casting+operating+casting back is still one instruction, but now one that doesn't set CPU flags, so one with fewer hazards and thus potentially slightly faster. Even if both signed and unsigned addition are 1-clock instructions, the unsigned one is easier for the CPU to perform out of order because it has fewer side effects.
        That's not really true. Processors vary as to whether they have flags or not, and whether instructions use the flags or not. (In the RISC world, it's common to have two versions of instructions - one that updates the flags, and one that does not.) But when updating the flags, signed and unsigned operations are the same (except for division). And the compiler will not pick a flag-updating version unless it has use for the flags (for extended arithmetic, or for conditional branches just afterwards). Casting your signed ints to unsigned, doing the arithmetic, then casting back, is free in terms of the instructions used. Like any unsigned arithmetic, you lose a few optimisation opportunities compared to pure signed arithmetic.

        (In general, C-style casting /can/ result in work at runtime. It is approximately equivalent to C++ static_cast, not reinterpret_cast. But in the specific case of integers, it involves no work.)

        But yes, it looks like shit in code.
        I think that can be a good thing sometimes - when you are doing dodgy things in code, they should stick out. But if you need this kind of thing a lot, and want nice looking code, you just put it in a class.
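
        For example, a minimal sketch of such a wrapper (the type name is mine; a real one would add the remaining operators and comparisons):
        Code:
        #include <cstdint>

        // Signed 32-bit integer with explicitly wrapping addition.
        struct WrapInt32 {
            int32_t value;

            friend WrapInt32 operator+(WrapInt32 a, WrapInt32 b) {
                return { static_cast<int32_t>(static_cast<uint32_t>(a.value) +
                                              static_cast<uint32_t>(b.value)) };
            }
        };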



        • #54
          Originally posted by plonoma View Post
          Interesting point you have there. How should C++ have properly exposed the diversity of hardware behaviors?
          Please do think about it for a while and proofread. Don't be afraid to write too much.
          Myself as well as other readers might be very interested in what you come up with.
          When dealing with hardware variability, programming language designers have several tools at their disposal.

          If there's not too much variability, the language can just bake in support for every supported hardware feature set, with some kind of conditional compilation mechanism to help portable programs adapt themselves to the target hardware. This is how most programming languages handle 16-bit vs 32-bit vs 64-bit CPUs: the language specifies integer types of all these widths, but not all of them may be available, and there is usually a mechanism by which your program can tell at compile time which integer types the target supports and adapt its behavior accordingly. When possible, that's the best scenario IMO.
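
          As an illustration, this is roughly what that conditional compilation mechanism looks like in C++, where the exact-width types of <cstdint> are optional and their companion macros double as feature tests (the alias name is mine):
          Code:
          #include <cstdint>

          // INT64_MAX is only defined when the target actually provides an
          // exact-width 64-bit integer, so it doubles as a feature test.
          #ifdef INT64_MAX
          using counter_t = std::int64_t;
          #else
          using counter_t = std::int_least32_t;  // guaranteed to exist everywhere
          #endif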

          When the variability gets worse (e.g. each hardware basically does its own thing and is bad at emulating what other hardware does) but the behavior of any given (OS, hardware, compiler) combination remains predictable, programming languages may instead adopt what in C terminology is called implementation-defined behavior. Basically, the language does not specify what happens, but compiler authors must state what will happen in their documentation. For an example of this strategy at work, you can have a look at the reference manual of the GNAT Ada compiler: https://gcc.gnu.org/onlinedocs/gnat_rm/ .

          A related concept in the C family is that of unspecified behavior, where the standard documents what may happen, but not what will happen, and does not require implementations to clarify it either. However, an implementation should behave consistently, e.g. if a program is run twice it should produce the same result. Examples are function argument evaluation order in C and the evaluation order of non-short-circuiting boolean operators in many languages.
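
          A small illustration of the argument-order case (the helper names are mine):
          Code:
          #include <cstdio>

          int traced(int v) { std::printf("evaluated %d\n", v); return v; }
          void use(int a, int b) { std::printf("use(%d, %d)\n", a, b); }

          int main() {
              // Which argument is evaluated first is unspecified: a conforming
              // compiler may evaluate traced(1) or traced(2) first, but either
              // way the program is well-defined and use() receives (1, 2).
              use(traced(1), traced(2));
          }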

          And finally, we have the infamous undefined behavior, where anything can happen and what happens can change every time the program is compiled or run, without changing any other parameter like compiler version.



          In my opinion, as far as programming language design goes, the C designers used the worst-case undefined behavior sledgehammer way too much, and signed integer overflow being undefined is a great example of that because it's something that's very well defined at the hardware and OS level, and compilers and C-family languages are the only ones conspiring to make it a mess.

          One might counter that there is actually a tiny amount of run-time variability in hardware behavior, in the form of trapping overflow on some obscure hardware architectures. To which I would reply that this can be handled just like the C family handles trapping unsigned integer overflow or IEEE-754 traps, namely by saying that the programming language does not support this obscure hardware feature and that programmers who want to use it must resort to optional language features or compiler-specific language extensions. This compromise, while crude at first sight, has worked out very well for every other arithmetic type out there in C, and there is no good reason why signed integers should be handled differently in this respect.
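
          To sketch what such an opt-in mechanism can look like in practice, GCC and Clang already expose checked arithmetic as an extension (this snippet is mine, not something the standard mandates):
          Code:
          #include <cstdio>

          // __builtin_add_overflow is a GCC/Clang extension that reports whether
          // the mathematically exact result fits in the destination type.
          int checked_add(int a, int b) {
              int result;
              if (__builtin_add_overflow(a, b, &result)) {
                  std::fprintf(stderr, "signed overflow on %d + %d\n", a, b);
                  return 0;  // arbitrary fallback for this sketch
              }
              return result;
          }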

          Another classic counter is that undefined signed integer overflow enables better performance through elision of code dedicated to handling overflow. Ignoring the intellectual dishonesty of taking a hardware-motivated language design decision and pretending that it was performance-motivated, I will reply that I have never yet seen a case where switching integer types from signed to unsigned made a measurable difference to the performance of a useful C/++ program, as opposed to one of those toy code snippets used by compiler authors to brag about all the things that their optimizer can do. On the other hand, I have seen colleagues lose hours tracking down a bug emerging from undefined signed integer overflow. From this personal and admittedly limited perspective, the tradeoff does not look good to me.

          Therefore, for all these reasons, I think that signed integer overflow should have been at least unspecified behavior, and preferably implementation-defined behavior. This would have given C/++ compiler authors all the headroom they need to implement signed integer overflow in the fashion that the target hardware supports best, without burdening C/++ programmers with one more piece of UB cognitive burden that they need to keep in mind at all times while writing code.

          And if you have good ideas maybe it's a good idea to talk about them at the appropriate place. As another poster told me in a reply: join the C++ standardization process and participate to make sure your issues/ideas receive the appropriate consideration. More details on participation can be found at: https://isocpp.org/std
          I must respectfully decline. I have dealt with that sort of standards committee before, and it's a huge time sink that cannot be sustainably handled on volunteer time. Further, I think it's too late to fix C and C++ via backwards-compatible adjustments, and also too late for significant backwards-incompatible changes to be accepted by the community (given that even backwards-compatible additions are already not accepted all that well).

          In my opinion, the most effective thing to do at this stage would be to put C and C++ in maintenance mode (so that practitioner knowledge about them can at least be stabilized, unlike the current C++ situation, where the spread of programmer knowledge and coding styles keeps widening), save these languages for the maintenance of healthy legacy software systems, and dedicate all new feature development effort to backward-incompatible redesigns that try to learn from C's mistakes in order to allow new applications to be written more easily, such as Zig or Rust.

          But I wouldn't expect the ISO C++ standardization group to ever be convinced of that, since it would be perceived as a form of defeat by them.



          • #55
            Originally posted by DavidBrown View Post

            (In general, C-style casting /can/ result in work at runtime. It is approximately equivalent to C++ static_cast, not reinterpret_cast. But in the specific case of integers, it involves no work.)
            It can, but in most safe cases doesn't. But since you brought up classic RISC architectures, even conversions between integers there can require work to mask and sign-extend due to fewer specialized instructions.



            • #56
              Originally posted by carewolf View Post
              It can, but in most safe cases doesn't. But since you brought up classic RISC architectures, even conversions between integers there can require work to mask and sign-extend due to fewer specialized instructions.
              Code:
              float foo(float x) {
                  return (int) x;
              }
              The cast (explicit conversion) performs a float to integer conversion at run-time. The implicit conversion of the result back to float is another run-time instruction. This is very different from reinterpreting the bits directly.

              Conversion between same-sized integers on RISC architectures does not involve sign extension or masking. Loading sub-word sizes often does - and you generally have a distinction between "load byte / half-word with sign extension" and "load byte / half-word with zero extension". But that is not conversion, it is loading from memory, and it applies whether you are using signed or unsigned data. The same applies to conversion from bigger integer sizes to smaller ones in registers - it is about the sizes, not the signedness (though the signedness determines the choice of zero or sign extension).
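
              A small sketch of the distinction (the function names are mine):
              Code:
              #include <cstdint>

              // Same size, different signedness: no instructions needed, it is just a
              // reinterpretation of the bits already in the register.
              uint32_t as_unsigned(int32_t x)  { return static_cast<uint32_t>(x); }

              // Different sizes: widening needs sign or zero extension (chosen by the
              // signedness of the source), narrowing keeps the low bits - the cost
              // comes from the size change, not from the signedness.
              int32_t  widen(int8_t x)    { return x; }                        // sign-extend
              uint32_t widen(uint8_t x)   { return x; }                        // zero-extend
              uint8_t  narrow(uint32_t x) { return static_cast<uint8_t>(x); }  // low 8 bits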



              • #57
                Originally posted by HadrienG View Post

                And finally, we have the infamous undefined behavior, where anything can happen and what happens can change every time the program is compiled or run, without changing any other parameter like compiler version.



                In my opinion, as far as programming language design goes, the C designers used the worst-case undefined behavior sledgehammer way too much, and signed integer overflow being undefined is a great example of that because it's something that's very well defined at the hardware and OS level, and compilers and C-family languages are the only ones conspiring to make it a mess.

                One might counter that there is actually a tiny amount of run-time variability in hardware behavior, in the form of trapping overflow on some obscure hardware architectures. To which I would reply that this can be handled just like the C family handles trapping unsigned integer overflow or IEEE-754 traps, namely by saying that the programming language does not support this obscure hardware feature and that programmers who want to use it must resort to optional language features or compiler-specific language extensions. This compromise, while crude at first sight, has worked out very well for every other arithmetic type out there in C, and there is no good reason why signed integers should be handled differently in this respect.

                Another classic counter is that undefined signed integer overflow enables better performance through elision of code dedicated to handling overflow. Ignoring the intellectual dishonesty of taking a hardware-motivated language design decision and pretending that it was performance-motivated, I will reply that I have never yet seen a case where switching integer types from signed to unsigned made a measurable difference to the performance of a useful C/++ program, as opposed to one of those toy code snippets used by compiler authors to brag about all the things that their optimizer can do. On the other hand, I have seen colleagues lose hours tracking down a bug emerging from undefined signed integer overflow. From this personal and admittedly limited perspective, the tradeoff does not look good to me.

                Therefore, for all these reasons, I think that signed integer overflow should have been at least unspecified behavior, and preferably implementation-defined behavior. This would have given C/++ compiler authors all the headroom they need to implement signed integer overflow in the fashion that the target hardware supports best, without burdening C/++ programmers with one more piece of UB cognitive burden that they need to keep in mind at all times while writing code.
                You are making two classic mistakes about signed integer overflow here. First, you think C left it undefined in order to support odd hardware. Second, you think defining it would be better than leaving it undefined. And though you haven't written it, I'm guessing you also make a third: thinking that two's complement is the "natural" or "obvious" representation of signed integers, and that wrapping is the "natural" or "obvious" overflow behaviour. (If that assumption is wrong, then I apologise.)

                (I'm referring to C here, but C++ inherits the same behaviour.)

                Let's consider what the original C designers had to think about regarding signed integers. They had to support different formats - two's complement without padding was common, but not universal at the time. Different hardware had different ways of handling overflow. But was that why they picked "undefined behaviour" as the result of signed integer overflow? No, it was not. C supports a wide range of hardware. Where different hardware has different effects, and it is useful to know the effects and use them, C gives them "implementation-defined behaviour". That means the compiler must document what it does in these cases, and be consistent about it. Conversion of an unsigned value to a signed type is implementation-defined (if the signed type cannot represent the value directly). If the C designers had considered signed overflow to be a useful feature that might be hard to implement consistently between different machines, it too would have been implementation-defined. Instead, the language designers realised that overflowing signed arithmetic is simply wrong - it doesn't make sense. There is no right answer, so there is no definition of it in C.

                You must remember here that C is not primarily defined as a way to generate code for a processor. It is not defined in terms of the underlying CPU or hardware. It is an abstract language, defined in terms of an abstract machine. It is (contrary to popular misunderstandings) a high-level language, not a low-level language or a "universal assembler". But it is defined in a way that makes it efficient to implement, so that people can use it instead of assembler or other low-level languages. So the designers understood that if you have two integers, and you add them, you want the result to be the mathematically correct result. If the language can't give you the correct result, it can't help - any answer would be wrong, so there is no point in giving you one.

                The reason most hardware uses two's complement signed integers is not because it makes particular sense as a way of storing signed data. It is simply the easiest and cheapest method in hardware. The reason signed overflow wraps in hardware is not because it is useful (except in a few specific cases), but because it means the same hardware and the same instructions can be used for signed and unsigned arithmetic, for multi-word arithmetic, and for both addition and subtraction.

                Like many people, you want signed integer overflow to be defined behaviour. But I suspect that like most who want this, you haven't actually thought about /why/ you want it to be defined, and the consequences of defining it. Tell me, when would you want to have overflow give a specific value? Under what circumstances would it make sense to add 2 billion to 2 billion and get minus 300 million? It makes no sense. It is almost never helpful - it is almost invariably a mistake. If your signed arithmetic overflows, you are going to get nonsense results - unless you have written code specifically expecting this, it's nasal demons however the language defines it.

                Alternative handling of overflow, such as saturation, throwing a C++ exception, trapping, setting errno, etc., would likely be much more useful - but significantly more costly in run-time performance. When a language defines overflow as wrapping, as Java does, it loses these options. When it is undefined, like in C, tools can change that behaviour - you can add run-time checks in the tool to find bugs.

                By leaving signed overflow undefined, the developer has better tools to find and eliminate bugs in the code. To me, that is the important point - optimisation of code based on the knowledge that undefined behaviour does not occur is just a bonus.
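
                As a concrete example of that tooling point, GCC and Clang can turn the undefined overflow into a run-time diagnostic (the exact flags may differ between toolchains):
                Code:
                // Build with: g++ -fsanitize=signed-integer-overflow overflow.cpp
                // At run time the sanitizer reports the overflow instead of silently
                // wrapping - possible precisely because the behaviour is undefined.
                #include <climits>
                #include <cstdio>

                int main() {
                    int x = INT_MAX;
                    int y = x + 1;          // flagged here by the sanitizer
                    std::printf("%d\n", y);
                }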

                The idea of making signed integer arithmetic overflow defined as two's wrapping (or at least as implementation defined) comes up in preparation for every new version of the C and C++ standards. Every time, there are people who want to define the behaviour because they think it will make programmers' lives easier or eliminate bugs. Every time, the proposals are rejected because it would make programmers' lives harder and make it harder to spot bugs (as well as making code less efficient).



                • #58
                  Originally posted by DavidBrown View Post
                  The reason most hardware uses two's complement signed integers is not because it makes particular sense as a way of storing signed data. It is simply the easiest and cheapest method in hardware.
                  Actually two's complement makes the most sense for storing signed data, mathematically, which is also the reason you can use the same operations for both signed and unsigned. It's simply a finite (truncated) 2-adic number, since you don't have unlimited bits.

                  The fact that it's also the cheapest to implement in hardware is a consequence of the perfect mathematical relationships.
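
                  A small demonstration of that shared-operations point (the snippet is mine): the very same 8-bit addition gives the right answer under both the unsigned and the two's complement reading of the bits.
                  Code:
                  #include <cstdint>
                  #include <cstdio>

                  int main() {
                      uint8_t a = 0xFB, b = 0x07;                 // unsigned: 251 and 7; two's complement: -5 and 7
                      uint8_t sum = static_cast<uint8_t>(a + b);  // one addition, wrapping modulo 256

                      std::printf("unsigned view: %d\n", sum);                       // 2  (251 + 7 mod 256)
                      std::printf("signed view:   %d\n", static_cast<int8_t>(sum));  // 2  (-5 + 7)
                  }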



                  • #59
                    Originally posted by Weasel View Post
                    Actually two's complement makes the most sense for storing signed data, mathematically, which is also the reason you can use the same operations for both signed and unsigned. It's simply a finite (truncated) 2-adic number, since you don't have unlimited bits.

                    The fact that it's also the cheapest to implement in hardware is a consequence of the perfect mathematical relationships.
                    No, that is actually a circular argument. Two's complement does /not/ make the most sense mathematically. It is certainly one simple and consistent way to implement signed integers, but no method is mathematically better or worse than others - the mathematics of integers is totally unrelated to how they are encoded in bits in electronics.

                    Alternative choices that have been used are ones' complement (where -x is ~x rather than ~x + 1), sign-magnitude (the MSB is a sign bit), and offset (where you store x + M as an unsigned number, with M being mid point in your range). These each have their advantages and disadvantages. (Standard floating point formats use sign-magnitude to hold the mantissa, and offset format to hold the exponent.)
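
                    For illustration, the bit patterns for -5 in 8 bits under two's complement and each of these schemes (the bias of 128 for the offset format is my choice):
                    Code:
                    #include <cstdint>

                    constexpr uint8_t twos_complement = 0b1111'1011;   // ~5 + 1
                    constexpr uint8_t ones_complement = 0b1111'1010;   // ~5
                    constexpr uint8_t sign_magnitude  = 0b1000'0101;   // sign bit | magnitude 5
                    constexpr uint8_t offset_biased   = 0b0111'1011;   // -5 + 128 = 123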

                    Two's complement is simple and fast to implement for many common operations, and avoids the complication of "negative zero", which is why it is the most common format today. But it does not have any special mathematical benefit.



                    • #60
                      Originally posted by DavidBrown View Post
                      No, that is actually a circular argument. Two's complement does /not/ make the most sense mathematically. It is certainly one simple and consistent way to implement signed integers, but no method is mathematically better or worse than others - the mathematics of integers is totally unrelated to how they are encoded in bits in electronics.
                      You obviously have no idea what you're talking about. I suggest you read up on 2-adic numbers (or p-adic in general).

                      It has nothing to do with encoding here. Two's complement is indeed encoded the same as 2-adic numbers on a finite number of bits (math is infinite, of course), and the fact that it just works without any special cases for arithmetic is due to the underlying mathematics and not due to special encoding hackery. It is the most elegant mathematical representation without having to resort to "sign".

                      Multiplication and division have different instructions for signed and unsigned only because of the finite number of bits used to store them. For multiplication it only matters when you need the upper half of the result (e.g. a 16x16 multiply -> 32 bits, upper 16).
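
                      A quick sketch of that upper-half effect (the snippet is mine):
                      Code:
                      #include <cinttypes>
                      #include <cstdint>
                      #include <cstdio>

                      int main() {
                          uint16_t bits = 0xFFFF;  // 65535 unsigned, -1 as two's complement

                          uint32_t u = static_cast<uint32_t>(bits) * 3u;                      // 0x0002FFFD
                          int32_t  s = static_cast<int32_t>(static_cast<int16_t>(bits)) * 3;  // 0xFFFFFFFD (-3)

                          // The low 16 bits agree (0xFFFD); only the upper halves differ,
                          // which is why separate signed and unsigned multiplies exist.
                          std::printf("%08" PRIX32 " %08" PRIX32 "\n", u, static_cast<uint32_t>(s));
                      }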

                      Originally posted by DavidBrown View Post
                      Alternative choices that have been used are ones' complement (where -x is ~x rather than ~x + 1), sign-magnitude (the MSB is a sign bit), and offset (where you store x + M as an unsigned number, with M being mid point in your range). These each have their advantages and disadvantages. (Standard floating point formats use sign-magnitude to hold the mantissa, and offset format to hold the exponent.)
                      None of those have any mathematical basis; they're just arbitrary. Especially the MSB sign-bit thing.

