Is Assembly Still Relevant To Most Linux Software?


  • #91
    No problem

    Originally posted by oliver View Post
    Yes, commit those changes. Sometimes things get written in a sloppy way and overlooked for years. It's always nice having stuff cleaned up.

    However, I'm not so sure about (u)int_fast* usage in the kernel. If it really is faster/better, it could be a mission to replace it all. Unfortunately, right now (3.7.10) there is only ONE reference in the entire kernel, and I don't know if it will even stay there; it's been there for a while, I believe.

    Code:
    grep int_fast * -R
    drivers/staging/tidspbridge/dynload/cload.c:    uint_fast32_t sum, temp;
    Your concerns are valid!
    You normally have to include stdint.h, so it might not be handy. For kernel purposes (which is not the case here, because it's Mesa), you could just replace them with chars.
    It boils down to the fact that stdint.h just typedefs uint_fast8_t to char on x86_64; it is mostly relevant on other architectures, where a different width might be faster for 8-bit values.

    It's always better to know about these types, because they can really make a difference! People should really be encouraged to think about their variable ranges and use the smaller integers accordingly, instead of just declaring everything an int.
    This approach comes close to Ada, one of the safest languages in existence, where the range of each variable can be strictly limited.
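
    To illustrate the kind of use that grep hit shows, here is a minimal sketch (not the actual cload.c code): the accumulator only needs 32 bits of range, and uint_fast32_t lets each architecture pick whatever width it handles fastest.
    Code:
    #include <stdint.h>
    #include <stddef.h>
    
    /* Sum up a buffer; uint_fast32_t leaves the actual width to the platform. */
    static uint_fast32_t checksum(const uint8_t *buf, size_t len)
    {
            uint_fast32_t sum = 0;
    
            while (len--)
                    sum += *buf++;
    
            return sum;
    }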
    Last edited by frign; 04-09-2013, 08:57 AM.

    Comment


    • #92
      Originally posted by frign View Post
      It's always better to know about these types, because they can really make a difference! People should really be encouraged to think about their variable ranges and use the smaller integers accordingly, instead of just declaring everything an int.
      Again, do you have any numbers to support these claims? (Real-world numbers, if possible.)

      Comment


      • #93
        Originally posted by frign View Post
        Your concerns are valid!
        You normally have to include stdint.h, so it might not be handy. For kernel purposes (which is not the case here, because it's Mesa), you could just replace them with chars.
        It boils down to the fact that stdint.h just typedefs uint_fast8_t to char on x86_64; it is mostly relevant on other architectures, where a different width might be faster for 8-bit values.

        It's always better to know about these types, because they can really make a difference! People should really be encouraged to think about their variable ranges and use the smaller integers accordingly, instead of just declaring everything an int.
        This approach comes close to Ada, one of the safest languages in existence, where the range of each variable can be strictly limited.
        Let's say that my platform has 64 bits as its native 'fastest' integer. The compiler should know this (via an option such as -march=native or the like) and thus should be able to optimize anything smaller, up to an int. The exception, I guess, is where we explicitly work with an overflow (bad design?) of, let's say, a char:
        Code:
        char i;
        
        for (i = 1; i; ++i) {
                stuff();
        }
        which may not be the greatest code, but it's an example, so work with me here. In this case the compiler would have to notice this explicit behavior, but otherwise it should be able to scale everything up to a uint64. -O3 really should do this automatically, making the whole int_fast business moot. It keeps code cleaner, with fewer useless definitions. If -O3 (or an -O4 specifically for this) breaks your program... well, don't depend on stupid design, and fix the code.

        Comment


        • #94
          You should reconsider that!

          Originally posted by oliver View Post
          Let's say that my platform has 64 bits as its native 'fastest' integer. The compiler should know this (via an option such as -march=native or the like) and thus should be able to optimize anything smaller, up to an int. The exception, I guess, is where we explicitly work with an overflow (bad design?) of, let's say, a char:
          Code:
          char i;
          
          for (i = 1; i; ++i) {
                  stuff();
          }
          which may not be the greatest code, but it's an example, so work with me here. In this case the compiler would have to notice this explicit behavior, but otherwise it should be able to scale everything up to a uint64. -O3 really should do this automatically, making the whole int_fast business moot. It keeps code cleaner, with fewer useless definitions. If -O3 (or an -O4 specifically for this) breaks your program... well, don't depend on stupid design, and fix the code.
          First off, -O4 doesn't even exist.

          Moreover, your example is insufficient, because it is an endless loop, no matter whether you overflow it with a char after ~255 cycles or with a 64-bit integer after ~18446744073709551616 of them.
          The program is bloody broken.
          
          Again, don't try to trick the compiler, because you won't do it right anyway. Try to write code which makes _sense_. If you expect to iterate fewer than 255 times in the loop, then why not use an 8-bit integer for the counting variable?
          
          And, more importantly, you don't know whether the compiler would really replace the char with a 64-bit integer.
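
          For instance, a bounded counter like this is all that is needed (just a sketch; uint_fast8_t still lets the architecture widen it if that happens to be faster):
          Code:
          #include <stdio.h>
          #include <stdint.h>
          
          int main(void)
          {
                  uint_fast8_t i;   /* we know the bound stays well below 255 */
          
                  for (i = 0; i < 200; i++)
                          printf("iteration %u\n", (unsigned)i);
          
                  return 0;
          }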

          Comment


          • #95
            There you go

            Originally posted by erendorn View Post
            Again, do you have any numbers to support these claims? (Real-world numbers, if possible.)
            There is an excellent article by the embedded-systems specialist Nigel Jones, where he points out the importance of varying integer sizes.

            Apart from that, you can't expect the compiler to know what ranges your variables have. It is your job to know, and you should pass that knowledge on to the compiler so it can generate even better output.

            A real world example by the same author can be found here, where he employs the fast data-types to optimise a given program. It is definitely a great read!

            Comment


            • #96
              Originally posted by frign View Post
              There is an excellent article by the embedded-systems specialist Nigel Jones, where he points out the importance of varying integer sizes.

              Apart from that, you can't expect the compiler to know what ranges your variables have. It is your job to know, and you should pass that knowledge on to the compiler so it can generate even better output.

              A real world example by the same author can be found here, where he employs the fast data-types to optimise a given program. It is definitely a great read!
              Indeed, it makes sense for processors narrower than 32 bits (well, you get 25% more speed, which is good considering the readability cost).
              The existence of fast_int and least_int is puzzling, though, as I would find it cleaner to let the compiler choose between fast or small (within the integer-size constraints) based on what I tell it to optimize for.

              Comment


              • #97
                Originally posted by frign View Post
                First off, -O4 doesn't even exist.
                I never said it did; I only hinted that if it doesn't do this now, it should, even if only at some hypothetical '-O4'.
                Moreover, your example is insufficient, because it is an endless loop, no matter whether you overflow it with a char after ~255 cycles or with a 64-bit integer after ~18446744073709551616 of them.
                How is it an endless loop? After 255 comes 0, no? So then i is no longer true and the loop exits. It's quite a common trick on FPGAs, where you define your variable in exact bits and count until it wraps back to 0, since on FPGAs you don't have 'int' but only bit-wide variables. Granted, having to count to 256 or the like is very uncommon, I hope, but it would be a situation of concern.
                The program is bloody broken.

                Again, don't try to trick the compiler, because you won't do it right anyway. Try to write code which makes _sense_. If you expect to iterate fewer than 255 times in the loop, then why not use an 8-bit integer for the counting variable?

                And, more importantly, you don't know whether the compiler would really replace the char with a 64-bit integer.
                Who is tricking the compiler? It is a valid expression.

                Anyway, if you 'know' you want to do 15 to 20 iterations in a loop, and the compiler can't ever know what the maximum is, you put it in a char (8 bits). All normal and sensible. Now, your compiler also knows that your architecture uses 64 bits natively and that those are the fastest for it to handle. A smart compiler would use a uint64 anyway, because a) it fits and b) it's faster. Yes, it uses more memory (but it's usually a trade-off between memory usage and execution speed; you normally can't have both).

                So again, using uint8, uint8_least or uint8_fast shouldn't make any difference whatsoever and really is kind of silly to worry about. The compiler knows best. And I don't think that in today's code 90% of the ints need to be as small as possible to save on memory footprint while 10% need to be the fast kind, so having explicit control over them is pointless. Something like 'gcc test.c -o test --fastint' or '--leastfit' should be the only tunable.
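
                The wrap-around counting trick above, sketched with an unsigned char so the wrap to 0 is well-defined (plain char may be signed, in which case the overflow is not):
                Code:
                #include <stdio.h>
                
                int main(void)
                {
                        unsigned char i;      /* wraps to 0 after 255, which ends the loop */
                        unsigned count = 0;
                
                        for (i = 1; i; ++i)
                                count++;
                
                        printf("loop body ran %u times\n", count);   /* 255 */
                        return 0;
                }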

                Comment


                • #98
                  Not really

                  Originally posted by erendorn View Post
                  Indeed, it makes sense for processors narrower than 32 bits (well, you get 25% more speed, which is good considering the readability cost).
                  The existence of fast_int and least_int is puzzling, though, as I would find it cleaner to let the compiler choose between fast or small (within the integer-size constraints) based on what I tell it to optimize for.
                  Again, you can't expect the compiler to do that, because C/C++ doesn't have the range-bounding mechanisms that would let it know which range an integer has.
                  If you, for instance, handled stdin for a color-processing program, you would know that the RGB values should be stored in unsigned 8-bit integers, because the values of R, G and B respectively do not go beyond 255.
                  The example Nigel Jones gave was of course based on an optimising compiler. If the compiler were smart enough to actually know about the integer size, how come he gets different results?

                  I cannot say this often enough: don't put too much trust in the compiler; do more on your side to make clear what you want, instead of focusing on the compiler's quirks to this sick extent.
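
                  As a rough sketch of the colour example (the struct and the clamp_channel helper are just made up for illustration):
                  Code:
                  #include <stdio.h>
                  #include <stdint.h>
                  
                  /* R, G and B never exceed 255, so uint8_t is the natural fit. */
                  struct rgb {
                          uint8_t r;
                          uint8_t g;
                          uint8_t b;
                  };
                  
                  /* Clamp a parsed value into the 0..255 range. */
                  static uint8_t clamp_channel(long value)
                  {
                          if (value < 0)
                                  return 0;
                          if (value > 255)
                                  return 255;
                          return (uint8_t)value;
                  }
                  
                  int main(void)
                  {
                          struct rgb c;
                          long r, g, b;
                  
                          if (scanf("%ld %ld %ld", &r, &g, &b) == 3) {
                                  c.r = clamp_channel(r);
                                  c.g = clamp_channel(g);
                                  c.b = clamp_channel(b);
                                  printf("r=%u g=%u b=%u\n",
                                         (unsigned)c.r, (unsigned)c.g, (unsigned)c.b);
                          }
                          return 0;
                  }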

                  Comment


                  • #99
                    Originally posted by frign View Post
                    I cannot say this often enough: don't put too much trust in the compiler; do more on your side to make clear what you want, instead of focusing on the compiler's quirks to this sick extent.
                    We already do that: u8, u16, u32 and u64 (u128 at some point, I'm sure). Let the compiler then decide whether anything smaller than a u64 should be scaled up to a u64 to be faster. If the compiler can somehow combine two u32s into a u64 (assuming that would all be valid), then it could also scale two u16s up to u32s, to continue the example.
                    
                    Yes, you absolutely should help the compiler a little, but the compiler should also be smart enough to do certain things on its own (if allowed to do so).

                    Comment


                    • You misunderstood it

                      Originally posted by oliver View Post
                      We already do that: u8, u16, u32 and u64 (u128 at some point, I'm sure). Let the compiler then decide whether anything smaller than a u64 should be scaled up to a u64 to be faster. If the compiler can somehow combine two u32s into a u64 (assuming that would all be valid), then it could also scale two u16s up to u32s, to continue the example.
                      
                      Yes, you absolutely should help the compiler a little, but the compiler should also be smart enough to do certain things on its own (if allowed to do so).
                      I think there is a strong misconception on your side here: just because an integer has the same size as the address width of the operating system you are running (namely 64 bit), that does not mean these datatypes are faster than smaller ones (8, 16, 32).
                      It is the other way around: the smaller the datatypes, the fewer resources are needed to handle them. There are exceptions for 16-bit integers on some old architectures, but overall there is a consistent speedup in the cases where the integer size has been limited.
                      Even in the case of the slower 16-bit integers, you can employ the fast integer types and benefit from well-thought-out typedefs for specific architectures, ruling out potential slowdowns.

                      The compiler is a smart guy, but it is not a magician: it would never risk anything, and it is no artificial intelligence. There may be constant improvements in this area, but limiting the integer size requires you to know that the integer will _never_ overflow.
                      How would a compiler predict that when it has to optimise a stdin parser?

                      PS: I don't think we will see 128 bit soon, because 64-bit addresses can span virtual memory of at most 16 exbibytes, which is about 17 billion gibibytes.
                      But I may just sound like Bill Gates, who allegedly stated this in 1981:
                      640K ought to be enough for anybody.
                      Last edited by frign; 04-09-2013, 01:17 PM.

                      Comment


                      • 128-bit variables are already supported via compiler extensions. Not 128-bit pointers though.
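
                        For instance, GCC and Clang expose this as __int128 on 64-bit targets; a minimal sketch (there is no printf format for it, so the value is shown as two 64-bit halves):
                        Code:
                        #include <stdio.h>
                        
                        int main(void)
                        {
                                unsigned __int128 x =
                                        ((unsigned __int128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
                        
                                printf("high: %016llx low: %016llx\n",
                                       (unsigned long long)(x >> 64),
                                       (unsigned long long)x);
                                return 0;
                        }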

                        Comment


                        • Originally posted by frign View Post
                          I think there is a strong misconception on your side here: just because an integer has the same size as the address width of the operating system you are running (namely 64 bit), that does not mean these datatypes are faster than smaller ones (8, 16, 32).
                          It is the other way around: the smaller the datatypes, the fewer resources are needed to handle them. There are exceptions for 16-bit integers on some old architectures, but overall there is a consistent speedup in the cases where the integer size has been limited.
                          I guess this requires some hard evidence: compare various CPUs and architectures and see what happens. Write some code for an 8-bit AVR (still relevant today), a 16-bit MSP430 (also still relevant today), a 32-bit 'old' CPU (Pentium M comes to mind) and a modern 64-bit CPU in both 32-bit and 64-bit modes, and see what happens.
                          
                          I'm not convinced that using anything other than the native width is more efficient (FOR CALCULATION!). Yes, there is a cost/benefit analysis in other regards that is also very important, something the compiler can't even know about. The data has to move in and out of the CPU over a bus. If the bus is narrower than the native size of the CPU, some magic will need to happen. If it stays internal to the CPU (registers), then the native size is most efficient in any case.
                          
                          Using bigger ints, however, has the disadvantage of using more memory, and that can potentially slow things down again (more data to copy into memory, etc.).
                          Even in the case of the slower 16-bit integers, you can employ the fast integer types and benefit from well-thought-out typedefs for specific architectures, ruling out potential slowdowns.
                          
                          The compiler is a smart guy, but it is not a magician: it would never risk anything, and it is no artificial intelligence. There may be constant improvements in this area, but limiting the integer size requires you to know that the integer will _never_ overflow.
                          How would a compiler predict that when it has to optimise a stdin parser?
                          I do fully agree with you here: we should use u8, u16, etc. as they fit. And yes, the PROGRAMMER needs to know that his data fits into a certain variable. I never said the compiler should check for 'valid overflowing'; it's a dumb technique, but it does get used. Anyway, I only wanted to say that the compiler can be made smart enough to automatically replace u8 (or uint8, or whatever) with whatever it sees fit for the fastest/smallest memory requirement (based on a switch if you wish). I as a developer know it will fit into a u8. The compiler can optimize it to whatever it wants.

                          PS: I don't think we will see 128 bit soon, because 64-bit addresses can span virtual memory of at most 16 exbibytes, which is about 17 billion gibibytes.
                          But I may just sound like Bill Gates, who allegedly stated this in 1981:
                          We will see 128-bit data types soon. You know why? They already exist. First result for uint128 on Google? http://msdn.microsoft.com/en-us/library/cc230384.aspx They even use a valid example: storing an IPv6 address. Yes, you could store it in a struct, but that's beside the point. Actually, a 128-bit int could potentially be better optimized, so when using it in routers I can see it making sense. (I know it's defined as two 64-bit values for now, but the datatype is there.)

                          But there's actually more valid usage for it, today. http://lxr.free-electrons.com/source.../b128ops.h#L54
                          Yes, it's in the kernel source already. Right now the only sensible use is in cryptographic routines, and it makes sense: if your encryption routine makes use of 128-bit values, why do magic with structs and unions to work around a shortcoming?
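
                          For illustration, a 128-bit quantity can be carried around as a pair of 64-bit halves; this is only a sketch, not the kernel's actual definition:
                          Code:
                          #include <stdio.h>
                          #include <stdint.h>
                          
                          /* A 128-bit value stored as two 64-bit halves (sketch only). */
                          struct u128 {
                                  uint64_t hi;
                                  uint64_t lo;
                          };
                          
                          int main(void)
                          {
                                  /* e.g. the IPv6 loopback address ::1 */
                                  struct u128 addr = { .hi = 0, .lo = 1 };
                          
                                  printf("%016llx%016llx\n",
                                         (unsigned long long)addr.hi,
                                         (unsigned long long)addr.lo);
                                  return 0;
                          }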

                          Comment


                          • To clear it up once and for all!

                            Originally posted by oliver View Post
                            I guess this requires some hard evidence: compare various CPUs and architectures and see what happens. Write some code for an 8-bit AVR (still relevant today), a 16-bit MSP430 (also still relevant today), a 32-bit 'old' CPU (Pentium M comes to mind) and a modern 64-bit CPU in both 32-bit and 64-bit modes, and see what happens.
                            
                            I'm not convinced that using anything other than the native width is more efficient (FOR CALCULATION!). Yes, there is a cost/benefit analysis in other regards that is also very important, something the compiler can't even know about. The data has to move in and out of the CPU over a bus. If the bus is narrower than the native size of the CPU, some magic will need to happen. If it stays internal to the CPU (registers), then the native size is most efficient in any case.
                            
                            Using bigger ints, however, has the disadvantage of using more memory, and that can potentially slow things down again (more data to copy into memory, etc.).

                            I do fully agree with you here: we should use u8, u16, etc. as they fit. And yes, the PROGRAMMER needs to know that his data fits into a certain variable. I never said the compiler should check for 'valid overflowing'; it's a dumb technique, but it does get used. Anyway, I only wanted to say that the compiler can be made smart enough to automatically replace u8 (or uint8, or whatever) with whatever it sees fit for the fastest/smallest memory requirement (based on a switch if you wish). I as a developer know it will fit into a u8. The compiler can optimize it to whatever it wants.


                            We will see 128-bit data types soon. You know why? They already exist. First result for uint128 on Google? http://msdn.microsoft.com/en-us/library/cc230384.aspx They even use a valid example: storing an IPv6 address. Yes, you could store it in a struct, but that's beside the point. Actually, a 128-bit int could potentially be better optimized, so when using it in routers I can see it making sense. (I know it's defined as two 64-bit values for now, but the datatype is there.)

                            But there's actually more valid usage for it, today. http://lxr.free-electrons.com/source.../b128ops.h#L54
                            Yes, it's in the kernel source already. Right now the only sensible use is in cryptographic routines, and it makes sense: if your encryption routine makes use of 128-bit values, why do magic with structs and unions to work around a shortcoming?
                            I don't know if I can take you seriously any more. I think you still might not get the concept.
                            
                            Of course there are native 128-bit variables; why shouldn't there be? There are even native 128-bit variables available on 32-bit processors! Gosh!
                            
                            So how does this work, you might ask? I'll give you the answer, so you don't tell this BS to anyone else in the future:
                            
                            Let's take the plain x86 architecture, because it is simple to explain:
                            There are specific registers to store integers of specific sizes. I'll list them for your convenience:
                            1. 8 bit --> Registers AL, AH, ...
                            2. 16 bit --> Registers AX, ...
                            3. 32 bit --> Registers EAX, ...
                            4. 64 bit --> Registers MM0, ...
                            5. 128 bit --> Registers XMM0, ...

                            As you can see, both big and small integers are natively handled by the CPU, even though we can only have 32-bit addresses. Using large integers eats up more RAM, granted, and it is not faster in any way. Put it this way: storing every integer as a 64-bit integer on a 64-bit system doesn't bring you any benefit, and there are specific registers for all sizes!
                            Using smaller integers brings benefits precisely because they too are native to the CPU!
                            Even better, an 8-bit AVR or a 16-bit MSP430 doesn't mean we peak at 8 bits. I honestly have never worked with these processors, but I am sure they do support 8-bit integers; scaling those up to greater widths depends on the architecture, but you are not forcibly locked to the specified maximum address width.

                            Also, using the C99 integer types will bring you the flexibility you expected: there are types for integers of _at least_ n-bit size (int_leastN_t), for the fastest type of at least n bits (int_fastN_t), and for an exact n-bit size (intN_t, e.g. int8_t). stdint.h already handles that for you, and you don't even need special compilers to optimise it in this regard.
                            
                            With that, storing those datatypes in the matching native registers is not a big hurdle! Need to store an unsigned integer of at least 128-bit size? No problem, just use uint_least128_t and you are fine.
                            
                            You are pointing out issues which do not exist! Integer sizes do not magically scale up or down depending on which architecture you are on; they depend on the specific implementation for the architecture you are using.

                            I hope this was clear enough already!
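
                            A quick sketch to see what your own implementation picks for these typedefs (the exact sizes printed will differ between architectures and C libraries):
                            Code:
                            #include <stdio.h>
                            #include <stdint.h>
                            
                            int main(void)
                            {
                                    /* Exact-width, least-width and fast-width variants; how the
                                       last two are typedef'd is up to the architecture and libc. */
                                    printf("int8_t:        %zu byte(s)\n", sizeof(int8_t));
                                    printf("int_least8_t:  %zu byte(s)\n", sizeof(int_least8_t));
                                    printf("int_fast8_t:   %zu byte(s)\n", sizeof(int_fast8_t));
                                    printf("int_fast16_t:  %zu byte(s)\n", sizeof(int_fast16_t));
                                    printf("int_fast32_t:  %zu byte(s)\n", sizeof(int_fast32_t));
                                    return 0;
                            }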

                            Comment


                            • Yes

                              Originally posted by curaga View Post
                              128-bit variables are already supported via compiler extensions. Not 128-bit pointers though.
                              128-bit variables: they are even supported by the processor itself, so you don't need compiler extensions. You just need to address the right register and store the variable in it; if your compiler doesn't already implement this behaviour properly and requires extensions, I would recommend you switch to another one which does!

                              128-bit pointers: they don't make sense. Why?
                              Originally posted by My-bloody-self!
                              If you wrote a book with 100 pages, how much sense would it make to set up a table of contents for 1000 pages?
                              I hope you get my point: if your processor can only handle addresses 64 bits wide, 128-bit addresses (--> 128-bit pointers) simply don't bloody make sense.

                              Comment


                              • 8/16-bit calculations are faster than 32/64-bit, of course;
                                they can be loaded faster, aligned more easily and even packed when needed.
                                
                                for example, you can load 64 bits from memory and treat them as an 8-bit value,
                                then shift the whole register and treat it as the next 8-bit number,
                                and so on, eight times in all.
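                                
                                roughly like this, as a sketch in C rather than assembly:
                                Code:
                                #include <stdio.h>
                                #include <stdint.h>
                                
                                int main(void)
                                {
                                        /* one 64-bit load carrying eight packed 8-bit values */
                                        uint64_t packed = 0x1122334455667788ULL;
                                        int i;
                                
                                        for (i = 0; i < 8; i++) {
                                                uint8_t byte = (uint8_t)packed;  /* low 8 bits */
                                                printf("byte %d: 0x%02x\n", i, (unsigned)byte);
                                                packed >>= 8;                    /* shift the next byte down */
                                        }
                                        return 0;
                                }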

                                also worth noting, as was mentioned earlier i think, is that you need to take care how you structure your structs,
                                as the compiler will align the values in them to 8/16 bytes (maybe even to 64-byte cachelines),
                                thus leaving... well, holes
                                
                                the size of each data type can depend not only on the compiler but also on the OS and on the alignment of the Alpha Centauri stars with their planets,
                                so sizeof is a good way to make sure
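                                
                                for example (the exact numbers depend on the ABI, but on a typical x86_64 the two structs below differ):
                                Code:
                                #include <stdio.h>
                                #include <stdint.h>
                                
                                /* poorly ordered: padding holes appear between the members */
                                struct padded {
                                        uint8_t  a;
                                        uint64_t b;
                                        uint8_t  c;
                                };
                                
                                /* same members, largest first: fewer holes */
                                struct reordered {
                                        uint64_t b;
                                        uint8_t  a;
                                        uint8_t  c;
                                };
                                
                                int main(void)
                                {
                                        printf("padded:    %zu bytes\n", sizeof(struct padded));    /* typically 24 */
                                        printf("reordered: %zu bytes\n", sizeof(struct reordered)); /* typically 16 */
                                        return 0;
                                }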

                                Originally posted by ciplogic View Post
                                ...
                                damn, now i have to write a function

                                XMM shuffles are like sudoku, so i'll write one tomorrow when my head clears up

                                and yes, i was thinking of just the raw brute-force multiplication, as it is a bit hard for a compiler since shuffling it to fit nicely needs a bit of planning
                                something like just a function compute(pointer, pointer, how_many)

                                also interpreting the title "Is Assembly Still Relevant To Most Linux Software?" is, to be honest, not that easy
                                if you count the 1% gained (guessing) from assembly in shared libraries, then it is relevant
                                if you count things written directly in a program, then probably not that much (overall) except in the kernel and such low level things

                                and again, assembly is not really to be used when not needed
                                and it's not as hard as everybody says
                                Last edited by gens; 04-09-2013, 02:25 PM.

                                Comment
