Is Assembly Still Relevant To Most Linux Software?


  • #71
    Originally posted by frign View Post
    Thanks for this insufficient example!

    First off, your C-code stinks. Not only can't you judge efficiency by line-numbers, you also wouldn't ever construct a for-loop this way.
    Here's the correction for you and all the others not yet having understood how to efficiently construct loops, by the means of actually _allowing_ the compilers to optimise it properly:

    Code:
    for(i=66; i; --i){
            stuff;
    };
    1. Code Formatting: Never brag with one-liners when you can't read them once they get more complex.
    2. Count down!: (If possible.) It will be much easier for the compiler, because it doesn't need to evaluate a separate comparison and can simply fire off a jump-if-zero (JZ on x86) where needed. You would never know that if you hadn't learned ASM some day.
    3. Pre-Decrementing: I hope you know what that is, because there is good reason to do so: the compiler has no way to efficiently place the post-decrement in this loop, whereas it is really simple for it to do this with a pre-decrement.
    4. Semi-Colons: Quite small point, but it serves the readability to put a semi-colon at the end of a for-loop.
    5. I'm open for additions...
    At least GCC produces the same result for both loops (incrementing and decrementing) if you enable optimizations (-O2): it emits a decrementing loop with jne.
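For anyone who wants to repeat that comparison, here is a minimal sketch of the two loop shapes being discussed. The helper `stuff()` and the function names are hypothetical stand-ins, not code from the thread. With a body this trivial, an optimizing compiler may well fold each loop into a constant, so a meaningful look at the generated assembly (for example with `gcc -O2 -S`) needs a body the compiler cannot see through.

```c
#include <assert.h>

/* Hypothetical stand-in for the loop body ("stuff" in the posts above). */
static void stuff(int *acc)
{
    *acc += 1;
}

/* Counts up from 0 to 65, like the original example. */
int run_counting_up(void)
{
    int acc = 0;
    for (int i = 0; i < 66; i++) {
        stuff(&acc);
    }
    return acc;
}

/* Counts down from 66 to 1, as suggested in the quoted post. */
int run_counting_down(void)
{
    int acc = 0;
    for (int i = 66; i; --i) {
        stuff(&acc);
    }
    return acc;
}

/* Both loops execute the body the same number of times. */
void check_same_trip_count(void)
{
    assert(run_counting_up() == run_counting_down());
}
```

Both functions are semantically identical, which is what leaves the compiler free to pick whichever loop form it prefers.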



    • #72
      Proper C

      Originally posted by gens View Post
      i'm open to criticism
      but that example wasn't an example of how to do things in C, it was just the simplest one-liner that came to mind

      ofc normally i write it structured like

      Code:
      for( i=0; i<66; i++) {
            things
            }
      this looks readable in longer code to me, and i didn't have a teacher to teach me a specific coding style
      as i don't do it for money i don't really care

      and it gets compiled like that shorter example in asm
      thing is C was made for humans too, even though it was made so programmers don't have to write assembly
      so C maps well to assembly but also to human logic, and the compiler knows you just want to make that loop run 66 times

      but what C doesn't tell you is how many registers the cpu has
      it comes naturally to use as many variables in a loop as needed to reduce calls and repeated calculations
      but if you use more variables than you have registers, the compiler has to spill them to memory and read them back (usually from the cpu cache) when needed

      that's one case where you don't know what you're doing because you never learned and the compiler didn't tell you
      probably won't matter much for performance though, as it gets loaded quite fast from the cache
      Yes, I agree on most points. There is no definitive coding style, but I guess efficiency is an ideal everyone should work towards.

      With regard to loop and increment/decrement efficiency, there is a great paper about it by the folks from IAR here.

      Even though the cost of inefficient code might seem negligible, it is still a factor to consider, because small issues can add up to bigger ones once you scale up. And if you learn how to do it properly, you won't even be slower doing it the right way!
      Compilers are a great way of dealing with many architectures. Since the same optimisation paradigms work on many of them, writing proper C doesn't require you to know much about each architecture, but that knowledge can ultimately help you _know_ what proper C is in the first place.

      Best regards

      FRIGN



      • #73
        Use your brain

        Originally posted by droste View Post
        At least GCC produces the same result for both loops (incrementing and decrementing) if you enable optimizations (-O2): it emits a decrementing loop with jne.
        In this trivial case that was to be expected, but for more complex algorithms the outcome is not so clear. Once the code gets more complex, you will have to do the job yourself and must _not_ rely on compiler optimisations.



        • #74
          Originally posted by frign View Post
          In this trivial case that was to be expected, but for more complex algorithms the outcome is not so clear. Once the code gets more complex, you will have to do the job yourself and must _not_ rely on compiler optimisations.
          In fact, the initial trivial example served him really badly, not only because the compiler will generate this code anyway, but also because do_stuff() (whether a function or inline code) makes no guarantees, for example that RCX is preserved. So if do_stuff() changes RCX and sets it to 10, his trivial optimization would break the code and the loop would never finish.

          Also, since it would be written in assembly, his optimization would rule out the compiler inlining do_stuff() (if it is a function).

          If it is inlined, or stuff is plain code, the compiler will do other things. It will spot computations that don't depend on the variable i and move them out of the loop (loop-invariant code motion). After that it may find that the loop boils down to a formula that is a constant expression, compute it at compile time, and the loop will take 0 ms (no CPU time) at run time. Sometimes, when the number of iterations is large, the compiler may instead decide to unroll the loop in groups of 4, like this:
          Code:
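          int i = 0;      /* declared outside the loop so the remainder loop can see it */
          for(; i<63; )
          {
           stuff(); i++;
           stuff(); i++;
           stuff(); i++;
           stuff(); i++;
          }
          while(i<66)     /* handles the 2 iterations left over after 16 groups of 4 */
          {
            stuff();
            i++;
          }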
          and maybe the 4 stuff() calls can be auto-vectorized; even if they can't, the unrolling removes 3 out of every 4 branches (which are costly even on out-of-order CPUs).

          What I'm describing is what any compiler does today at the -O3 level or its equivalent (Visual Studio, Clang or GCC), so it is a stupid decision to write assembly (as shown).
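The two transformations described in this post can also be written out by hand to see what they amount to. This is only a sketch under assumptions: the function names and the loop body (a loop-invariant product `a * b` plus the index) are made up for illustration, and a real compiler decides on its own whether and how to apply either step.

```c
#include <assert.h>

/* Straightforward version: a * b is recomputed on every iteration. */
long sum_plain(int n, int a, int b)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += (long)a * b + i;   /* a * b does not depend on i */
    }
    return total;
}

/* Hand-applied loop-invariant code motion plus unrolling by 4. */
long sum_hoisted_unrolled(int n, int a, int b)
{
    assert(n >= 0);                 /* the sketch assumes a non-negative count */

    const long k = (long)a * b;     /* invariant hoisted out of the loop */
    long total = 0;
    int i = 0;

    /* Main loop: four bodies per branch. */
    for (; i + 3 < n; i += 4) {
        total += k + i;
        total += k + (i + 1);
        total += k + (i + 2);
        total += k + (i + 3);
    }

    /* Remainder loop: at most three leftover iterations. */
    for (; i < n; i++) {
        total += k + i;
    }
    return total;
}
```

Calling both with `n = 66` gives the same sum, with the unrolled version executing 16 iterations of the main loop and 2 of the remainder loop.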
          Last edited by ciplogic; 07 April 2013, 02:23 PM.



          • #75
            Originally posted by frign View Post
            In this trivial case that was to be expected, but for more complex algorithms the outcome is not so clear. Once the code gets more complex, you will have to do the job yourself and must _not_ rely on compiler optimisations.
            i just tested the for loop
            the compiler didn't optimize it, so yeah it's better to go with --
            makes me wonder why, as it should be the same logic
            hmm


            anyway every language has a lot to learn
            in asm it's instruction scheduling and things like that
            in C it's being good to the compiler
            there are hundreds of pages of text on both topics
            i personally find the asm ones easier to understand, as optimizing asm is based on simple logic, but that's just me



            • #76
              Originally posted by ciplogic View Post
              and maybe the 4 stuff() calls can be auto-vectorized; even if they can't, the unrolling removes 3 out of every 4 branches (which are costly even on out-of-order CPUs).

              What I'm describing is what any compiler does today at the -O3 level or its equivalent (Visual Studio, Clang or GCC), so it is a stupid decision to write assembly (as shown).
              auto-vectorizing is natural in asm

              another thing a compiler can't do is use what it can't know
              like when you know a loop will only execute 3-7 times
              a compiler doesn't know that, so it will put out a speed-optimized version, one that is bloated for what it does

              also i never said you have to write whole programs in assembly to get performance
              on the contrary, i said you are best off writing only the few tightest loops in assembly

              also can you show me a program more optimized than x264 or glibc?
              benchmark the pure C version of musl against glibc, then you can say for sure how good a compiler is
              musl from what i see is good, optimized C, so perfect for benchmarks



              • #77
                I agree

                Originally posted by gens View Post
                i just tested the for loop
                the compiler didn't optimize it, so yeah it's better to go with --
                makes me wonder why, as it should be the same logic
                hmm


                anyway every language has a lot to learn
                in asm it's instruction scheduling and things like that
                in C it's being good to the compiler
                there are hundreds of pages of text on both topics
                i personally find the asm ones easier to understand, as optimizing asm is based on simple logic, but that's just me
                Agreed, judging from the experience I have gathered so far, even though I am still a beginner when it comes to ASM.



                • #78
                  i found out why the compiler did that

                  i put the "i" variable at global scope (or whatever you call it) instead of inside main
                  quick code for testing purposes, don't start bashing my C again

                  with the variable not global, the compiler optimizes the increments into decrements
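A plausible explanation for that difference, sketched below under assumptions: a global counter is observable from outside the loop, so the compiler has to keep its real value up to date, while a local counter is private and can be reshaped freely. The names `g_i`, `stuff()` and the function names are hypothetical, and the actual test program from this post is not shown in the thread.

```c
#include <assert.h>

/* Hypothetical global loop counter, visible to every function in the program. */
int g_i;

/* Hypothetical loop body. Code like this could legally read or write g_i. */
void stuff(long *acc)
{
    *acc += 1;
}

/* Counter is global: its value must be correct whenever other code can see it. */
long loop_with_global_counter(void)
{
    long acc = 0;
    for (g_i = 0; g_i < 66; g_i++) {
        stuff(&acc);
    }
    return acc;
}

/* Counter is local: nothing else can observe it, so counting down is a safe rewrite. */
long loop_with_local_counter(void)
{
    long acc = 0;
    for (int i = 0; i < 66; i++) {
        stuff(&acc);
    }
    return acc;
}

/* After the global version runs, the final counter value is visible to the caller. */
void check_global_is_observable(void)
{
    loop_with_global_counter();
    assert(g_i == 66);
}
```

Both loops run the body 66 times; the only difference is that the global version leaves `g_i == 66` behind as a visible side effect.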



                  • #79
                    No wonder

                    Originally posted by gens View Post
                    i found out why the compiler did that

                    i put the "i" variable at global scope (or whatever you call it) instead of inside main
                    quick code for testing purposes, don't start bashing my C again

                    with the variable not global, the compiler optimizes the increments into decrements
                    It was global? Hell, no wonder it was so bad.
                    FYI, marking global vars as volatile will disable this optimisation as well.

                    Always remember to keep the scope as small as possible and try to get rid of global variables, or even global structs, completely. That's the best practice.

                    And again: Stop trying to trick the compiler, start writing good code! (After all, you don't know how other architectures behave.)
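As a rough illustration of the volatile and scope remarks, here is a sketch with hypothetical names. A `volatile` object must actually be read and written on every access, so the compiler cannot replace the counter with a different induction variable, whereas a counter declared in the `for` statement itself has the smallest possible scope.

```c
#include <assert.h>

/* Hypothetical counter; volatile forces every read and write to really happen. */
volatile int v_i;

/* The compiler must load and store v_i on each iteration. */
int loop_with_volatile_counter(void)
{
    int acc = 0;
    for (v_i = 0; v_i < 66; v_i++) {
        acc++;
    }
    return acc;
}

/* Counter confined to the loop itself: the smallest possible scope. */
int loop_with_scoped_counter(void)
{
    int acc = 0;
    for (int i = 66; i; --i) {
        acc++;
    }
    return acc;
}

/* Both variants do the same work; only the freedom left to the compiler differs. */
void check_same_result(void)
{
    assert(loop_with_volatile_counter() == loop_with_scoped_counter());
}
```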



                    • #80
                      Originally posted by gens View Post
                      i found out why the compiler did that

                      i put the "i" variable at global scope (or whatever you call it) instead of inside main
                      quick code for testing purposes, don't start bashing my C again

                      with the variable not global, the compiler optimizes the increments into decrements
                      this is a n00b mistake!

                      It seems that you understand all the intricate parts of asm but don't know how to write fast C code. Sorry, in your particular case you should still write assembly. As I wrote before, you sometimes have to help the compiler, not make it impossible for it to optimize. And what about the case where stuff() changes RCX, which would create an infinite loop?

