Is Assembly Still Relevant To Most Linux Software?
-
Proper C
Originally posted by gens
I'm open to criticism,
but that example wasn't meant to show how to do things in C; it was just the simplest one-liner that came to mind.
Of course, normally I write it structured like
Code:for (i = 0; i < 66; i++) { things }
As I don't do it for money, I don't really care,
and it gets compiled to the same asm as that shorter example.
The thing is, C was made for humans too, even though it was made so programmers don't have to write assembly.
So C maps well to assembly but also to human logic, and the compiler knows you just want to make that loop run 66 times.
But what C doesn't tell you is how many registers the CPU has.
It comes naturally to use as many variables in a loop as needed to reduce calls and repeated calculations,
but if you use more variables than you have registers, the compiler has to spill some of them to memory (the stack) and load them back when needed.
That's one case where you don't know what you're doing, because you never learned it and the compiler didn't tell you.
Knowing it probably won't help performance much, though, as the spilled values get loaded back quite fast from the cache.
With regard to loop and increment/decrement efficiency, there is a great paper about it by the folks from IAR here.
Even though the cost of inefficient code may seem negligible, it is still a factor to consider, because small issues add up to bigger ones once you scale up. And once you have learned how to do it properly, doing it the right way won't even take you longer!
Compilers are a great way of dealing with many architectures. Since the same optimisation paradigms work on most of them, writing proper C doesn't require you to know much about each one, although that knowledge can ultimately help you _know_ what proper C is in the first place.
Best regards
FRIGN
-
Use your brain
Originally posted by droste
At least gcc produces the same result for both loops (incrementing / decrementing) if you enable optimizations (-O2): a decrementing loop with jne.
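For reference, a minimal sketch of the two loop shapes being compared. It assumes a loop body that does not depend on the counter; do_stuff(), calls, run_up() and run_down() are made-up names for illustration. Whether a particular compiler really emits identical code for both depends on its version and flags, so it is worth checking the output of gcc -O2 -S yourself.

```c
/* Hypothetical loop body: it only counts how often it was called. */
static unsigned calls;

static void do_stuff(void)
{
    calls++;
}

/* Counting up: the counter is compared against the bound each iteration. */
unsigned run_up(unsigned n)
{
    calls = 0;
    for (unsigned i = 0; i < n; i++)
        do_stuff();
    return calls;
}

/* Counting down: the loop ends when the counter reaches zero. */
unsigned run_down(unsigned n)
{
    calls = 0;
    for (unsigned i = n; i != 0; i--)
        do_stuff();
    return calls;
}
```

Both functions call do_stuff() exactly n times, which is why an optimizer is free to pick whichever form is cheaper on the target.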
-
Originally posted by frign
In this trivial case this was to be expected, but with more complex algorithms the outcome is not so clear. Once things get more complex, you will have to do the task yourself and must _not_ rely on the compiler optimisations.
Also, since it would be assembly, his optimization would rule out the compiler inlining do_stuff() (if it is a function).
If it is inlined, or if the body is plain code, the compiler can do other things: it will spot computations that don't depend on the variable i and move them out of the loop (loop-invariant code motion). After that, it may find that the loop computes a constant expression, evaluate it at compile time, and the loop will then take 0 ms (no CPU time) at run time. And sometimes, when the number of iterations is large enough (66 here), the compiler can decide to split the loop into groups of 4, like this:
Code:int i = 0; for (; i < 63; ) { stuff(); i++; stuff(); i++; stuff(); i++; stuff(); i++; } while (i < 66) { stuff(); i++; }
What I'm describing is what any current compiler does at -O3 or its equivalent (Visual Studio, Clang or GCC), so it is a stupid decision to write assembly (as shown).
Last edited by ciplogic; 07 April 2013, 02:23 PM.
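To make those transformations concrete, here is a hand-written sketch of what loop-invariant code motion plus 4x unrolling could look like. It is only an illustration under assumptions: stuff() is given a made-up signature and body, and sum_plain() and sum_transformed() are hypothetical names. A real compiler decides for itself whether and how to do any of this.

```c
/* Hypothetical loop body: depends on i and on a loop-invariant factor. */
static int stuff(int i, int factor)
{
    return i * factor;
}

/* As the programmer wrote it: the invariant a * b sits inside the loop. */
int sum_plain(int a, int b)
{
    int total = 0;
    for (int i = 0; i < 66; i++)
        total += stuff(i, a * b);
    return total;
}

/* Roughly what an optimizer may turn it into: a * b hoisted out of the
 * loop, the body unrolled four times, and a short loop for the tail. */
int sum_transformed(int a, int b)
{
    const int factor = a * b;   /* loop-invariant code motion */
    int total = 0;
    int i = 0;

    for (; i + 3 < 66; i += 4) {
        total += stuff(i, factor);
        total += stuff(i + 1, factor);
        total += stuff(i + 2, factor);
        total += stuff(i + 3, factor);
    }
    for (; i < 66; i++)         /* remaining 66 % 4 = 2 iterations */
        total += stuff(i, factor);

    return total;
}
```

Both versions return the same value for the same arguments; the second one just evaluates the loop condition 16 + 2 times instead of 66.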
-
Originally posted by frign
In this trivial case this was to be expected, but with more complex algorithms the outcome is not so clear. Once things get more complex, you will have to do the task yourself and must _not_ rely on the compiler optimisations.
I just tested the for loop.
The compiler didn't optimize it, so yeah, it's better to go with -- (decrementing).
Makes me wonder why, as it should be the same logic.
Hmm.
Anyway, every language has a lot to learn.
In asm it's instruction scheduling and things like that;
in C it is being good to the compiler.
There are hundreds of pages of text on both topics.
I personally find the asm ones easier to understand, as optimizing asm is based on simple logic, but that's just me.
-
Originally posted by ciplogic
and maybe the 4 stuff() calls can be auto-vectorized; even if not, it removes 3 branches (which are costly even on out-of-order CPUs).
What I'm describing is what any current compiler does at -O3 or its equivalent (Visual Studio, Clang or GCC), so it is a stupid decision to write assembly (as shown).
Another thing a compiler can't use is what it can't know,
like when you know a loop will only execute 3-7 times.
A compiler doesn't know that, so it will emit a speed-optimized version, one that is bloated for what it does.
Also, I never said you have to write whole programs in assembly to get performance.
On the contrary, I said it's best to write only the few tightest loops in assembly.
Also, can you show me a program more optimized than x264 or glibc?
Benchmark the pure C version of musl against glibc; then you can say for sure how good a compiler is.
musl, from what I see, is good, optimized C, so it's perfect for benchmarks.
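As a side note on the "loop only runs 3-7 times" case: with GCC and Clang there is a way to pass that kind of knowledge along from C. The sketch below is an assumption-laden illustration, not something from the posts above. ASSUME_RANGE and sum_few() are made-up names, __builtin_unreachable() is a GCC/Clang builtin rather than standard C, and nothing guarantees the compiler will generate smaller code because of it.

```c
#include <stddef.h>

/* ASSUME_RANGE is a hypothetical helper macro, not a standard facility.
 * On GCC and Clang it marks values outside [lo, hi] as unreachable;
 * on other compilers it expands to nothing. */
#if defined(__GNUC__)
#define ASSUME_RANGE(x, lo, hi) \
    do { if ((x) < (lo) || (x) > (hi)) __builtin_unreachable(); } while (0)
#else
#define ASSUME_RANGE(x, lo, hi) ((void)0)
#endif

/* Sums n elements; the caller promises 3 <= n <= 7.
 * Breaking that promise is undefined behaviour on GCC/Clang. */
int sum_few(const int *v, size_t n)
{
    ASSUME_RANGE(n, 3, 7);

    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Comparing the assembly with and without the ASSUME_RANGE line (for example with gcc -O3 -S) shows whether the hint changes anything for a given compiler.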
-
I agree
Originally posted by gens
I just tested the for loop.
The compiler didn't optimize it, so yeah, it's better to go with -- (decrementing).
Makes me wonder why, as it should be the same logic.
Hmm.
Anyway, every language has a lot to learn.
In asm it's instruction scheduling and things like that;
in C it is being good to the compiler.
There are hundreds of pages of text on both topics.
I personally find the asm ones easier to understand, as optimizing asm is based on simple logic, but that's just me.
-
I found out why the compiler did that:
I put the "i" variable at global scope (or whatever you call it) instead of inside main.
It was quick code for testing purposes; don't start bashing my C again.
With the variable not global, the compiler optimizes the increments into decrements.
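A minimal sketch of the difference, under assumptions: g_i, seen_last, things(), loop_global() and loop_local() are made-up names, and the body reads the global counter only to show why its value matters. With a global counter the loop body (or any function it calls) may look at the counter, so unless the compiler can prove otherwise it has to keep the counting-up values intact. A local counter is invisible to everyone else.

```c
int g_i;                    /* loop counter at file scope */
static int seen_last;       /* last counter value the body observed */

/* Hypothetical loop body: it is allowed to read the global counter. */
static void things(void)
{
    seen_last = g_i;
}

/* Global counter: its value is observable from things(), so the loop
 * must really count 0, 1, 2, ... as written. */
int loop_global(void)
{
    for (g_i = 0; g_i < 66; g_i++)
        things();
    return seen_last;
}

/* Local counter: nothing else can see i, so the compiler may keep it in
 * a register and count in whichever direction is cheapest. */
int loop_local(void)
{
    int runs = 0;
    for (int i = 0; i < 66; i++) {
        things();
        runs++;
    }
    return runs;
}
```

Here loop_global() returns the last value the body saw (65), which only works because the counter really counts upwards; loop_local() just reports how many times the body ran.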
-
No wonder
Originally posted by gens
I found out why the compiler did that:
I put the "i" variable at global scope (or whatever you call it) instead of inside main.
It was quick code for testing purposes; don't start bashing my C again.
With the variable not global, the compiler optimizes the increments into decrements.
FYI, putting the volatile qualifier on a variable will disable this optimisation too.
Always remember to keep the scope as small as possible and try to get rid of global variables, or even global structs, completely; that's best practice.
And again: stop trying to trick the compiler and start writing good code! (After all, you don't know how other architectures behave.)
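A small sketch of the volatile point, with made-up function names (count_plain() and count_volatile()). Both functions return the same result; the difference only shows up in the generated code, which can be inspected with the compiler's assembly output.

```c
/* Plain local counter: the compiler is free to reverse the loop, keep i
 * in a register, or reduce the whole function to "return n". */
unsigned count_plain(unsigned n)
{
    unsigned runs = 0;
    for (unsigned i = 0; i < n; i++)
        runs++;
    return runs;
}

/* volatile counter: every read and write of i has to be performed as
 * written, so the loop must count upwards one step at a time. */
unsigned count_volatile(unsigned n)
{
    volatile unsigned i;
    unsigned runs = 0;
    for (i = 0; i < n; i++)
        runs++;
    return runs;
}
```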
-
Originally posted by gens
I found out why the compiler did that:
I put the "i" variable at global scope (or whatever you call it) instead of inside main.
It was quick code for testing purposes; don't start bashing my C again.
With the variable not global, the compiler optimizes the increments into decrements.
It seems that you understand all the intricate parts of asm, but you don't know how to write fast C code. Sorry, but in your particular case you should keep writing assembly. As I wrote before, you sometimes have to help the compiler, not make it impossible for it to optimize. And what about the case where stuff changes RCX: wouldn't that create an infinite loop?