Originally posted by gens:
No, I made it in FASM.
My version was faster because the glibc one was copying 8 bytes at a time, even though it said it was an SSE2 version (SSE can copy 16 bytes at a time).
My version would still be faster if glibc were using its proper SSE2 version, because I used simpler logic.
The SSSE3 version that is faster than mine is faster only in a few cases, when the source and dest are unaligned by 1 byte (and with blocks way bigger than the CPU cache, which I could optimize rather easily, but I am lazy).

Then there is Agner Fog's version, which I don't quite understand.
From what I have seen of it, a compiler can't produce anything like that, at least not without heavy care from the programmer.

Btw, string operations are another case where assembly can make a big difference.
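
To make the 8-byte versus 16-byte point above concrete, here is a rough C sketch of a copy loop that moves 16 bytes per iteration with SSE2 intrinsics. This is not gens's FASM routine and not glibc's implementation; the name copy16 is made up for illustration, and the unaligned load/store intrinsics are chosen only to keep the sketch simple. Without SSE2 it falls back to a plain byte loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (an assumption for illustration, not glibc code):
   copy n bytes from src to dst. The two regions must not overlap. */
static void *copy16(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#if defined(__SSE2__)
    /* Main loop: one 16-byte load and one 16-byte store per iteration.
       The "u" variants accept unaligned addresses. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif

    /* Tail: the remaining bytes (or all of them when SSE2 is unavailable). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A real implementation would also deal with alignment and very large blocks, which is where the extra logic (and the differences discussed in the quoted post) comes from.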
Thanks for the info.
When it comes to string operations, I avoid the glibc string headers like the plague and instead resort to writing my own string functions, which are almost always 1.5-2x faster than their glibc counterparts (mostly because you can design your functions for your current needs and strip out a lot).
Suckless.org has a good reason to list glibc as a library considered harmful (http://suckless.org/sucks), and we should all stay away from it if possible and make up our own minds on how to deal with those things effectively, without being infected with C++ STL-crap. (Call me a C++-hater, that's what I am.)
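
As an illustration of "design your functions for your current needs", here is a small C sketch of a purpose-built string copy. The name str_copy and its exact behavior are assumptions made for this example, not code from the post: it truncates, always NUL-terminates, and returns the length so the caller does not need a separate strlen call afterwards. Whether something like this actually beats the library version would have to be measured for the workload in question.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst, where cap is the size of dst
   including room for the terminating NUL. Truncates if src does not fit.
   Returns the number of characters written, not counting the NUL. */
static size_t str_copy(char *dst, size_t cap, const char *src)
{
    size_t i = 0;

    if (cap == 0)
        return 0;               /* no room even for the terminator */

    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';              /* always terminate */
    return i;
}
```

For example, str_copy(buf, sizeof buf, "hello") with an 8-byte buf writes "hello" and returns 5.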