Is Assembly Still Relevant To Most Linux Software?


  • #16
    Originally posted by TemplarGR
    This is not surprising at all. There is a myth that programming in assembly produces faster executable code, but this is not always the case. It entirely depends on the programmer's ability. Often a compiler produces better executables than hand-written assembly, and it is less error-prone, not to mention that developing in assembly takes ages...

    There is absolutely no point in using it today, except for corner cases.

    In fact, we are rapidly approaching an age where almost all programs will be written in managed languages like Java and C#. Virtual machines have come a long way in speed, and with multicore now the norm, managed code can be as fast as, or sometimes even faster than, unmanaged code.
    "There is a myth that programming in assembly produces faster executable code, but this is not always the case."
    nice paradox (y)

    assembly will never die, because compilers will never get as good as a human
    a human can think, a human can cheat, and a compiler can only do what a human told it to do

    anyway, I saw with perf top that memcpy() was slower than it should be
    so I spent some time writing my own version of memcpy(), and it was 2-3 times faster than the (also hand-written) glibc default one
    then I updated the system's glibc, only to find that its memcpy() had become some 50% faster than my version
    now I have a plan to extend my version further

    if I let any compiler generate a version of memcpy(), it would be slower, sometimes a lot slower
    even though the compiler has a template just for memcpy, it can't know exactly how it is going to be used
    it can know whether it can use SSE(1/2/3/4), but you'd have to tell it first, and it would still be slower than my version, which is in turn slower than what the glibc people made

    and yes, speed matters in real-time interactive programs
    you can, if careful, make a fast program in C++ (maybe even C#, I don't know), but if you want the fastest possible you need asm in some spots and lots of benchmarks
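    The "copy wider chunks" idea behind the memcpy discussion can be sketched in portable C. This is a hypothetical illustration, not glibc's actual implementation: move 8 bytes per iteration through a 64-bit temporary, then finish the tail byte by byte.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical wide-copy loop -- NOT glibc's real memcpy.
     * The small memcpy(&tmp, s, 8) calls are the standard idiom for
     * unaligned loads/stores; compilers lower them to single moves. */
    static void *wide_copy(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (n >= 8) {            /* one 64-bit chunk per iteration */
            uint64_t tmp;
            memcpy(&tmp, s, 8);
            memcpy(d, &tmp, 8);
            d += 8; s += 8; n -= 8;
        }
        while (n--)                 /* byte-at-a-time tail */
            *d++ = *s++;
        return dst;
    }
    ```

    An SSE2 version would move 16 bytes per iteration and care about alignment, which is exactly the kind of decision that is easier to make by hand than to coax out of a compiler.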



    • #17
      Nice

      Originally posted by gens
      "There is a myth that programming in assembly produces faster executable code, but this is not always the case."
      nice paradox (y)

      assembly will never die, because compilers will never get as good as a human
      a human can think, a human can cheat, and a compiler can only do what a human told it to do

      anyway, I saw with perf top that memcpy() was slower than it should be
      so I spent some time writing my own version of memcpy(), and it was 2-3 times faster than the (also hand-written) glibc default one
      then I updated the system's glibc, only to find that its memcpy() had become some 50% faster than my version
      now I have a plan to extend my version further

      if I let any compiler generate a version of memcpy(), it would be slower, sometimes a lot slower
      even though the compiler has a template just for memcpy, it can't know exactly how it is going to be used
      it can know whether it can use SSE(1/2/3/4), but you'd have to tell it first, and it would still be slower than my version, which is in turn slower than what the glibc people made

      and yes, speed matters in real-time interactive programs
      you can, if careful, make a fast program in C++ (maybe even C#, I don't know), but if you want the fastest possible you need asm in some spots and lots of benchmarks
      That sounds interesting. Did you commit your improved version to glibc upstream?



      • #18
        Originally posted by frign
        That sounds interesting. Did you commit your improved version to glibc upstream?
        no, I wrote it in FASM
        my version was faster because the glibc one was copying 8 bytes at a time, even though it claimed to be an SSE2 version (SSE can copy 16 bytes at a time)
        still, my version would be faster even if glibc used its proper SSE2 version, because my logic is simpler
        the SSSE3 version that beats mine is faster only in a few cases: when source and destination are 1 byte unaligned (and with blocks far bigger than the CPU cache, which I could optimize fairly easily but am too lazy to)

        then there is Agner Fog's version, which I don't quite understand
        from what I've seen of it, a compiler can't produce anything like that, at least not without heavy care from the programmer

        btw, string operations are another case where assembly can make a big difference



        • #19
          Originally posted by zanny
          What? This isn't about managed vs unmanaged. If anything, the Linux development space is trending towards Qt, which is C++ and native, since Java is a slow, boring piece of crap and C# (mainly Xamarin) has thoroughly alienated the Linux developer scene. Neither is supported across mobile platforms either, which is key, and something Qt actually has going for it; and as GTK and GNOME get their act together, it remains a competitor.

          Java is slow? Do you have any proof of that? Java runs on most mobile phones out there. Where did you get your info, or are you spewing facts from your hindquarters?



          • #20
            Mixed messages

            While the report lists some valuable software, even the Linux kernel itself, as using assembly, it then concludes that "most of the Assembly code has little value"? Sounds like someone is spinning an agenda here.



            • #21
              Glib sux

              Originally posted by gens
              no, I wrote it in FASM
              my version was faster because the glibc one was copying 8 bytes at a time, even though it claimed to be an SSE2 version (SSE can copy 16 bytes at a time)
              still, my version would be faster even if glibc used its proper SSE2 version, because my logic is simpler
              the SSSE3 version that beats mine is faster only in a few cases: when source and destination are 1 byte unaligned (and with blocks far bigger than the CPU cache, which I could optimize fairly easily but am too lazy to)

              then there is Agner Fog's version, which I don't quite understand
              from what I've seen of it, a compiler can't produce anything like that, at least not without heavy care from the programmer

              btw, string operations are another case where assembly can make a big difference
              Thanks for the info.
              When it comes to string operations, I avoid the glibc string headers like the plague and instead write my own string functions, which are almost always 1.5-2x faster than their glibc counterparts (mostly because you can design your functions for your current needs and strip a lot).
              Suckless.org has a good reason to list glibc as a library considered harmful (http://suckless.org/sucks); we should all stay away from it where possible and make up our own minds on how to deal with these things effectively, without being infected by C++ STL crap. (Call me a C++ hater; that's what I am.)
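              The "design for your current needs and strip a lot" point can be sketched with a tiny hypothetical helper (the name is made up for illustration): an append that returns a pointer to the end of the copied string, so chained concatenations never re-scan the buffer the way repeated strcat() calls do.

              ```c
              #include <stddef.h>

              /* Hypothetical stripped-down string routine: copy src onto dst
               * and return a pointer to the new NUL terminator. Chaining the
               * calls avoids the repeated re-scans that strcat() performs. */
              static char *append(char *dst, const char *src)
              {
                  while ((*dst = *src++))
                      dst++;
                  return dst; /* points at the terminating NUL */
              }
              ```

              Chained as `p = append(p, "foo"); p = append(p, "bar");`, the total work is linear in the output length, instead of the quadratic behavior of a naive strcat loop.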



              • #22
                CPU technology is moving too fast, making it hard to justify using assembly for anything other than specialized, platform-specific libraries (e.g. media codecs). Compilers are good enough and RAM is now stupid cheap.

                As for the C++ STL hater: you must not do much work at all at the high level. I'm more irritated by the non-UTF-8 crap out there causing problems. String stuff? Not a big deal when the killer is shaving an iteration off PDE solves, or finding a better way to define a PDE interpolation field (to allow the expensive evaluations to be sparser).

                That being said, I *have* written my own little toy string class set, but haven't performance-checked it recently. It seems performance for what I do depends more heavily on limiting memory allocations (fun with massively parallel stuff) than on a few instructions. Modern Intel CPUs seem to stall waiting for data more than they do executing instructions.
                Last edited by bnolsen; 04-02-2013, 12:01 PM.



                • #23
                  C++ STL still sux

                  Originally posted by bnolsen
                  CPU technology is moving too fast, making it hard to justify using assembly for anything other than specialized, platform-specific libraries (e.g. media codecs). Compilers are good enough and RAM is now stupid cheap.

                  As for the C++ STL hater: you must not do much work at all at the high level. I'm more irritated by the non-UTF-8 crap out there causing problems. String stuff? Not a big deal when the killer is shaving an iteration off PDE solves, or finding a better way to define a PDE interpolation field (to allow the expensive evaluations to be sparser).

                  That being said, I *have* written my own little toy string class set, but haven't performance-checked it recently. It seems performance for what I do depends more heavily on limiting memory allocations (fun with massively parallel stuff) than on a few instructions. Modern Intel CPUs seem to stall waiting for data more than they do executing instructions.
                  C++ STL hater here: if I were to completely abandon all the high-level glibc string functions, why should I even use glibc at all instead of rewriting it cleanly in good ol' C? A discussion about this topic is rather pointless, because it comes down to taste, no question.



                  • #24
                    Originally posted by gens
                    btw, string operations are another case where assembly can make a big difference
                    Ubuntu/Linaro folks, this is your monthly reminder that cortex-strings is still not upstreamed.



                    • #25
                      Originally posted by bnolsen
                      CPU technology is moving too fast, making it hard to justify using assembly for anything other than specialized, platform-specific libraries (e.g. media codecs). Compilers are good enough and RAM is now stupid cheap.

                      As for the C++ STL hater: you must not do much work at all at the high level. I'm more irritated by the non-UTF-8 crap out there causing problems. String stuff? Not a big deal when the killer is shaving an iteration off PDE solves, or finding a better way to define a PDE interpolation field (to allow the expensive evaluations to be sparser).

                      That being said, I *have* written my own little toy string class set, but haven't performance-checked it recently. It seems performance for what I do depends more heavily on limiting memory allocations (fun with massively parallel stuff) than on a few instructions. Modern Intel CPUs seem to stall waiting for data more than they do executing instructions.
                      assembly is also better with atomics and all that modern crap
                      memory is the limiting factor on modern hardware, so you have to care about the cache, alignment and more
                      read some of Agner Fog's publications on that
                      there is also a blog by an x264 dev where you can read why x264 is way faster than other encoders (hint: asm SSE/AVX and cache optimizations)
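                      The cache/alignment point can be made concrete with a C11 sketch (assuming a 64-byte cache line, which is common on x86 but not guaranteed): padding per-thread atomic counters so they never share a line.

                      ```c
                      #include <stdatomic.h>

                      /* Two threads hammering adjacent counters would ping-pong one
                       * cache line between cores ("false sharing"). Aligning each
                       * counter to 64 bytes -- an assumed, common x86 line size --
                       * keeps them on separate lines. */
                      struct padded_counter {
                          _Alignas(64) _Atomic long value;
                      };

                      struct padded_counter per_thread[2]; /* one line each */
                      ```

                      A profiler won't show this as an instruction cost; it shows up as the stalls-waiting-for-data behavior described above.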

                      anyway, what I wanted to say:

                      "Linaro is a not-for-profit engineering organization consolidating and optimizing open source Linux software and tools for the ARM architecture."

                      there you have it
                      ARM is a simple instruction set and is easy for a compiler (and yet hand-written ARM asm can still be faster/smaller)
                      there is not much to optimize on ARM; it is a simple instruction set
                      x86 (and the like) are complex instruction sets and, if you ask me, far more advanced than ARM
                      the only way to bring ARM on par with x86 is to add things to the architecture (like "vector operations", i.e. AVX/SSE)

                      if any Linaro guys are reading:
                      go get musl, get clang, get other key programs that don't need assembly
                      but don't force your opinion on others, at least not without proof (for others reading: there is no proof that a compiler can beat a human at anything but speed, as in speed of writing a program)

                      PS: C has been out for decades and there are still leaps being made in compilers optimizing it
                      what about higher-level languages? centuries?

                      PPS: UTF-8 crap? do you know what Unicode is for? if you don't and you're a programmer, then you're American and narrow-minded
                      Last edited by gens; 04-02-2013, 12:25 PM.



                      • #26
                        Originally posted by gens
                        PPS: UTF-8 crap? do you know what Unicode is for? if you don't and you're a programmer, then you're American and narrow-minded
                        The whole mess around UTF-8, UTF-16 (which variant?) and UTF-32 (which variant?) is a bigger can of worms than implementation issues in some string libraries. I'm very painfully aware of internationalization stuff.



                        • #27
                          Originally posted by bnolsen
                          The whole mess around UTF-8, UTF-16 (which variant?) and UTF-32 (which variant?) is a bigger can of worms than implementation issues in some string libraries. I'm very painfully aware of internationalization stuff.
                          UTF-8 should be the standard
                          the Plan 9 guys, the same people who created Unix, made Plan 9 use Unicode universally

                          everything that manipulates text should be made UTF-8 compatible

                          actually, UTF-8 is not that complicated
                          it is backwards compatible with ASCII: the high bit of a byte marks it as part of a multi-byte sequence, and the encoding is simple and CPU-friendly (details on the wiki)
                          so extending libraries should be fairly simple, if you have their source
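                          The byte layout can be shown concretely. In standard UTF-8 (RFC 3629), the leading bits of the first byte encode the sequence length:

                          ```c
                          /* Length of a UTF-8 sequence from its first byte, per the
                           * standard bit patterns (RFC 3629). Returns 0 for a byte that
                           * cannot start a sequence (a continuation byte 10xxxxxx). */
                          static int utf8_seq_len(unsigned char b)
                          {
                              if (b < 0x80) return 1;           /* 0xxxxxxx: plain ASCII */
                              if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx: 2-byte seq  */
                              if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx: 3-byte seq  */
                              if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx: 4-byte seq  */
                              return 0;                         /* continuation/invalid  */
                          }
                          ```

                          Because every ASCII byte stays a 1-byte sequence and continuation bytes are unambiguous, byte-oriented C string code often keeps working on UTF-8 unchanged.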

                          PS: as I understand it, the next big step for CPU architectures is to add more cores
                          because there is (probably) not much left to improve in existing cores
                          Last edited by gens; 04-02-2013, 01:34 PM.



                          • #28
                            Talking about Unicode ...

                            Originally posted by bnolsen
                            The whole mess around UTF-8, UTF-16 (which variant?) and UTF-32 (which variant?) is a bigger can of worms than implementation issues in some string libraries. I'm very painfully aware of internationalization stuff.
                            Yes, this is a valid reason. But unless you depend on, say, the Cyrillic alphabet, ASCII is not much of a problem for most users (and it is the easiest to handle in terms of programming).
                            Still, Unicode is well designed but lacks a proper implementation. The locale system seems half-assed and is too complex for a dedicated, out-of-the-box user experience.



                            • #29
                              You don't have to use pure assembler to use SIMD. GCC, MSVC's cl, etc. expose intrinsics, so you can decide exactly which SIMD instructions to use while letting the compiler manage instruction and register scheduling. IMHO it's less error-prone and much simpler (especially for maintenance). You can always read the asm after compilation to check that it's reasonable. I've had a few cases where I had to use -O1 with GCC and do a bit of the instruction scheduling myself (sometimes GCC's register and instruction schedulers interfere with each other on odd architectures), but it was still a win over fully manual asm.

                              If you do stuff like image processing it can be quite worthwhile.
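                              A minimal sketch of the intrinsics approach (SSE2, which is baseline on x86-64; the function name here is made up for illustration): you pick the instruction (paddb via _mm_add_epi8), the compiler handles registers and scheduling.

                              ```c
                              #include <emmintrin.h> /* SSE2 intrinsics, x86-64 baseline */
                              #include <stddef.h>
                              #include <stdint.h>

                              /* Add two byte arrays 16 lanes at a time. For brevity this
                               * sketch assumes n is a multiple of 16; a real routine would
                               * also handle the tail. */
                              static void add_u8(uint8_t *dst, const uint8_t *a,
                                                 const uint8_t *b, size_t n)
                              {
                                  for (size_t i = 0; i < n; i += 16) {
                                      __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
                                      __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
                                      _mm_storeu_si128((__m128i *)(dst + i),
                                                       _mm_add_epi8(va, vb));
                                  }
                              }
                              ```

                              Exactly this kind of per-pixel arithmetic is where image-processing code tends to see the win.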



                              • #30
                                Originally posted by Qaz`
                                You don't have to use pure assembler to use SIMD. GCC, MSVC's cl, etc. expose intrinsics, so you can decide exactly which SIMD instructions to use while letting the compiler manage instruction and register scheduling. IMHO it's less error-prone and much simpler (especially for maintenance). You can always read the asm after compilation to check that it's reasonable. I've had a few cases where I had to use -O1 with GCC and do a bit of the instruction scheduling myself (sometimes GCC's register and instruction schedulers interfere with each other on odd architectures), but it was still a win over fully manual asm.

                                If you do stuff like image processing it can be quite worthwhile.
                                Sorry, but given that intrinsics are CPU/ISA-specific, you'd still have to port your code, which is the point of the article: what x86 code exists that needs porting to ARM64?

