Is it foolish to currently develop in machine code, hexadecimal and assembly?


  • #11
    Originally posted by gens
    ?
    there is no way to make a loop without using a register
    if you know assembly please do show me a way

    well, there is a way
    cmp instruction can compare a value in memory with an immediate value
    and dec instruction can also be used on a value in memory
    but as i said you don't want to touch memory if you don't have to, as it is a lot slower

    so if you have under 16 variables on amd64, you want to have them all in registers (8 for x86)

    and no, compilers are worse at choosing what registers are for what
    make a small c program and look at what the compiler produces
    you can use this or just objdump (-M intel for intel syntax)


    but i'm still interested in your 9/10 cases
    please do present a couple
    There are these wonderful things called CPU caches, register renaming, pipeline optimizations, and the like, that turn your hand-crafted assembly code into a really unoptimized mess. Unless you're writing very unoptimized code, compilers are always going to create a faster executable than handwritten assembly as a result.

    Take my loop iteration example. There's a cost-benefit that goes into taking away a CPU register to constantly keep the loop iterator loaded versus the performance you lose due to losing access to that register. And for many years, back during the early days of C (when admittedly, the PDP-11 C compiler stank), code typically used the register keyword to tell the compiler to keep the iterator always loaded in a register, because it "avoided a costly memory read". At least, until people started to benchmark and found that freeing up that register and re-loading the iterator when needed often yielded more performance.

    Compilers have generated faster code than handwritten assembly for at least 30 years now, and if a certain compiler doesn't, then it should be replaced with one that works better.

    Comment


    • #12
      Originally posted by gens
      ?
      but i'm still interested in your 9/10 cases
      please do present a couple
      If it's possible, then it's probably mentioned here:
      http://www.agner.org/optimize/optimizing_assembly.pdf

      Comment


      • #13
        Originally posted by oleid
        If it's possible, then it's probably mentioned here:
        http://www.agner.org/optimize/optimizing_assembly.pdf
        In any case, this link answers the OP's question in great detail.

        Comment


        • #14
          Assembly is a programming language like C/C++/PHP/Java/C# and so forth. You select the correct language for the job at hand.

          Places where I have seen Assembly used correctly include:
          - Discrete Cosine Transforms in video and audio decompression
          - SHA512/RSA cryptographic algorithms on a high performance accelerator
          - Key graphics processing loops in games

          Disadvantages of assembly:
          - It has a very, very high developer workload weighting (especially if you need to do better than a modern C compiler)


          So, if you want to code everything in Assembly - guess what: It's Turing complete, so you can! If you write it in C, you'd probably be writing 10 times as much functionality in a day. If you wrote it in Java/C# you'd be writing 50 times as much functionality in a day.

          So ultimately, the question is: Who's paying you for your time, and do they think that they are getting value for money?

          Comment


          • #15
            Previously, it was the best option among badly optimized compilers and interpreters. I started with Z80, then MC68000, Intel, Atmel and ARM. But that was a looooong time ago. Today it is useless, unless you are a kernel hacker or into the few other corner cases mentioned above. NEVER use it for general apps, games or utilities; it is just a stupid idea, like creating an OS kernel in Visual Basic 6. However, understanding assembly language (and the CPU cache) is important IMHO for figuring out properly how a computer actually operates.

            Comment


            • #16
              Originally posted by gens
              assembly is a fairly simple language
              y sure there is a fair bit to learn about how cpu's work till you can program in it, but after that the rest of the details are easy
              A simple language does not automatically translate to ease of programming. A Turing machine has as rudimentary a 'language' as it gets - I've yet to see somebody write an OS in it.

              x86 cpu's have not changed much since i686 with amd64 being the biggest change
              so what you write would work on any x86 or amd64 cpu
              In the x86 world there is a new ISA extension approximately every two microarchitecture generations. Starting from SSE2 (which is built into the amd64 ISA), there are SSE3, SSSE3, SSE4.1, SSE4.2, SSE4a, AES, AVX, AVX2, FMA3, FMA4, F16C... I could carry on but I see no point in it.

              assembly, like almost any other programming language, is "easy" to maintain if the code is well commented
              Well-documented assembly code is definitely easier to maintain than undocumented assembly code, but it's also definitely not easy to maintain per se. Which is why we moved on from doing that.

              an example of a for loop
              Code:
              for (int i = 0; i < 100; i++) {
                  //code
              }
              mov ecx, 0
              some_label:
              //code
              inc ecx
              cmp ecx, 100
              jng some_label
              Your loop has an off-by-one error with regard to the trip count - its branch condition reads 'jump if not greater', whereas you want a 'jump if less'.
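To make the trip-count point concrete, here is a minimal C sketch that models the quoted loop shape (body first, then increment and compare). The function names trips_jng and trips_jl are made up for illustration; they are not from the thread.

```c
/* Models the quoted loop: the body runs, then the counter is
 * incremented and compared against the limit (a do/while shape). */

/* "jng": jump back while the counter is not greater than the limit. */
int trips_jng(int limit)
{
    int i = 0, trips = 0;
    do {
        trips++;            /* stands in for the loop body */
        i++;
    } while (i <= limit);
    return trips;
}

/* "jl": jump back while the counter is less than the limit. */
int trips_jl(int limit)
{
    int i = 0, trips = 0;
    do {
        trips++;            /* stands in for the loop body */
        i++;
    } while (i < limit);
    return trips;
}
```

With a limit of 100, the jng-style loop runs its body 101 times and the jl-style loop runs it 100 times, which matches the C for loop.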

              that a compiler usually changes to:
              (and its easier for a human)

              mov ecx, 100
              some_label:
              //code
              dec ecx
              jnz some_label
              Whether the latter is easier to read largely depends on the body of the loop - how the iterator is used, what computations it participates in, what pointer arithmetic, etc.
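A small C sketch of what "how the iterator is used" can mean in practice. Both functions are hypothetical examples, not code from the thread: they compute the same index-weighted sum, once counting up and once counting down in the style of dec/jnz.

```c
#include <stddef.h>

/* Count-up form: the index is available directly. */
long weighted_up(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)i * data[i];
    return total;
}

/* Count-down form, mirroring "dec ecx / jnz": the counter only says how
 * many trips remain, so the forward index has to be rebuilt as n - left.
 * A while loop is used so that n == 0 is handled; the quoted assembly
 * assumes at least one trip. */
long weighted_down(const int *data, size_t n)
{
    long total = 0;
    size_t left = n;
    while (left != 0) {
        size_t i = n - left;
        total += (long)i * data[i];
        left--;
    }
    return total;
}
```

When the body never looks at the counter, the count-down form is shorter; once the body needs the forward index, the extra subtraction has to be read and maintained.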

              adding two numbers together is just "add register/memory, memory/register/immediate"
              with the limitation there being that you can not have both values in memory (but you can do like "add [memory_address], immediate_value")
              and details like that
              Again, you're making the mistake of equating the simplicity of the language with the simplicity of writing arbitrary code in it. Those are not equal. Now, being able to read assembly (and knowing a good deal of microarchitecture details, whatever the target machine might be) is essential for developers who care about performance. Writing in assembly, though, does not fall under the same clause. Understanding your C/C++ compiler well and working along with it via intrinsics, the occasional small inline assembly block (as a last resort), and your best friend - the performance-counter profiler, is how one writes performant, maintainable code today.

              Comment


              • #17
                Originally posted by gamerk2
                There are these wonderful things called CPU caches, register renaming, pipeline optimizations, and the like, that turn your hand-crafted assembly code into a really unoptimized mess. Unless you're writing very unoptimized code, compilers are always going to create a faster executable than handwritten assembly as a result.

                Take my loop iteration example. There's a cost-benefit that goes into taking away a CPU register to constantly keep the loop iterator loaded versus the performance you lose due to losing access to that register. And for many years, back during the early days of C (when admittedly, the PDP-11 C compiler stank), code typically used the register keyword to tell the compiler to keep the iterator always loaded in a register, because it "avoided a costly memory read". At least, until people started to benchmark and found that freeing up that register and re-loading the iterator when needed often yielded more performance.

                Compilers have generated faster code than handwritten assembly for at least 30 years now, and if a certain compiler doesn't, then it should be replaced with one that works better.
                why would you assume that i don't know what a cpu cache is ?
                use perf if you want to see how much modern programs thrash the cache

                register renaming helps both the humans and compilers alike
                that said, just shuffling the order of instructions around is no problem

                "There's a cost-benefit that goes into taking away a CPU register to constantly keep the loop iterator loaded versus the performance you lose due to losing access to that register."
                what ?
                there is NO benefit in register spilling
                if you run out of registers, then you look what to spill

                compilers have generated...
                i had this discussion on this very forum a long while ago
                NOBODY could write a better matrix multiplication loop in C than the one i patched up in asm (and it was far from perfect)
                i think one test came to ~70% of performance
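For readers who have not seen one, this is roughly what a plain C matrix multiplication loop looks like. It is only a sketch, not the code from that earlier thread: the function name matmul, the row-major layout and the i-k-j loop order are all assumptions made for illustration.

```c
#include <stddef.h>

/* c = a * b for square n x n matrices stored row-major.
 * The i-k-j loop order walks b and c along rows, so the inner loop
 * touches memory contiguously. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

How close a compiler gets to hand-written assembly on a loop like this depends on the compiler, its flags and the target CPU, so it has to be measured rather than assumed.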

                Comment


                • #18
                  Originally posted by oleid
                  If it's possible, then it's probably mentioned here:
                  http://www.agner.org/optimize/optimizing_assembly.pdf
                  Agner Fog wrote a lot of good things and i recommend him to anyone who wants to write optimized code

                  also note his cpu instruction timing tables
                  they explain some things about modern processors

                  Comment


                  • #19
                    Originally posted by darkblu
                    ...
                    you wrote in a "troll bashing" way, so i cant and wont give a proper reply

                    here's a summary anyway
                    cpu dispatching
                    optimized functions, not whole programs
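A rough C sketch of what cpu dispatching for a single optimized function can look like. Everything named here is hypothetical: dot_generic, dot_avx2_placeholder and pick_dot are made up, and the "AVX2" version is ordinary C standing in for a hand-tuned routine. The only real API used is the GCC/Clang x86 builtin __builtin_cpu_supports, guarded so the sketch still builds elsewhere.

```c
#include <stddef.h>

typedef long (*dot_fn)(const int *, const int *, size_t);

/* Baseline version; runs on any CPU. */
long dot_generic(const int *x, const int *y, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)x[i] * y[i];
    return total;
}

/* Hypothetical stand-in for a hand-tuned AVX2 routine. It is plain C
 * here so that the sketch compiles on any target. */
long dot_avx2_placeholder(const int *x, const int *y, size_t n)
{
    return dot_generic(x, y, n);
}

/* Pick one implementation at run time, based on what the CPU reports. */
dot_fn pick_dot(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return dot_avx2_placeholder;
#endif
    return dot_generic;
}
```

The caller would fetch the pointer once with pick_dot() and reuse it, so only the hot function is specialized and the rest of the program stays in portable C.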


                    and for the others,
                    cache thrashing is due to data being sparse in memory
                    for example in standard C++ (like you are taught) there is metadata associated with the data (linked list vs whatever its called in C++) causing more, and some times irregular, memory access
                    you are gonna have to take my word as i can't find the sony paper on it
                    here, this explains a part of it

                    and that comes down to how you organize the data in memory
                    and a compiler will never tell you how to do it, as it will just do as you tell it
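A small C sketch of the layout difference being described. The struct and function names are invented for illustration: the same values are stored once as a linked list, where each value carries a pointer and nodes can sit anywhere in memory, and once as a contiguous array.

```c
#include <stddef.h>

/* Linked list: every value drags a pointer along, and nodes may be
 * scattered in memory, so a traversal can jump around. */
struct node {
    int value;
    struct node *next;
};

long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* Contiguous array: values sit next to each other, so a traversal
 * reads memory sequentially. */
long sum_array(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Both functions return the same result; which layout is used is the programmer's decision, and the compiler will faithfully compile either one.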

                    for further understanding on compilers i suggest finding the gcc LRA documentation and lots of thinking about cpu's
                    (x264 dev's blog also has some interesting insights, and ofc Agner Fog's site)

                    Comment


                    • #20
                      Originally posted by darkblu
                      Your loop has an off-by-one error with regard to the trip count - its branch condition reads 'jump if not greater', whereas you want a 'jump if less'.
                      this, however, is true
                      jl or jnz both work in this case
                      that was a quick example so i didn't think about it at all


                      i'd also like to say that i wrote about how hard it is to write C vs asm
                      not how productive you will be
                      Last edited by gens; 30 October 2014, 06:37 PM.

                      Comment
