Is it foolish to currently develop in machine code, hexadecimal and assembly?

  • Guest
    Guest replied
    Assembly is fairly simple and it gives you ultimate power, but merely writing in it does not make your code any faster.

    Also, inline assembly often produces a worse binary than what the compiler can generate, since the compiler does not know what the block does and cannot optimize it in the context of the surrounding code.
    The exception is GCC's inline assembly syntax, which is a little annoying at first, but lets you tell the compiler what the inputs and outputs are and how it should handle them.
    Similarly, the compiler can still optimize code written with SIMD intrinsics (SSE), but again, such code is a pain to write (despite advantages such as type checking).
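
For illustration, here is a minimal sketch of the GCC-style syntax mentioned above, assuming GCC or Clang on an x86 target. The function name `add_u32` is made up for this example, and a plain C fallback is included so the sketch still builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: adds two 32-bit values.
   Uses GCC extended inline asm on x86, plain C everywhere else. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    /* "+r"(a): a is both read and written, in a register the compiler picks.
       "r"(b):  b is a read-only input, also in a register.
       "cc":    tells the compiler that the flags register is modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

Because the operands and clobbers are declared, the compiler is free to choose the registers and to schedule the surrounding code around the instruction.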

    Functions can also be written in assembly and then used from C/C++/other code with ease, but you need to know the calling convention you are using (the x86_64 ABI in particular can be challenging at times, though it is nothing a programmer couldn't get used to in a few hours or days).

    Writing machine code directly is fairly silly nowadays, same with hexadecimal (it's possible, but most people don't care about such low-level stuff unless they are writing shellcode for things like code injection).

    At my college, assembly was taught in the first semester, just to give people a feel for how x86/ARM works.

    Overall, I don't think anyone writes in assembly nowadays without a good reason to do so (such as performance-critical code or low-level system/hardware work).


  • erendorn
    replied
    Originally posted by gens
    i had this discussion on this very forum a long while ago
    NOBODY could write a better matrix multiplication loop in C than the one i patched up in asm (and it was far from perfect)
    i think one test came to ~70% of performance
    Nobody contests that one can write good tight loops in asm. The performance is always >= that of the compiler because you can start with the compiler result. Everyone in this thread has cited places where asm can be justified.
    But it is simply useless unless it has been proven that the loop, written in a higher-level language, is a bottleneck, which is very seldom the case.


  • gens
    replied
    Originally posted by gens
    here, this explains a part of it
    mhh it doesn't explain how it's done (data-oriented instead of OO, iirc)

    anyway, this explains why how you organize data is very important for performance


  • gens
    replied
    Originally posted by darkblu
    Your loop has an off-by-one error with regard to the trip count - its branch condition reads 'jump if not greater', whereas you want a 'jump if less'.
    this, however, is true
    jl or jnz both work in this case
    that was a quick example so i didn't think about it at all


    i'd also like to say that i wrote about how hard it is to write C vs asm
    not how productive you will be
    Last edited by gens; 30 October 2014, 06:37 PM.


  • gens
    replied
    Originally posted by darkblu
    ...
    you wrote in a "troll bashing" way, so i can't and won't give a proper reply

    here's a summary anyway
    cpu dispatching
    optimized functions, not whole programs
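
As a rough sketch of what CPU dispatching can look like in C, assuming GCC or Clang on x86 (where `__builtin_cpu_supports` is available): every function name here is made up for illustration, and the "AVX2" routine is only a plain C placeholder standing in for a hand-tuned version.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Portable baseline implementation. */
long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Placeholder for a hand-optimized AVX2 routine (hypothetical);
   written in plain C here so the sketch runs on any machine. */
long sum_avx2_placeholder(const int *v, size_t n)
{
    return sum_generic(v, n);
}

/* Picks an implementation once, based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2_placeholder;
#endif
    return sum_generic;
}

/* Callers only see this one entry point. */
long sum_dispatched(const int *v, size_t n)
{
    return pick_sum()(v, n);
}
```

Only the hot function gets specialized variants; the rest of the program stays portable.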


    and for the others,
    cache thrashing is due to data being sparse in memory
    for example in standard C++ (the way you are taught it) there is metadata associated with the data (linked list vs whatever it's called in C++) causing more, and sometimes irregular, memory access
    you are gonna have to take my word for it as i can't find the sony paper on it
    here, this explains a part of it

    and that comes down to how you organize the data in memory
    and a compiler will never tell you how to do it, as it will just do as you tell it
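
A small sketch of the two layouts being contrasted, with made-up names: a linked list where every element carries a pointer and may live anywhere on the heap, versus the same values packed into one contiguous array. Both functions compute the same sum; the difference is only in how memory is walked. No timings are claimed here, so measure on your own machine.

```c
#include <stddef.h>

/* One node per value: each step follows a pointer to wherever
   the next node happens to be in memory. */
struct node {
    int value;
    struct node *next;
};

long sum_list(const struct node *head)
{
    long s = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        s += p->value;
    return s;
}

/* Same values in one contiguous block: sequential access,
   no per-element pointer stored alongside the data. */
long sum_array(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

The compiler will happily optimize either loop, but it will not turn the first layout into the second for you.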

    for further understanding of compilers i suggest finding the gcc LRA documentation and lots of thinking about cpus
    (x264 dev's blog also has some interesting insights, and ofc Agner Fog's site)


  • gens
    replied
    Originally posted by oleid
    If it's possible, then it's probably mentioned here:
    http://www.agner.org/optimize/optimizing_assembly.pdf
    Agner Fog wrote a lot of good things and i recommend him to anyone who wants to write optimized code

    also note his cpu instruction timing tables
    they explain some things about modern processors


  • gens
    replied
    Originally posted by gamerk2
    There are these wonderful things called CPU caches, register renaming, pipeline optimizations, and the like, that turn your hand-crafted assembly code into a really unoptimized mess. Unless you're writing very unoptimized code, compilers are always going to create a faster executable than handwritten assembly as a result.

    Take my loop iteration example. There's a cost-benefit that goes into taking away a CPU register to constantly keep the loop iterator loaded versus the performance you lose due to losing access to that register. And for many years, back during the early days of C (when admittedly, the PDP-11 C compiler stank), code typically used the register keyword to tell the compiler to keep the iterator always loaded in a register, because it "avoided a costly memory read". At least, until people started to benchmark and found that freeing up that register and re-loading the iterator when needed often yielded more performance.

    Compilers have generated faster code than handwritten assembly for at least 30 years now, and if a certain compiler doesn't, then it should be replaced with one that works better.
    why would you assume that i don't know what a cpu cache is ?
    use perf if you want to see how much modern programs thrash the cache

    register renaming helps both humans and compilers alike
    that said, just shuffling the order of instructions around is no problem

    "There's a cost-benefit that goes into taking away a CPU register to constantly keep the loop iterator loaded versus the performance you lose due to losing access to that register."
    what ?
    there is NO benefit in register spilling
    if you run out of registers, then you look what to spill

    compilers have generated...
    i had this discussion on this very forum a long while ago
    NOBODY could write a better matrix multiplication loop in C than the one i patched up in asm (and it was far from perfect)
    i think one test came to ~70% of performance


  • darkblu
    replied
    Originally posted by gens
    assembly is a fairly simple language
    y sure there is a fair bit to learn about how cpu's work till you can program in it, but after that the rest of the details are easy
    A simple language does not automatically translate to ease of programming. A Turing machine has as rudimentary a 'language' as it gets - I've yet to see somebody write an OS in it.

    x86 cpu's have not changed much since i686 with amd64 being the biggest change
    so what you write would work on any x86 or amd64 cpu
    In the x86 world there is a new ISA extension approximately every two microarchitecture generations. Starting from SSE2 (which is built into the amd64 ISA), there are SSE3, SSSE3, SSE4.1, SSE4.2, SSE4a, AES, AVX, AVX2, FMA3, FMA4, F16C... I could carry on but I see no point in it.

    assembly, like almost any other programming language, is "easy" to maintain if the code is well commented
    Well-documented assembly code is definitely easier to maintain than undocumented assembly code, but it's also definitely not easy to maintain per se. Which is why we moved on from doing that.

    an example of a for loop
    Code:
    for (int i = 0; i < 100; i++) {
        //code
    }
    mov ecx, 0
    some_label:
    //code
    inc ecx
    cmp ecx, 100
    jng some_label
    Your loop has an off-by-one error with regard to the trip count - its branch condition reads 'jump if not greater', whereas you want a 'jump if less'.
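
To make the trip count concrete, here is a small C model of the two branch conditions (a sketch; the function names are invented for this example). Each `do`/`while` mirrors the assembly shape, where the body runs first and the compare happens after the increment.

```c
/* Models: body; inc ecx; cmp ecx, limit; jng label
   The loop repeats while ecx <= limit ("jump if not greater"). */
int trips_jng(int limit)
{
    int ecx = 0, trips = 0;
    do {
        trips++;              /* loop body */
        ecx++;                /* inc ecx */
    } while (ecx <= limit);   /* jng */
    return trips;
}

/* Models: body; inc ecx; cmp ecx, limit; jl label
   The loop repeats while ecx < limit ("jump if less"). */
int trips_jl(int limit)
{
    int ecx = 0, trips = 0;
    do {
        trips++;              /* loop body */
        ecx++;                /* inc ecx */
    } while (ecx < limit);    /* jl */
    return trips;
}
```

With a limit of 100, the `jng` form runs the body 101 times and the `jl` form runs it 100 times, which matches the C `for` loop.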

    that a compiler usually changes to:
    (and it's easier for a human)

    mov ecx, 100
    some_label:
    //code
    dec ecx
    jnz some_label
    Whether the latter is easier to read largely depends on the body of the loop - how the iterator is used, what computations it participates in, what pointer arithmetic is involved, etc.
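
A sketch of that point in C, with made-up function names: once the body actually uses the index, the count-down form has to recover the forward index from the remaining count, which is extra bookkeeping for whoever reads it.

```c
#include <stddef.h>

/* Count-up form: the iterator is the index. */
long weighted_sum_up(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += (long)i * v[i];
    return s;
}

/* Count-down form (the dec/jnz shape): the counter holds how many
   iterations are left, so the index must be derived from it. */
long weighted_sum_down(const int *v, size_t n)
{
    long s = 0;
    for (size_t left = n; left != 0; left--) {
        size_t i = n - left;  /* recover the forward index */
        s += (long)i * v[i];
    }
    return s;
}
```

Both return the same result; when the body ignores the iterator, the count-down version needs no such remapping and is the simpler of the two.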

    adding two numbers together is just "add register/memory, memory/register/immediate"
    with the limitation there being that you can not have both values in memory (but you can do like "add [memory_address], immediate_value")
    and details like that
    Again, you're making the mistake of equating the simplicity of the language with the simplicity of writing arbitrary code in it. Those are not equal. Now, being able to read assembly (and knowing a good deal of microarchitecture details, whatever the target machine might be) is essential for developers who care about performance. Writing in assembly, though, does not fall under the same clause. Understanding your C/C++ compiler well and working along with it via intrinsics, the occasional small inline assembly block (as a last resort), and your best friend - the performance-counter profiler - is how one writes performant, maintainable code today.
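
As an illustration of working with the compiler via intrinsics, here is a minimal sketch using the SSE intrinsics from `<xmmintrin.h>`. The function name `add_floats` is made up for this example, and a scalar fallback is included for targets without SSE.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per step; unaligned loads/stores keep the sketch simple. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The compiler still does register allocation and scheduling here, and the `__m128` type catches mixups that raw assembly would let through.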


  • mirza
    replied
    Previously, it was the best option among badly optimized compilers and interpreters. I started with Z80, then MC68000, Intel, Atmel and ARM. But that was a looooong time ago. Today it is useless, unless you are a kernel hacker or fall into one of the few other corner cases mentioned above. NEVER use it for general apps, games or utilities; it is just a stupid idea, like creating an OS kernel in Visual Basic 6. However, understanding assembly language (and CPU cache) is important IMHO, for properly figuring out how a computer actually operates.


  • OneTimeShot
    replied
    Assembly is a programming language like C/C++/PHP/Java/C# and so forth. You select the correct language for the job in hand.

    Places where I have seen Assembly used correctly include:
    - Discrete Cosine Transforms in video and audio decompression
    - SHA512/RSA cryptographic algorithms on a high performance accelerator
    - Key graphics processing loops in games

    Disadvantages of assembly:
    - It has a very, very high developer workload weighting (especially if you need to do better than a modern C compiler)


    So, if you want to code everything in Assembly - guess what: It's Turing complete, so you can! If you write it in C, you'd probably be writing 10 times as much functionality in a day. If you wrote it in Java/C# you'd be writing 50 times as much functionality in a day.

    So ultimately, the question is: Who's paying you for your time, and do they think that they are getting value for money?
