Is Assembly Still Relevant To Most Linux Software?


  • Originally posted by frign View Post
    I definitely agree on the last point, but nothing stops the compiler from optimizing it accordingly. I know uint_fast8_t just maps the type to an unsigned char, which makes the code more readable and leaves every distributor of the GNU operating system free to set those typedefs in one location to fix certain architectural quirks.

    Sadly, today's languages encourage piling up a lot of stuff in memory; C++ classes are one example of an insufficient concept. Even though the compiler does a great job of keeping the code good enough, doing the same in C might have its advantages (also in regard to inline assembly). Without doubt, this is a religious question.

    So, it was nice discussing with you! Please let me know which language you prefer.
    Oh, let's not start that discussion; it can get way too fierce.

    But just to answer your question: I'm a C guy all the way. I've written in C++ and Java, and did a few lines of .Net (not voluntarily, and it was PowerShell really, but is there a difference?).

    I find C(99) awesome. It is a nice compromise between assembly and being high-level and portable. You can do anything you want, wrong or right. It forces you to KNOW what you are doing, which I think is a plus. We are talking coding for fun and love here, not for $work$ where it has to be done yesterday.

    Java is 'okay' but I don't really like it. I find it way too slow.

    .Net I find a horrible abomination: a dog-slow, ugly contraption that hopefully will die soon.

    Anyway, no flamewar, just sharing my opinion and tastes on the subject.



    • Originally posted by Obscene_CNN View Post
      Here is a quick, small asm hack I made to the radeon r600_shader.c file in mesa. Note it is only two instructions. It sure beats doing a loop with a test in the middle. Let's see if you can get a C compiler to beat this.
      }
      Hopefully you'll submit a patch! Leave the C code as a comment above it, add a comment explaining why this is faster and why it's a good idea, put an #ifdef guard around it so it's only enabled when requested somehow, and have your name in fame.
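      Something along these lines, perhaps (a sketch only: the macro and function names are made up for illustration, this is not the actual r600_shader.c change, just the guard pattern being suggested):
      Code:
      /* Hypothetical example of the suggested guard; R600_USE_ASM_BITSCAN and
         find_first_set_bit() are illustrative names, not real mesa symbols. */
      #if defined(R600_USE_ASM_BITSCAN) && defined(__GNUC__) && \
          (defined(__i386__) || defined(__x86_64__))
      static int find_first_set_bit(unsigned int mask)
      {
          int pos;
          /* Two instructions instead of a test-in-a-loop: bsf finds the lowest
             set bit, cmove picks -1 when mask is zero (bsf sets ZF). */
          __asm__ ("bsf %1, %0\n\t"
                   "cmove %2, %0"
                   : "=&r" (pos)
                   : "r" (mask), "r" (-1)
                   : "cc");
          return pos;
      }
      #else
      /* Portable C fallback, kept so every other architecture still builds;
         the compiler may well turn this into something just as good. */
      static int find_first_set_bit(unsigned int mask)
      {
          int i;
          for (i = 0; i < 32; i++)
              if (mask & (1u << i))
                  return i;
          return -1;
      }
      #endif
      That way, a build where the hand-written version misbehaves can simply drop the define and fall back to the C path.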



      • Originally posted by oliver View Post
        Oh, let's not start that discussion; it can get way too fierce.

        But just to answer your question: I'm a C guy all the way. I've written in C++ and Java, and did a few lines of .Net (not voluntarily, and it was PowerShell really, but is there a difference?).

        I find C(99) awesome. It is a nice compromise between assembly and being high-level and portable. You can do anything you want, wrong or right. It forces you to KNOW what you are doing, which I think is a plus. We are talking coding for fun and love here, not for $work$ where it has to be done yesterday.

        Java is 'okay' but I don't really like it. I find it way too slow.

        .Net I find a horrible abomination: a dog-slow, ugly contraption that hopefully will die soon.

        Anyway, no flamewar, just sharing my opinion and tastes on the subject.
        Without starting a flamewar, I would like to question your fundamentals. There are C programs that are 30x slower than C#: http://stackoverflow.com/questions/6...nce-difference , of course because of not using C correctly. When C# is an order of magnitude slower than C, it is mostly for the very same reason.

        PowerShell is no more .Net than a Bash script is C just because the programs it runs were compiled from C. The one (visible) difference between PowerShell and Bash scripts is that PS passes serialized .Net objects around instead of plain text. If you want to compare an interactive C#, try gsharp, or use a C# compiler to compile your C# project before running it.

        Finally, you seem quite good at insulting everyone, and I think the Java fans out there would say there are many instances where Java is faster than C: http://scribblethink.org/Computer/javaCbenchmark.html

        They even look into why people think Java is slow, and I think this is a proper explanation:
        Java program startup is slow. As a java program starts, it unzips the java libraries and compiles parts of itself, so an interactive program can be sluggish for the first couple seconds of use.

        This approaches being a reasonable explanation for the speed myth. But while it might explain user's impressions, it does not explain why many programmers (who can easily understand the idea of an interpreted program being compiled) share the belief.
        I understand the idea that C is simple and fancy, but at least keep your facts grounded in reality.

        If you are bothered by Java's or .Net's slow startup, both have solutions: Java has GCJ and ExcelsiorJET, .Net has NGen, and Mono has an AOT compiler (run mono with the --aot parameter first). Mono's LLVM-based offerings (Mono for Android and MonoTouch) compile to code very similar to what a C compiler would output, so the argument that only C is tightly optimized looks a bit outdated to me, to say the least.



          • The main problem with Java and C# is memory consumption. As soon as you try to set a maximum memory limit, the JVM becomes slow as hell. To be nearly as fast as explicit memory management, the JVM needs 3x-4x more memory, which makes it de facto not viable on most embedded systems.



          • Originally posted by disgrace View Post
            The main problem with Java and C# is memory consumption. As soon as you try to set a maximum memory limit, the JVM becomes slow as hell. To be nearly as fast as explicit memory management, the JVM needs 3x-4x more memory, which makes it de facto not viable on most embedded systems.
            Thank you for your input. I agree that this is a reason NOT to use Java (on most embedded systems), but it is not a reason to say that Java is slow, only that it is a memory hog. There are C++ programs that require a lot of memory and "are slow as hell" too: GCC with -flto, or LibreOffice (or MS Office). As for your point, there is at least one compiled Java for embedded: http://www.excelsiorjet.com/embedded/ (it is commercial, though; I'm not related to this company or any of its products).

            In defence of C++ performance, this link tells a similar story: http://www.codeproject.com/Articles/...-Csharp-vs-NET (C# is fine for desktop, Mono on Windows is a bit slow, and C# is not so good on ARM as it is not well optimized there). It shows some performance pitfalls in the implementations of C# (or of C++'s hash_set). But if you go straight to the conclusions, you will find a quote suggesting that what makes C# "slow" is mostly the mentality:
            These differences (C# coding style - n. r) can lead to slower programs, but I believe a performance-conscious C# programmer can write programs whose performance is comparable to similarly well-written C++ code. .NET is clearly not as fast as C++ yet, but in principle I see no reason why it couldn't be, given enough TLC from Microsoft or Xamarin.
            And this is also my experience with C#: for whatever reason, the matrix multiplication (that gens asked me about) was as fast in Mono on Linux as the C++ version in Release mode on Windows.

            But what is definitely wrong is to state wrong facts, because such "facts" become memes even when they have nothing to do with reality ("Java is slow"). It is fair to criticize C++ for compile times; it is fair to say that, at least in the past, Mono did not have a generational GC (and Firefox still doesn't have one today), so you get either bad pauses or not enough throughput when freeing objects (this is the case for Firefox, but they are working on a generational GC: https://bugzilla.mozilla.org/show_bug.cgi?id=619558 ); and, as you stated, memory-constrained systems are not the first place to look for Java.

            Finally, wishing for anyone's language or platform to die (including Cobol or Delphi) is not grounded in reality: if the software exists and solves its problem well, then even if better solutions exist today there is no easy migration path (say, 10,000,000 lines of Cobol to be rewritten in Java), and such comments do nothing to improve anyone's feelings, even if we all agree that Cobol is not that good a language, or that Delphi is slower than C# and less portable than C# (via Mono).

            In the same vein, every time a platform dies or leaves Linux, I consider it a lost opportunity. In the past Ubuntu was criticized for basically packaging your NVidia drivers; if that were reverted, we would lose one option and it would be hard for most users to work around. Just like drivers, programming languages and platforms (like Java or Mono) are important for Linux because they solve some problems better than other languages would. What do we lose when we don't have these languages? Basically the developers who would have written other software on Linux.

            Even though I personally don't like Delphi (and I sometimes have to fix bugs in a Delphi codebase), and FreePascal is slower than Mono (for people who want to compare "native" vs "managed"), I really *love* the idea of Lazarus being packaged in the Ubuntu repositories: you get a good IDE plus visual designer that doesn't depend on any runtime and in many ways matches Qt. Sure, performance freaks would not use FreePascal, just as C# developers would not write a 3x3 matrix multiply routine in C# but use an external library, yet for many purposes Lazarus works great. There is even a cool paint program written in it: http://sourceforge.net/projects/lazpaint/ . I think Pinta is better (for now), but would you want LazPaint missing from your distro's repository just because someone dislikes Delphi or Lazarus? Even worse, if you were the maintainer of LazPaint, would you enjoy finding out that your project had been removed from Ubuntu because some people slander FPC/Lazarus with misinformation, half-facts and factoids?



            • Originally posted by ciplogic View Post
              Without starting a flamewar, I would like to question your fundamentals. There are C programs that are 30x slower than C#: http://stackoverflow.com/questions/6...nce-difference , of course because of not using C correctly. When C# is an order of magnitude slower than C, it is mostly for the very same reason.
              Of course, bad code is bad code. But an interpreted language will never be faster (when properly written) than a compiled language.
              PowerShell is no more .Net than a Bash script is C just because the programs it runs were compiled from C. The one (visible) difference between PowerShell and Bash scripts is that PS passes serialized .Net objects around instead of plain text. If you want to compare an interactive C#, try gsharp, or use a C# compiler to compile your C# project before running it.
              The syntax seemed quite similar, but that's not really important; I greatly disliked it.
              Finally, you seem quite good at insulting everyone, and I think the Java fans out there would say there are many instances where Java is faster than C: http://scribblethink.org/Computer/javaCbenchmark.html
              They even look into why people think Java is slow, and I think this is a proper explanation:
              I understand the idea that C is simple and fancy, but at least keep your facts grounded in reality.
              If you are bothered by Java's or .Net's slow startup, both have solutions: Java has GCJ and ExcelsiorJET, .Net has NGen, and Mono has an AOT compiler (run mono with the --aot parameter first). Mono's LLVM-based offerings (Mono for Android and MonoTouch) compile to code very similar to what a C compiler would output, so the argument that only C is tightly optimized looks a bit outdated to me, to say the least.
              Well, slow startup is one of the most important parts. I have an older Android phone with little memory. (Java) apps take forever to start, and sometimes run OK, sometimes sluggishly. When an app starts to use a lot of memory, other apps get forcefully stopped, so the whole 'but it stays in memory anyway' argument doesn't really work.

              But yeah, Java can be fast-ish enough by using much more memory. I think my _personal preference_ comes from all those fancy features that do things you don't want or expect and allow sloppy work (garbage collection/exception handling). They make things easier, sure, but also leave you less in control, or so it feels anyway.



              • Originally posted by oliver View Post
                Of course, bad code is bad code. But an interpreted language will never be faster (when properly written) than a compiled language.
                Here is why you should check your facts instead of just listening to what others say: Java has not been an interpreted language since 1996. We are in 2013, so you are 17 years out of date. There were (and maybe still are) Java interpreters, but Java as it is today is not interpreted; only some parts of it are, and your "hot" code is not. It is compiled. Like a hybrid car, the JVM interprets rarely used code, so it uses less memory there, and it compiles hot code so it runs fast.

                But if you were in fact talking about some other, interpreted language (let's say Python), you may be right; your phrasing just didn't make that distinction.

                Originally posted by oliver View Post
                The syntax seemed quite similar, but that's not really important; I greatly disliked it.
                Which syntax? If Bash syntax somewhat resembles C syntax, does that mean Bash is C? Or an insanity? It is slow, sure, but no one stops you from avoiding it where you need performance. PowerShell's main principle is that you don't get text back but .Net objects, and PowerShell-aware applications can take the full data, or just the fields they need, and return other objects. Instead of parsing the output yourself to extract the specific fields some program returns, as you do in Bash, it is better to have objects and query them by field. Still, this is not C#, and to make a program PowerShell-aware you need very little C# or C++.Net-style coding; the rest you can write in any language of your choice, including your beloved C.

                Well, slow startup is one of the most important parts. I have an older Android phone with little memory. (Java) apps take forever to start, and sometimes run OK, sometimes sluggishly. When an app starts to use a lot of memory, other apps get forcefully stopped, so the whole 'but it stays in memory anyway' argument doesn't really work.
                This is why (again) you should understand that languages and implementations are different things. Android uses the Java language but runs a rather different virtual machine. It has improved slightly, but is still not as fast as Mono for Android (which is in the C/C++ performance range). If you can spare one hour of your life, you can see that Android's VM is not tuned for ultimate performance: http://www.youtube.com/watch?v=Ls0tM-c4Vfo

                But yeah, Java can be fast-ish enough by using much more memory. I think my _personal preference_ comes from all those fancy features that do things you don't want or expect and allow sloppy work (garbage collection/exception handling). They make things easier, sure, but also leave you less in control, or so it feels anyway.
                As Java is not interpreted (again, please read sources other than the C blogger you probably follow), the slow applications you associate with Java are more likely down to memes or to not understanding what your application does. For at least three years you have been able to tune Java to do GC in limited time slices (read the G1GC documentation). Java is also used in aviation, where close to real-time responsiveness is required, and in transaction processing that allows only a few milliseconds per transaction. So your performance requirements are certainly achievable.

                If you don't want a GC because it consumes more memory, fine; spend your time instead hunting the leaks that others introduce by using your API wrongly.

                But GC and exceptions do enable coding styles that are very hard to achieve in C or C++: database transactions plus web services, dynamic web pages, big UI frameworks like Eclipse, huge throughput because it is easy to take advantage of all your cores, and dependency injection (how do you annotate a struct in C?).
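                For what it's worth, the closest plain C gets to an "annotated struct" is a hand-rolled metadata table like the sketch below (all names are made up for illustration; none of this is a real library). It is exactly the bookkeeping that annotations plus reflection do for you in Java or C#:
                Code:
                /* Hypothetical sketch: "annotating" a struct by hand in plain C. */
                #include <stddef.h>

                struct db_service {
                    const char *url;
                    int         pool_size;
                };

                /* The "annotations": a parallel, hand-maintained description of the
                   struct that an injector or serializer would walk at runtime. */
                struct field_meta {
                    const char *field_name;
                    size_t      offset;
                    const char *inject_key;
                };

                static const struct field_meta db_service_meta[] = {
                    { "url",       offsetof(struct db_service, url),       "config.db.url"  },
                    { "pool_size", offsetof(struct db_service, pool_size), "config.db.pool" },
                };
                Every new field means editing the table too, and nothing checks that the two stay in sync; that is the part a GC'd, reflective runtime automates.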



                  • Anyway,
                    gcc still doesn't do SSE any good.

                    gcc:
                    elapsed ticks: 1240594

                    custom loop:
                    elapsed ticks: 970071

                    I ran it a couple of times and took the best result
                    (still, the worst custom run is better than the best gcc run).

                    I unrolled the C loop to be even with the assembly;
                    if anyone knows a better loop, say so. The one used is in the tar file below.

                    The custom loop is in fasm and can still be improved
                    (a couple of shuffles can be removed; prefetch and non-temporal stores would help for bigger sets of data).

                    The test program is half-arsed, but it is good enough for its purpose.

                    [test program download link hosted on DepositFiles]



                    It seems gcc doesn't use the whole SSE registers, just the first 32 bits.
                    Maybe it's my fault with the compile options;
                    the options used were -O3 -msse,
                    so if anyone knows better, say so.


                    PS: AVX, FMA, XOP and such can make the assembly code shorter and a bit faster.
                  Last edited by gens; 02 May 2013, 09:25 PM.



                  • Originally posted by gens View Post
                     Anyway,
                     gcc still doesn't do SSE any good.

                     gcc:
                     elapsed ticks: 1240594

                     custom loop:
                     elapsed ticks: 970071

                     I ran it a couple of times and took the best result
                     (still, the worst custom run is better than the best gcc run).

                     I unrolled the C loop to be even with the assembly;
                     if anyone knows a better loop, say so. The one used is in the tar file below.

                     The custom loop is in fasm and can still be improved
                     (a couple of shuffles can be removed; prefetch and non-temporal stores would help for bigger sets of data).

                     The test program is half-arsed, but it is good enough for its purpose.

                     [test program download link hosted on DepositFiles]



                     It seems gcc doesn't use the whole SSE registers, just the first 32 bits.
                     Maybe it's my fault with the compile options;
                     the options used were -O3 -msse,
                     so if anyone knows better, say so.


                     PS: AVX, FMA, XOP and such can make the assembly code shorter and a bit faster.
                     First of all, your compiler options seem to be wrong. If your machine is 64-bit it does not matter, but if it is 32-bit, it does.

                     SSE2 is in every CPU from the Pentium 4 up to today's, so maybe the option should be -msse2. I did not read all the assembly, but I think you're using SSE2 there, so it is SSE1 (in GCC) vs SSE2 (in asm).

                     So: the program has many issues, as you can imagine, but I will stick to a minimum of facts:
                     - 1240594 / 970071 = 1.28, so we are talking about a 28% speedup. If you go multi-core, on a dual-core machine you can get close to a 100% speedup on this problem; say it is more like 90%, so a dual-core implementation would be around 1.9/1.28 = 1.49, i.e. roughly 50% faster than the assembly code on a dual-core machine, and the advantage grows as machines get more cores (see the sketch at the end of this post).
                     - you still haven't said in which cases you need this computation. Why can't it be pushed to the video card with glLoadMatrix if you eventually need to display the results, or something of that sort, at roughly 0% CPU (there will be some CPU cost, but it will be optimized by the video drivers or maybe by the card itself)? If it is a replay or something like that, why not compute the timestamped matrices as you create them? Done right, this makes the computation tens or hundreds of times faster, and again no assembly is involved.
                     - if you are talking about a big program (not just a matrix multiply whose data fits entirely in the L1 cache), there are many more components to take into account. In a game, leaving aside the video drivers, the physics component is almost never written by you (game engines license Havok or something of that sort), so your +30% speedup means next to nothing in the grand scheme of things. The same goes for other components that take a lot of time even if written in assembly, such as loading content (long "Loading..." screens), which is IO-bound.
                     - finally, let's look at some of the assembly code:
                    Code:
                    movaps xmm1, xmm4		;y23,z23,x31,y31
                    shufps xmm1, xmm5, 01011010b	;x31,x31,x32,x32
                    movaps xmm12, xmm3		;x22,y22,z22,x23
                    shufps xmm12, xmm1, 11001100b	;x22,x23,x31,x32
                     Take the real-life case where someone copy/pastes this code and, instead of writing "movaps xmm12, xmm3", simply writes "movaps xmm1, xmm3" (it is the previous line, with the "1" not replaced by "12"). How do you debug that!? You may say that developers don't copy/paste, but I have data to prove you wrong: http://www.viva64.com/en/b/0193/ ; static code analysis shows that many open-source projects have code quality problems at their heart, and this happens in commercial products too.

                     With C you can use a slow, debuggable build that runs 10x slower than the assembly while you track down the problem, then turn on the -O3 flag and you're back in "assembly speed land" with no big burden on you, isn't that so?
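                     And about the multi-core point above, here is a minimal sketch of what I mean, assuming the loop is the batched 3-float dot product we have been discussing (I have not looked inside your tar file, so the name and signature are illustrative). Build with gcc -O3 -fopenmp:
                     Code:
                     /* Illustrative only: each iteration writes a distinct result[i],
                        so the iterations are independent and OpenMP can hand chunks
                        of the loop to different cores. */
                     void compute_parallel(const float *matrix, const float *vertex,
                                           float *result, int count)
                     {
                         int i; /* the index of a parallel for is implicitly private */
                         #pragma omp parallel for
                         for (i = 0; i < count; i++) {
                             result[i] = matrix[3*i]     * vertex[i]
                                       + matrix[3*i + 1] * vertex[i + 1]
                                       + matrix[3*i + 2] * vertex[i + 2];
                         }
                     }
                     Without -fopenmp the pragma is simply ignored and you keep the single-threaded behaviour.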



                    • Originally posted by gens View Post
                       Anyway,
                       gcc still doesn't do SSE any good.

                       gcc:
                       elapsed ticks: 1240594

                       custom loop:
                       elapsed ticks: 970071

                       I ran it a couple of times and took the best result
                       (still, the worst custom run is better than the best gcc run).

                       I unrolled the C loop to be even with the assembly;
                       if anyone knows a better loop, say so. The one used is in the tar file below.
                       So, I reran the tests as you suggested:
                      Code:
                      $ g++ -O3 matrix_test.c matrixm.c 
                      $ ./a.out 
                       elapsed ticks: 745912
                       And here is the kicker:
                       Code:
                       $ g++ -O3 -flto matrix_test.c matrixm.c
                       $ ./a.out
                       elapsed ticks: 647984
                       I made some changes so your code compiles on my machine; basically:
                       I changed the extern declaration in your main program's file to:
                      Code:
                      extern void compute(float *matrix, float *vertex, float *result, int count);
                       I made main return int, with a return value of zero.

                       Edit: a few of the runs were about 2x faster, so maybe TurboBoost or something similar kicked in on my machine; I updated the numbers. But with -flto (most likely because the function gets inlined), you see around a 15% speedup with no work on your side. That would put the -flto build at only about 15% slower than the assembly.

                       And here is a better way to write the loop (I double-checked this just now):
                      Code:
                      void compute( float *matrix, float *vertex, float *result, int count ) {
                      	int i=0;
                      	for( i=0;i<count;i++) {
                      		result[0] = matrix[0]*vertex[0] + matrix[1]*vertex[1] +matrix[2]*vertex[2];
                      				
                      		matrix += 3;
                      		result += 1;
                      		vertex += 1;
                      	}
                      }
                       ... and I moved it into the main program's file (instead of the extern declaration).

                      Flags
                      Code:
                      g++ -O3 -fwhole-program matrix_test.c
                       elapsed ticks: 177472
                       Can you confirm the numbers on your machine?
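                       One more thing that may be worth trying (a sketch only; I have not measured it on this test): telling the compiler that the three pointers never alias, which is often what stops GCC from vectorizing a loop like this. In C99 the keyword is restrict; with g++ the spelling is __restrict__:
                       Code:
                       /* Same loop as above; __restrict__ only changes what the optimizer
                          is allowed to assume (no overlap between matrix, vertex and result). */
                       void compute( float *__restrict__ matrix, float *__restrict__ vertex,
                                     float *__restrict__ result, int count ) {
                           int i = 0;
                           for( i = 0; i < count; i++ ) {
                               result[0] = matrix[0]*vertex[0] + matrix[1]*vertex[1] + matrix[2]*vertex[2];

                               matrix += 3;
                               result += 1;
                               vertex += 1;
                           }
                       }
                       If the numbers don't move, aliasing wasn't what held the vectorizer back, and nothing is lost.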
                       Last edited by ciplogic; 03 May 2013, 04:09 AM. Reason: Numbers differed widely on my machine

