Is Assembly Still Relevant To Most Linux Software?


  • Definitely a hard topic

    Originally posted by ciplogic View Post
    Thanks for fixing my typo; I will double-check in future. I use Wikipedia because it gets things right most of the time.

    C++ can be faster than C: you can use the C subset and get the same baseline performance, and you also get other constructs (I'm talking about templates here):
    A Google engineer's perspective: http://lingpipe-blog.com/2011/07/01/why-is-c-so-fast/
    This guy also seemed to agree that C++ is faster because it can do better inlining: http://radiospiel.org/sorting-in-c-3...faster-than-c/

    I'm still curious about the differences you see between C++ classes and C structs (a statement you made).
    OK, starting with structs and classes: the main difference is that classes in C++ are far more complex than structs in C. That is not a bad thing in itself. You can define constructors, destructors and the like, use templates to write type-generic code, or even overload operators.
    The main issue here is definitely maintainability. You can find many opinions, tests and rants on the Internet claiming one of the languages to be better, faster, stronger and so on.
    From a technical perspective C++ is far more flexible, but whether it is faster remains open to discussion. Let's assume both languages are equally fast:

    Given that flexibility, every C++ programmer has his own coding style. There is no single definitive way of expressing something in C++; you have a lot of freedom of choice.
    I guess the reason C is favored in free software projects is that it leaves no such big gaps when it comes to designing the software.
    10-20 years from now, another programmer will still be able to understand the code. Quite frankly, judging from my own experience, porting C++ code is much more of a pain (especially in game engines) because of very frequent bad and complex design decisions.

    Nevertheless, you can write great C++, and you can design a C program horribly! It's all about the person writing the code and how the code is structured.
    Today's architectures in particular are hard to really "measure". Everything is parallelised and highly dependent on current memory I/O and other small factors. Neither the binary size nor the number of cycles gives a definite answer about the speed and efficiency of a program. This may be a mystery that cannot be solved with today's tools!

    Comment


    • Originally posted by frign View Post
      Given the big flexibility, every C++-programmer has his own coding style. There is no definite way of expressing something in C; you have a big freedom of choice.
      The reason I guess why C is favored in free software projects is the fact, that there are no big gaps when it comes to designing the software.
      I agree with that point on coding style. You can do OOP, template-based, procedural, functional, even event-based programming in C++. It's cool, but if you do all of that in the same project, it will bite you. C++ may need more discipline, the kind you find more easily in a corporate environment.
      What ciplogic said about the binary interface is also true (C defines it, but in C++ it is implementation-dependent).

      But as for being faster or slower, I think it boils down to the following:
      - If you misuse C++ features, the C++ version will be slower
      - If you try to hand-write C++ features in C, the C++ version will be faster
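The second point is usually illustrated with sorting, so here is a minimal sketch under that assumption. C's `qsort` reaches the comparison through a function pointer, while `std::sort` is a template that gets instantiated for the exact comparator and can inline it. The helper names are made up for illustration, and whether one version actually wins depends on compiler and flags, so measure before believing either side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort calls it through a function pointer,
// which the compiler usually cannot inline into the sort loop.
static int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

// Hypothetical helper: sort the C way.
std::vector<int> sort_c_style(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

// Hypothetical helper: sort the C++ way. std::sort is instantiated for
// this exact lambda type, so the comparison can be inlined.
std::vector<int> sort_cpp_style(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}

// Both approaches must of course produce the same result.
void sort_example() {
    const std::vector<int> input = {5, 1, 4, 2, 3};
    const std::vector<int> expected = {1, 2, 3, 4, 5};
    assert(sort_c_style(input) == expected);
    assert(sort_cpp_style(input) == expected);
}
```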



      Also, source on the comparison between gcc-c and gcc-cxx on the C and C++ subset.

      Comment


      • Originally posted by frign View Post
        Ok, starting with structs and classes: The main difference is that classes in C++ are way more complex than structs in C. This is not a bad thing right away. You can define constructors, destructors and the like, be able to to use templates for flexible typecasting or even overload operators.
        So, classes (or structs) in C++ are a more complex language construct. But if you rarely use virtual inside structs (just as you rarely write function pointers by hand in C), the main difference for you seems to be the methods, which in C are not defined in the body of the struct, right?
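A small sketch of that question, with made-up type names: a C struct plus a free function next to a C++ struct with a non-virtual method. The static assertions check what mainstream compilers do in practice, namely that a plain method costs nothing per object, while `virtual` changes the layout much like a hand-written function pointer member would.

```cpp
#include <cassert>
#include <type_traits>

// C style: plain data plus a free function taking a pointer.
struct PointC {
    int x;
    int y;
};

int point_c_sum(const PointC* p) { return p->x + p->y; }

// C++ style: same data, the function becomes a non-virtual method.
// Non-virtual methods add no per-object data.
struct PointCpp {
    int x;
    int y;
    int sum() const { return x + y; }
};

// With virtual functions the object is no longer laid out like a C struct
// (implementations typically add a hidden vtable pointer).
struct PointVirtual {
    int x;
    int y;
    virtual int sum() const { return x + y; }
    virtual ~PointVirtual() = default;
};

static_assert(sizeof(PointCpp) == sizeof(PointC),
              "a non-virtual method does not change the object size");
static_assert(std::is_standard_layout<PointCpp>::value,
              "still laid out like a C struct");
static_assert(!std::is_standard_layout<PointVirtual>::value,
              "virtual functions make the type non-standard-layout");

void struct_example() {
    PointC a{1, 2};
    PointCpp b{1, 2};
    assert(point_c_sum(&a) == b.sum());
}
```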

        Originally posted by frign View Post
        The main issue here is definitely maintainability: You can find many opinions/tests/rants on the Internet claiming one of the languages to be better/faster/stronger/... .
        While on the technical perspective C++ is way more flexible, it remains to be discussed if it is faster. Let's imagine both languages to be equally as fast:

        Given the big flexibility, every C++-programmer has his own coding style. There is no definite way of expressing something in C; you have a big freedom of choice.
        The reason I guess why C is favored in free software projects is the fact, that there are no big gaps when it comes to designing the software.
        10-20 years from now, another programmer will still be able to understand the code, and quite frankly, judging from my own experience, porting C++-code is much more of a pain (especially in Game engines), because of very frequent bad and complex design decisions.
        So in the end, performance is not the reason you think C is chosen over C++ (as your original statement claimed), right? C++ certainly has some extra features available, at least for giving hints to the compiler, such as const references and templates that help the compiler inline your code, while still offering everything C offers, including assembly.

        It is rather that C++ is too flexible, and since there are many "ways to shoot yourself in the foot", it ends up having a "higher level of entry".

        I have been a professional C++ developer and I know template debugging is really painful (it got better with the latest Clang and GCC releases). If you use reference counting, you may have hidden cycles that lead to leaks; if you don't use it at all, you can still leak from another module, and so on. I agree with all these statements because I have lived them.
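For anyone who has not hit the reference-counting cycle yet, here is a minimal sketch using `std::shared_ptr`. The `Node` type and the counter are invented for the demo. Two nodes that own each other are never freed, while making one direction a `std::weak_ptr` fixes it.

```cpp
#include <cassert>
#include <memory>

// Counts live Node objects so the leak is observable.
static int live_nodes = 0;

struct Node {
    std::shared_ptr<Node> strong_peer;  // owning link: can form a cycle
    std::weak_ptr<Node> weak_peer;      // non-owning link: breaks the cycle
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Two nodes owning each other: the reference counts never reach zero.
int nodes_left_after_strong_cycle() {
    const int before = live_nodes;
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;
        observer = a;
    }  // a and b are out of scope, but each node still holds the other
    const int leaked = live_nodes - before;
    // Break the cycle by hand so the demo itself does not leak.
    if (auto a = observer.lock()) a->strong_peer.reset();
    return leaked;
}

// Same shape, but one direction is weak, so both nodes are destroyed.
int nodes_left_after_weak_link() {
    const int before = live_nodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;
    }
    return live_nodes - before;
}

void cycle_example() {
    assert(nodes_left_after_strong_cycle() == 2);
    assert(nodes_left_after_weak_link() == 0);
}
```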

        15 years ago compilers were not so mature on the C++ side. They were good, but not good enough, to put it nicely. From around 2006 on, I can say C++ was good as far as performance was concerned. With C++11, I think move semantics, applied "automatically", also removed some classes of code inefficiency.
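Assuming the C++11 feature meant here is move semantics, this is roughly what it buys you. The helper name is made up. The vector's heap storage changes owner instead of being copied element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: builds a buffer and returns it by value.
// Since C++11 the result is moved (or the copy is elided), not deep-copied.
std::vector<int> make_buffer(std::size_t n) {
    std::vector<int> v(n, 42);
    return v;
}

void move_example() {
    std::vector<int> source = make_buffer(1000);
    const int* storage = source.data();

    // Moving hands over the heap allocation instead of copying 1000 ints.
    std::vector<int> target = std::move(source);
    assert(target.data() == storage);
    assert(target.size() == 1000);
}
```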

        Today this argument doesn't hold water, as GCC has many complex optimizations under its belt. If we are talking about open-source projects, and most likely about the big ones, many of them were started before 2000, so naturally most of them use some form of C. There are big projects in C++ that run fast (LibreOffice, Chrome, LLVM, GCC now, Qt).

        erendorn also confirmed this, in a way: he pointed out that compiling C code with a C++ compiler will not give any performance regression, so there is no point in using a C compiler unless you want to save disk space.

        There is one thing on which I would agree with you: many C++ developers don't have the C mindset of asking whether this * operator performs a full (expensive) multiplication, since the cost may be hidden behind the operator overloading that C++ allows. When you study C, you mostly learn things like union types and how values are passed by pointer or by value, which is "low-level" thinking. In a way this can lead to faster code, at least from developers who write code in a rush. C++ takes 1-2 years to get comfortable with, not just 1-2 months as with C, so maybe this is where your view that C++ is slower comes from. This mentality is probably also why some other people here think that language J is slow. A lot of software is not written with performance in mind (something most C programmers take for granted), so maybe that is why some C++ applications would have been better written in C. But in a way, this is why they should be written in C++, with people taught only a "fast subset" of it: basically the STL, classes where you define a proper copy constructor, const references where appropriate, and const member functions where possible.
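To make the hidden-cost point concrete, here is a sketch with an invented `Mat3` type. At the call site `a * b` reads like a single multiplication, but the overloaded operator runs 27 multiply-adds. Taking the operands by const reference is the "fast subset" habit: neither matrix is copied on the way in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 3x3 integer matrix, only for illustration.
struct Mat3 {
    std::array<std::array<int, 3>, 3> m{};  // zero-initialized
};

// Looks like one multiplication at the call site, but runs 27
// multiply-adds. The const references avoid copying both operands.
Mat3 operator*(const Mat3& a, const Mat3& b) {
    Mat3 r;
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Mat3 identity() {
    Mat3 r;
    for (std::size_t i = 0; i < 3; ++i) r.m[i][i] = 1;
    return r;
}

void operator_example() {
    Mat3 a;
    a.m[0][1] = 7;
    a.m[2][0] = 3;
    const Mat3 p = a * identity();  // reads like "x * y" on ints
    assert(p.m == a.m);
}
```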

        Comment


        • Great!

          Originally posted by ciplogic View Post
          So, classes (or structs) in C++ are a more complex language constructs. But if you don't use virtual (as you don't write typical pointer to functions by hand either in C) inside structs so often, the main difference seems to be for you the methods, which as in C are not defined in the body of the struct/class, right?


          So eventually, the performance (that C++ certainly have accessible some extra features, at least giving hints to compiler, as const references and templates to help compiler to inline your code, in the mean time offering everything that C offers including assembly) is not the reason you think that C over C++ is chosen, right (as your original statement was)?

          It is that is too flexible and as there are many "ways to shot yourself in the foot", makes eventually that C++ to have a "higher level of entry".

          I have been a professional C++ developer and I know the template debugging is really painful (it got better with latest CLang and GCC releases). If you use reference counting, you may have hidden cycles that lead to leaks, if you don't use them at all, you still have chances to leak from other module, etc. All these statements I agree with you as I lived them.

          15 years ago the compilers were not so mature on C++ side. They were good, but not good enough, to say nicely. From a 2006 year, I can say that C++ was good as performance was concerned. With C++ 11, I think that by copy semantics done "automatically", also removed some classes of code inefficiencies.

          Today this argument doesn't hold water as GCC has many complex optimizations under its belt. If you discuss about opensource projects and most likely we discuss about big ones, many of them were started before year 2000 and naturally most of them would use a form of C. There are big projects in C++ that run fast (LibreOffice, Chrome, LLVM, GCC now, Qt).

          Also erendorn confirmed this, in a way he did point that using C++ compiler to compile C code will not give any performance regression, so there is no point to use your C compiler excluding you want to save disk space.

          I can say one thing that I would agree with you: many C++ developers don't have a C mindset thinking that this * operator can make a full multiplication (as is may be hidden by overloading or overriding of the operators that C++ is capable of). In C when you study the language you mostly learn: union types, how types are pushed by pointers or by value, which is a "low level" thinking. This can lead in a way to faster codes, at least for developers that do write code in a rush. In C++ will take 1-2 years to get comfortable with, not just 1-2 months like in C side, so maybe here where the problem lies in your view that C++ is slower. This mentality is also probably why some other people here think that language J is slow. There is a lot of software that is not thought in performance terms (that most C programmers will take it for granted) so maybe this is why that some C++ applications would be better written in C. But in a way, this is why they should be written in C++, but people to be taught only a "fast subset" of it, basically STL, classes where you define a proper copy constructor, use const references where you should, and make your functions const where possible.
          I think these are very smart thoughts! I agree with you.

          In the ideal case, you have a C++ programmer who thinks like a C programmer, but those worlds seem to be deeply divided by religious hatred and obsession with one's own language.
          But for me, C is enough. I know C++, but I don't like its design. And when it isn't fun, why even program?

          Comment


          • Originally posted by frign View Post
            I think these are very smart thoughts! I agree with you.
            Thanks!
            Originally posted by frign View Post
            In the ideal case, you have a C++ programmer who thinks like a C programmer, but those worlds seem to be quite divided upon religious hatred and obsession for each one's language.
            But for me, C is enough. I know C++, but don't like it's design. And when it doesn't make fun, why even program?
            As for me, I think that "C thinking" can be harmful for many classes of problems, but again this is my experience. "C++ thinking" can also lead to many classes of problems, as can any "language thinking". A language is a tool for expressing algorithms that solve your problems. Some languages do it better and some worse, but none, as far as I'm concerned, solves all issues at once.

            Just to point out some kinds of problems that I know C doesn't address:
            - Functional programming (as in Lisp, SQL, Haskell) offers a great class of solutions: they can be migrated easily to multi-core and can also avoid many classes of optimization pitfalls.
            - I also like Ruby (even though Ruby seems to be fairly slow) for its "convention over configuration" thinking, both inside the language and inside its most popular MVC web framework (Rails).
            - Using a GC: for many classes of problems this reduces the amount of code and the leaks (I know some will say that a GC can still leak, which is true, but it is less likely). Boehm's GC is out there, but it can leak (it is a conservative GC, so any int is considered a possible pointer) and it is slow too.
            - C (and also C++) offers a fairly poor multi-core experience because it gives no immutability guarantees. Yes, you can write immutable code, or you can put locks everywhere, but the language does not let you declare structures immutable (as the D language does).
            - Static analysis of code is a fairly weak story in C: there is the lint tool, and recently there is Clang's analysis component, but C developers mostly don't use these tools, and their code sometimes has weird behaviours that these analyzers could prove faulty. C# has a great story here, and C++ has one too with Visual Studio for the Xbox edition (John Carmack stated this, though I have never used these tools).
            - The language sometimes cannot offer even basic bounds checks (which have a performance cost, but in C++ with the STL they are opt-in for components where you need safety over performance). Performance is important, but so is security against buffer-overflow attacks.
            - Making a DSL like you can with Ruby (aka monkey patching = adding keywords). You may say that macros can do this, but I kindly disagree: macros can be harmful because they can expand the code in ways you don't want, and they are really not what I was talking about anyway. Look for tutorials about DSLs like this one: Part 1: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby Part 2: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby1
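On the bounds-check item in the list above, this is the opt-in I mean, as a small sketch with made-up helper names. `operator[]` on `std::vector` does not check the index, while `at()` checks it and throws `std::out_of_range`, so you pay for the check only where you ask for it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if the checked access rejected the index.
// v[i] would not be checked; v.at(i) throws std::out_of_range.
bool checked_access_rejects(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

void bounds_example() {
    const std::vector<int> v = {10, 20, 30};
    assert(!checked_access_rejects(v, 2));  // last valid index
    assert(checked_access_rejects(v, 3));   // one past the end
}
```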

            In a way C gives a better baseline performance, but it is hard to make it work in a big codebase in some specific problem domains. Still, C is better than assembly (to stay on topic) in most cases and is certainly fast.
            Last edited by ciplogic; 04-11-2013, 02:24 AM.

            Comment


            • Originally posted by frign View Post
              I definitely agree on the last point, but no one stops the compiler from optimizing it accordingly. I knew uint_fast8_t just maps the type to an unsigned char, which makes the code more readable and leaves every distributor of the GNU operating system free to set those typedefs in one location to fix certain architectural quirks.

              Sadly, today's languages encourage piling up a lot of stuff in memory. C++ classes are one example of an insufficient concept. Even though the compiler does a great job of keeping the code good enough, doing the same in C might have its advantages (also with regard to inline assembly). Without doubt, this is a religious question.

              So, it was nice discussing with you! Please let me know which language you prefer.
              Oh, let's not start that discussion; it can get way too fierce

              But just to answer your question: I'm a C guy all the way. I've written in C++, in Java and did a few lines of .Net (not voluntarily, and it was PowerShell really, but is there a difference?).

              I find C(99) awesome. It is a nice compromise between assembly and being high-level and portable. You can do anything you want, wrong or right. It forces you to KNOW what you are doing, which I think is a plus. We are talking about coding for fun and love here, not for $work$ where it has to be done yesterday.

              Java is 'okay' but I don't really like it. I find it way too slow.

              .Net I find a horrible abomination. It's a dog-slow, ugly contraption that hopefully will die soon.

              Anyway, no flamewar, just sharing my opinion and tastes on the subject

              Comment


              • Originally posted by Obscene_CNN View Post
                Here is a quick small asm hack I made to the radeon r600_shader.c file in mesa. Note it is only two instructions. It sure beats doing a loop with a test in the middle. lets see if you can get a c compiler to beat this .
                }
                Hopefully you'll submit a patch. Leave the C code as a comment above it, add a comment explaining why this is faster and why it's good, put an #ifdef guard around it so it's only enabled when requested, and have your name in the hall of fame
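The asm from the quoted post is not shown here, so the following is only a sketch of the guard pattern on an invented example: finding the lowest set bit. `USE_X86_ASM_BITSCAN` is a made-up macro name and the function names are hypothetical. The portable loop stays in the source as the reference version, and the asm is compiled only when explicitly requested on x86 with a GCC-compatible compiler.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference version: a loop with a test in the middle.
// Precondition: x != 0.
unsigned lowest_set_bit_c(std::uint32_t x) {
    unsigned index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
}

// Opt-in fast path. USE_X86_ASM_BITSCAN is a made-up macro name; the asm
// is only compiled when it is defined and the target is x86 with a
// GCC-compatible compiler. Otherwise the C version above is used.
#if defined(USE_X86_ASM_BITSCAN) && defined(__GNUC__) && \
    (defined(__i386__) || defined(__x86_64__))
unsigned lowest_set_bit(std::uint32_t x) {
    std::uint32_t index;
    // BSF stores the index of the lowest set bit; undefined for x == 0.
    __asm__("bsfl %1, %0" : "=r"(index) : "rm"(x) : "cc");
    return index;
}
#else
unsigned lowest_set_bit(std::uint32_t x) { return lowest_set_bit_c(x); }
#endif

void bitscan_example() {
    assert(lowest_set_bit(8u) == 3);
    assert(lowest_set_bit(8u) == lowest_set_bit_c(8u));
}
```

Before keeping a hack like this, it is worth comparing against the compiler builtin (`__builtin_ctz` in GCC and Clang), which usually compiles down to the same kind of instruction without any asm in the source.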

                Comment


                • Originally posted by oliver View Post
                  Oh lets not start that discussion; it can get way to fierce

                  But just to answer your question; I'm a C guy all the way. I've written in C++, in Java and did a few lines of .Net (not voluntarily, and it was power-shell really, but is there a difference?).

                  I find C(99) is awesome. It is a nice compromise between assembly and being high and portable. You can do anything you want, wrong or write. It forces you to KNOW what you are doing, which I think is a plus. We are talking coding for fun and love here. Not for $work$ where it has to be done yesterday.

                  Java is 'okay' but I don't like it really. I find it way to slow.

                  .Net I find a horrible abomination. It's dog slow, ugly contraption that hopefully will die soon.

                  Anyway, no flamewar, just sharing my opinion and tastes on the subject
                  Without turning this into a flamewar, I would like you to back up your claims. I can point to C programs that are 30x slower than their C# equivalent: http://stackoverflow.com/questions/6...nce-difference , because C was not used correctly, of course. When C# is an order of magnitude slower than C, it is mostly for the very same reason.

                  PowerShell is no more .Net than running scripts in Bash is the same as running C just because the programs you run are compiled from C. The single (visible) difference between PowerShell and Bash scripts is that PS outputs serialized .Net objects instead of plain text. If you want to try interactive C#, try gsharp, or use a C# compiler to compile your C# project before running it.

                  Lastly, you seem to know how to insult everyone, and I think Java fans out there would say there are many instances where Java is faster than C: http://scribblethink.org/Computer/javaCbenchmark.html

                  They even try to explain why people think that Java is slow, and I think this is a proper explanation:
                  Java program startup is slow. As a java program starts, it unzips the java libraries and compiles parts of itself, so an interactive program can be sluggish for the first couple seconds of use.

                  This approaches being a reasonable explanation for the speed myth. But while it might explain user's impressions, it does not explain why many programmers (who can easily understand the idea of an interpreted program being compiled) share the belief.
                  I understand the idea that C is simple and fancy, but at least keep your facts grounded in reality.

                  If you are bothered by the slow startup of Java or .Net, both have solutions: Java has GCJ and ExcelsiorJET, .Net has NGen, and Mono has the AOT compiler (run mono with the --aot parameter first). In its LLVM-based solutions (Mono4Android and MonoTouch), Mono is compiled to code very similar to what a C compiler would output, so the argument that C is tightly optimized again looks at least a bit outdated to me

                  Comment


                  • The main problem with Java and C# is memory consumption. As soon as you try to set a maximum memory usage, the JVM becomes slow as hell. To be nearly as fast as explicit memory management, the JVM needs 3x-4x more memory, which makes it de facto not viable on most embedded systems.

                    Comment


                    • Originally posted by disgrace View Post
                      the main problem with java and c# is the memory consumption. as soon as you try to set a maximum memory usage, the jvm become slow as hell. to be nearly fast as a explicit memory management, the jvm need 3x-4x more memory being de facto not viable on most of embedded systems.
                      Thank you for your input. I agree that this is a reason NOT to use Java (on most embedded systems), but it is not a reason to say that Java is slow, only that it is a memory hog. There are C++ programs that require a lot of memory and "are slow as hell" too: GCC with -flto, or LibreOffice (or MS Office). As for your point, there is at least one (compiled) Java version for embedded: http://www.excelsiorjet.com/embedded/ (it is commercial, though; I'm not related to this company or any of its products).

                      In defence of C++ performance, I can say this link tells a similar story: http://www.codeproject.com/Articles/...-Csharp-vs-NET (C# is fine on the desktop, Mono on Windows is a bit slow, and C# is not so good on ARM because it is not well optimized there). It shows some performance pitfalls in the implementations of C# (or of C++'s hash_set). But if you go straight to the conclusions, you will find a quote suggesting that what makes C# slow is the "mentality":
                      These differences (C# coding style - n. r) can lead to slower programs, but I believe a performance-conscious C# programmer can write programs whose performance is comparable to similarly well-written C++ code. .NET is clearly not as fast as C++ yet, but in principle I see no reason why it couldn't be, given enough TLC from Microsoft or Xamarin.
                      And this is also my experience with C#: for whatever reason, the matrix multiplication (that gens asked me for) was as fast in Mono/Linux as the C++ version was in Release mode on Windows.

                      But what is definitely wrong is to state wrong facts. And facts become memes, even when they are unrelated to reality ("Java is slow"). I mean, it is fair to criticize C++ for compile times. It is fair to say that, at least in the past, when Mono did not have a generational GC (Firefox still doesn't have one today), you could get either bad pauses or not enough throughput in freeing objects (this is the case for Firefox, but they are working on a generational GC: https://bugzilla.mozilla.org/show_bug.cgi?id=619558 ). Or, as you stated, for memory-constrained systems Java is not the first place to look.

                      Lastly, wishing for anyone's language/platform to die (including Cobol, Delphi) is not grounded in reality. If there is software that solves a problem well, then even if better solutions exist today, there is no easy migration path (say, 10.000.000 lines of Cobol to be rewritten in Java). Such comments do nothing to improve anyone's feelings, even if everyone agrees that Cobol is not such a good language, or that Delphi is slower than C# and less portable than C# (via Mono). On the same note, every time a platform dies or leaves Linux, I consider it a lost opportunity. In the past Ubuntu was criticized for basically packaging your NVidia drivers. If that were reverted, we would lose an option, and it would be hard for most users to work around. As with drivers, programming languages/platforms (like Java, Mono) are important for Linux because they solve some problems better than other languages would. What do we lose when we don't have these languages? Basically some developers who would otherwise develop software on Linux. Even though I personally don't like Delphi (and I sometimes have to fix bugs in a Delphi codebase) and FreePascal is slower than Mono (for people who want to compare "native" vs "managed"), I really *love* the idea of Lazarus being packaged in the Ubuntu repositories: you get a good IDE plus visual designer that doesn't depend on any runtime and in many ways matches Qt. Sure, performance freaks would not use FreePascal, just as C# developers might not use a C# routine for a 3x3 matrix multiply but an external library, but for many particular purposes Lazarus works great. There is even a cool paint program written in it: http://sourceforge.net/projects/lazpaint/ . I think that Pinta (for now) is better, but would you want LazPaint missing from your distro repository because someone dislikes Delphi or Lazarus?
                      Even worse, if you were the maintainer of LazPaint, would you enjoy learning that your project was removed from Ubuntu because some crazy guys slandered FPC/Lazarus with misinformation, half-facts and factoids?

                      Comment


                      • Originally posted by ciplogic View Post
                        Without being a flamewar, I would like to justify your fundamentals. I can say that are programs that are C and are 30x slower than C#: http://stackoverflow.com/questions/6...nce-difference , of course of not using correctly C. If C# is an order of magnitude slower than C is mostly the very same cause.
                        Of course, bad code is bad code. But an interpreted language will never be faster (when properly written) than a compiled language.
                        Powershell is not .Net any more than saying running scripts in bash is the same as running them in C, as you're running programs that are compiled in C. The single (visible) difference of PowerShell compared with Bash (scripts) is that PS will output .Net objects that are serialized instead of plain text. If you want to compare an interactive C#, try gsharp, or use a C# compiler to compile your C# project before running it.
                        The syntax seemed quite similar, but not that important really; I greatly disliked it
                        At last, you seem to know to insult everyone, and I think that Java fans out-there would say that are many instances when Java is faster than C: http://scribblethink.org/Computer/javaCbenchmark.html
                        They even try to see why people think that Java is slow, and I think this is a proper explanation:
                        I understand the idea that C is simple and fancy, but at least take your facts to be grounded in reality.
                        If you are bothered about Java's slow startup or .Net one, both have solutions: Java has GCJ and ExcelsiorJET, as .Net has NGen, or Mono has the AOT compiler (run mono with --aot parameter first). Mono is compiled to very similar code that a C compiler will output on their LLVM solutions (that are Mono4Android and MonoTouch), so this argument of C is tightly optimized looks to me again a bit outdated at least
                        Well, slow startup is one of the most important parts. I have an older Android phone with little memory. (Java) apps take forever to start and sometimes run OK, sometimes sluggishly. When an app starts to use a lot of memory, other apps get forcefully stopped, so the whole 'but it stays in memory anyway' doesn't really work

                        But yeah, Java can be fast-ish enough, by using much more memory. I think my _personal preference_ comes from all those fancy features that do things you don't want/expect and allow sloppy work (garbage collection/exception handling). They make things easier, sure, but they also leave you less in control, or so it feels anyway.

                        Comment


                        • Originally posted by oliver View Post
                          Of course, bad code is bad code. But an interpreted language will never be faster (when properly written) then a compiled language.
                          This is why you should try Java yourself and not just listen to what others tell you. Check your facts! Java has not been a purely interpreted language since 1996. We are in 2013, so you are 17 years out of date. There were (and maybe still are) Java interpreters, but Java as it is today is not interpreted, apart from some parts of it, and your "hot" code is not: it is compiled. It is like a hybrid car: it uses the interpreter for rarely used code, so it uses less memory there, and it compiles hot code, so that runs fast.

                          But if you were in fact talking about some other interpreted language (let's say Python), you may be right, but your phrasing didn't make this distinction.

                          Originally posted by oliver View Post
                          The syntax seemed quite similar, but not that important really; I greatly disliked it
                          Which syntax? If Bash syntax somewhat resembles C syntax, does that mean Bash is C? Or an insanity? It is slow, sure, but no one forces you to use it where you need performance. PowerShell's main principle is that you don't get back text but .Net objects, and PowerShell-aware applications can get the full data or just the fields they need and return other objects. Instead of parsing the output yourself to get the specific fields that programs return to you in Bash, it is better to have objects and to query them by field. Still, this is not C#, and making an application PowerShell-aware needs very little C# or C++.Net coding; the rest you can write in any language of your choice, including your beloved C.

                          Well slow startup is one of the most important parts. I have an older android phone with little memory. (Java) apps take forever to start and sometimes run ok, sometimes sluggishly. When an app starts to use a lot of memory, other apps get forcefully stopped, so the whole 'but it stays in memory anyway' doesn't really work
                          This is why (again) you should understand that languages and implementations are different things. Android uses the Java language, but it is a different kind of virtual machine. It has improved slightly, but it is still not as fast as Mono for Android (which is in the C++/C performance range). If you have an hour of your life to spare, you can learn why Android's VM is not tuned for ultimate performance: http://www.youtube.com/watch?v=Ls0tM-c4Vfo

                          But yeah, Java can be fast-ish enough, by using much more memory. I think my _personal preference_ comes from all those fancy features that do things you don't want/expect, allow sloppy work (garbage collection/exception handling). They make things easier, sure but also less in control, or so it feels like anyway.
                          As Java is not interpreted (again, please read sources other than the C blogger you probably follow), the slow applications you associate with Java are more likely explained by memes or by not understanding what your application does. For at least 3 years you have been able to tune Java to do GC in limited time slices (read the G1GC documentation). Java is also used in aviation, where it requires close to real-time responsiveness, and in transaction systems that need a few ms per transaction. So your performance requirements are certainly possible to achieve.

                          If you don't want a GC because it consumes more memory, fine: spend your time hunting leaks that others may cause by misusing your API.

                          But both GC and exceptions do enable some coding styles that are very hard to achieve in C or C++: database transactions plus web services, dynamic web pages, big UI frameworks such as Eclipse, huge throughput because it is easy to take advantage of all your cores, dependency injection (how do you annotate a struct in C?).



                          • anyway
                            gcc still doesn't do sse any good

                            gcc:
                            elapsed ticks: 1240594

                            custom loop:
                            elapsed ticks: 970071

                            ran a couple times and took best result
                            (still worst custom is better than best gcc)

                            i unrolled the C loop to be even with assembly
                            if anyone knows a better loop, say so; the one used is in the tar file below

                            custom loop is in fasm and can be improved still
                            (couple shuffles can be removed, prefetch and non temporal stores for bigger sets of data)

                            test program is half arsed, but is good enough for its purpose

                            http://depositfiles.com/files/b0qafl76c


                            seems gcc doesn't use the whole sse registers, but just the first 32 bits
                            maybe it's my fault with compile options
                            the used options were -O3 -msse
                            so if anyone knows better


                            PS avx, fma, xop and such can make the assembly code shorter and a bit faster
                            Last edited by gens; 05-02-2013, 09:25 PM.



                            • Originally posted by gens View Post
                              anyway
                              gcc still doesn't do sse any good

                              gcc:
                              elapsed ticks: 1240594

                              custom loop:
                              elapsed ticks: 970071

                              ran a couple times and took best result
                              (still worst custom is better than best gcc)

                              i unrolled the C loop to be even with assembly
                              if anyone knows a better loop, say so; the one used is in the tar file below

                              custom loop is in fasm and can be improved still
                              (couple shuffles can be removed, prefetch and non temporal stores for bigger sets of data)

                              test program is half arsed, but is good enough for its purpose

                              http://depositfiles.com/files/b0qafl76c


                              seems gcc doesn't use the whole sse registers, but just the first 32 bits
                              maybe it's my fault with compile options
                              the used options were -O3 -msse
                              so if anyone knows better


                              PS avx, fma, xop and such can make the assembly code shorter and a bit faster
                              First of all: your compiler options seem to be wrong. If your machine is 64-bit it would not matter, but if it is 32-bit, it does.

                              SSE2 is in every CPU from the Pentium 4 to today's. So maybe the option should be -msse2. I did not read the assembly, but I think you're using SSE2 in it. So it is SSE1 (in GCC) vs SSE2 (in asm).

                              So: the program has many issues, as you can imagine, but I will stick to a few facts:
                              - 1240594 / 970071 = 1.28, so we are talking about a 28% speedup. If you go multi-core, a dual-core machine can get close to a 100% speedup on this problem; let's say 90%. A dual-core C implementation would then be 1.9/1.28, or roughly 1.5 times as fast as the single-core assembly, so about 50% faster than the assembly code, and the gap grows as machines get more cores.
                              - you still have not said in which case you need this computation. Why can't it be put on the video card with glLoadMatrix (or something of this sort) if you eventually need to display the result, at 0% CPU? (It will have some CPU cost, but that will be optimized by the video drivers or maybe by the video card.) If it is a replay or something like it, why not compute the timestamp matrices as you create them? Done right, this makes the computation tens or hundreds of times faster, and again no assembly is involved.
                              - if you talk about a big program (not just a matrix multiply whose data all fits in L1 cache), there are many more components to take into account. In a game, leaving aside the video drivers, the physics component is almost never written by you (game engines license Havok or things of this sort), so your +30% speedup means nothing in the grand scheme of things. The same goes for other time-consuming parts that would not benefit even if written in assembly: loading content (the long "Loading ..." screens) is IO bound.
                              - finally, let's look at some assembly code:
                              Code:
                              movaps xmm1, xmm4		;y23,z23,x31,y31
                              shufps xmm1, xmm5, 01011010b	;x31,x31,x32,x32
                              movaps xmm12, xmm3		;x22,y22,z22,x23
                              shufps xmm12, xmm1, 11001100b	;x22,x23,x31,x32
                              Let's take a real-life case: people will copy/paste this code and, instead of changing the line to "movaps xmm12, xmm3", will simply leave "movaps xmm1, xmm3" (the register from the earlier line, with "1" not replaced by "12"). How can you debug that!? You may say that developers don't use copy/paste, but again I have data to prove you wrong: http://www.viva64.com/en/b/0193/ . Static code analysis shows that many open-source projects have code quality problems at their heart, and this happens in commercial products too.

                              You can use a slow, debuggable C build that runs 10x slower than the assembly but lets you identify the problem; then you add the -O3 flag and you're back in "assembly speed land" with no big burden on you, isn't that so?
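
To make the debugging point concrete, here is a minimal sketch in plain C that models the four quoted instructions with named values instead of register numbers. The names `vec4`, `shufps_model`, `shuffle_step` and `shuffle_self_check` are made up for illustration, and the lane layout assumed for xmm5 (x32 in lane 1) is only inferred from the comments in the snippet, not taken from the tar file. It is a scalar model for checking lane selection, not a replacement for the fasm loop.

```c
#include <assert.h>

/* Four packed floats, standing in for one xmm register (hypothetical type). */
typedef struct { float lane[4]; } vec4;

/*
 * Scalar model of `shufps dst, src, imm8`:
 * result lanes 0 and 1 are selected from dst, lanes 2 and 3 from src,
 * using two selector bits per lane, lowest bits first.
 */
vec4 shufps_model(vec4 dst, vec4 src, unsigned imm8)
{
    vec4 out;
    out.lane[0] = dst.lane[(imm8 >> 0) & 3u];
    out.lane[1] = dst.lane[(imm8 >> 2) & 3u];
    out.lane[2] = src.lane[(imm8 >> 4) & 3u];
    out.lane[3] = src.lane[(imm8 >> 6) & 3u];
    return out;
}

/* The quoted movaps/shufps pairs, with a named temporary instead of xmm1. */
vec4 shuffle_step(vec4 xmm3, vec4 xmm4, vec4 xmm5)
{
    vec4 tmp = shufps_model(xmm4, xmm5, 0x5Au); /* 01011010b: x31,x31,x32,x32 */
    return shufps_model(xmm3, tmp, 0xCCu);      /* 11001100b: x22,x23,x31,x32 */
}

/* Self-check with tagged values: 122 stands for x22, 231 for y31, and so on. */
void shuffle_self_check(void)
{
    vec4 xmm3 = {{122, 222, 322, 123}}; /* x22,y22,z22,x23 */
    vec4 xmm4 = {{223, 323, 131, 231}}; /* y23,z23,x31,y31 */
    vec4 xmm5 = {{331, 132, 232, 332}}; /* ASSUMED layout: z31,x32,y32,z32 */
    vec4 r = shuffle_step(xmm3, xmm4, xmm5);

    assert(r.lane[0] == 122 && r.lane[1] == 123);
    assert(r.lane[2] == 131 && r.lane[3] == 132);
}
```

Overwriting the wrong value here (the equivalent of writing xmm1 where xmm12 was meant) would trip the asserts in an unoptimized build, which is the kind of check that is hard to get from the raw assembly.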



                              • Originally posted by gens View Post
                                anyway
                                gcc still doesn't do sse any good

                                gcc:
                                elapsed ticks: 1240594

                                custom loop:
                                elapsed ticks: 970071

                                ran a couple times and took best result
                                (still worst custom is better than best gcc)

                                i unrolled the C loop to be even with assembly
                                if anyone knows a better loop, say so; the one used is in the tar file below
                                So, I reran the tests as you suggested:
                                Code:
                                $ g++ -O3 matrix_test.c matrixm.c 
                                $ ./a.out 
                                elapsed ticks: 745912
                                And here is the kicker:
                                Code:
                                $ g++ -O3 -flto matrix_test.c matrixm.c 
                                $ ./a.out 
                                elapsed ticks: 647984
                                I made some changes so that your code compiles on my machine. Basically,
                                I changed the external declaration in your main program's file to:
                                Code:
                                extern void compute(float *matrix, float *vertex, float *result, int count);
                                I made main return int, and it returns zero.

                                Edit: only a few times was the speed something like 2x faster, so maybe it was TurboBoost or something similar on my machine. I updated the numbers. But with -flto (most likely because the function gets inlined), you see about a 15% speedup with no work on your side. This would mean that -flto is only about 15% slower than the assembly.
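
The percentages in this exchange can be rechecked with a few lines of C. This is only a sketch of the arithmetic: the tick counts are the ones quoted in the thread, the function names are made up, and combining ratios measured on two different machines is an assumption, so the last two estimates are rough at best.

```c
#include <assert.h>

/* Tick counts as quoted in the thread. */
#define GENS_GCC_TICKS 1240594.0 /* gens: gcc -O3 -msse */
#define GENS_ASM_TICKS  970071.0 /* gens: custom fasm loop */
#define CIP_O3_TICKS    745912.0 /* ciplogic: g++ -O3 */
#define CIP_LTO_TICKS   647984.0 /* ciplogic: g++ -O3 -flto */

/* How many times faster the "fast" run is than the "slow" run. */
double speedup(double slow_ticks, double fast_ticks)
{
    assert(slow_ticks > 0.0 && fast_ticks > 0.0);
    return slow_ticks / fast_ticks;
}

/* Assembly versus gcc on gens' machine: roughly 1.28x. */
double asm_over_gcc(void)
{
    return speedup(GENS_GCC_TICKS, GENS_ASM_TICKS);
}

/* -flto versus plain -O3 on ciplogic's machine: roughly 1.15x. */
double lto_over_o3(void)
{
    return speedup(CIP_O3_TICKS, CIP_LTO_TICKS);
}

/*
 * ASSUMPTION: both ratios transfer between machines.
 * Remaining advantage of the assembly over an -flto build.
 */
double asm_over_lto_estimate(void)
{
    return asm_over_gcc() / lto_over_o3();
}

/*
 * ASSUMPTION: the C loop scales by `core_scaling` across cores
 * (1.9 is the figure used earlier in the thread for two cores).
 * Advantage of multi-core C over single-core assembly.
 */
double multicore_c_over_asm(double core_scaling)
{
    assert(core_scaling > 0.0);
    return core_scaling / asm_over_gcc();
}
```

Under those assumptions the assembly keeps an edge of a bit over 10% against the -flto build, in the same ballpark as the 15% estimate, and a dual-core C version at 1.9x scaling comes out a little under 1.5 times as fast as the single-core assembly.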

                                And a way to write the loop better (I double checked it just now):
                                Code:
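                                /* Each iteration takes the dot product of one 3-float matrix row with
                                   vertex[0..2], then advances matrix by 3 floats and vertex/result by 1.
                                   So matrix needs 3*count floats, vertex at least count+2, result count. */
                                void compute( float *matrix, float *vertex, float *result, int count ) {
                                	int i;
                                	for( i=0;i<count;i++) {
                                		result[0] = matrix[0]*vertex[0] + matrix[1]*vertex[1] + matrix[2]*vertex[2];

                                		matrix += 3;
                                		result += 1;
                                		vertex += 1;
                                	}
                                }
                                ... and I moved it into the main program file (instead of the extern declaration).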

                                Flags
                                Code:
                                g++ -O3 -fwhole-program matrix_test.c
                                elapsed ticks: 177472
                                Can you confirm the numbers on your machine?
                                Last edited by ciplogic; 05-03-2013, 04:09 AM. Reason: Numbers were widely different on my machine
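
For anyone without the tar file, a minimal stand-in harness might look like the sketch below. It is not the original test program: `bench_compute`, the fill values and the use of `clock()` are assumptions made for illustration, and `clock()` ticks are not comparable with the "elapsed ticks" numbers posted above. It sums the results so the work has an observable effect; with -O3 -fwhole-program the compiler can inline the loop and may drop or merge work it can prove redundant, which is worth ruling out before reading too much into a very low tick count.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Same loop as in the post above. */
void compute(float *matrix, float *vertex, float *result, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        result[0] = matrix[0] * vertex[0] + matrix[1] * vertex[1] + matrix[2] * vertex[2];
        matrix += 3;
        result += 1;
        vertex += 1;
    }
}

/*
 * Hypothetical harness: runs compute() `repeats` times over `count` rows,
 * stores the elapsed clock() ticks in *ticks_out (if not NULL) and returns
 * the sum of all results. Returns -1.0 if allocation fails.
 * The casts keep this valid when built with g++, as done in the thread.
 */
double bench_compute(int count, int repeats, clock_t *ticks_out)
{
    size_t n, i;
    int r;
    float *matrix, *vertex, *result;
    double checksum = 0.0;
    clock_t start, stop;

    assert(count > 0 && repeats > 0);
    n = (size_t)count;

    matrix = (float *)malloc(3 * n * sizeof *matrix);
    vertex = (float *)malloc((n + 2) * sizeof *vertex); /* loop reads 2 past count */
    result = (float *)malloc(n * sizeof *result);
    if (!matrix || !vertex || !result) {
        free(matrix);
        free(vertex);
        free(result);
        return -1.0;
    }

    /* Simple fill: every dot product is 1*2 + 1*2 + 1*2 = 6. */
    for (i = 0; i < 3 * n; i++) matrix[i] = 1.0f;
    for (i = 0; i < n + 2; i++) vertex[i] = 2.0f;

    start = clock();
    for (r = 0; r < repeats; r++)
        compute(matrix, vertex, result, count);
    stop = clock();

    for (i = 0; i < n; i++) checksum += result[i];
    if (ticks_out) *ticks_out = stop - start;

    free(matrix);
    free(vertex);
    free(result);
    return checksum;
}
```

Calling `bench_compute(1000000, 100, &ticks)` from `main` and printing both `ticks` and the returned sum gives a number to compare across flag sets; the sum should be 6 times `count` regardless of the flags.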
