Compiler Benchmarks Of GCC, LLVM-GCC, DragonEgg, Clang

  • Compiler Benchmarks Of GCC, LLVM-GCC, DragonEgg, Clang

    Phoronix: Compiler Benchmarks Of GCC, LLVM-GCC, DragonEgg, Clang

    LLVM 2.8 was released last month with the Clang compiler having feature-complete C++ support, enhancements to the DragonEgg GCC plug-in, a near feature-complete alternative to libstdc++, a drop-in system assembler, ARM code-generation improvements, and many other changes. Given the great interest in the Low-Level Virtual Machine, we have conducted a large LLVM-focused compiler comparison at Phoronix covering GCC versions 4.2.1 through 4.6-20101030, GCC 4.5.1 using the DragonEgg 2.8 plug-in, LLVM-GCC with LLVM 2.8 and GCC 4.2, and lastly Clang on LLVM 2.8.

    http://www.phoronix.com/vr.php?view=15422

  • Death Knight
    replied
    I believe these tests need to be rerun with recent GCC and Clang versions, such as
    gcc4.7 vs clang3.2
    gcc4.8svn vs clang3.3svn
    I wonder what the results would be.

  • XorEaxEax
    replied
    Weird, why would PGO help more with the asm version? It should be the exact opposite imo, since the compiler can't optimize the hand-written assembly in any way, but it should at least be able to do some optimizing with the C code. Anyway, I can understand why you wouldn't want to redo the tests, since the argument was about hand-optimized assembly vs compiler-generated code. Maybe I'll give it a shot myself, since I'm curious about what Shikari said. Again, thanks for the benchmarks!

  • Ranguvar
    replied
    Originally posted by XorEaxEax
    Thanks for the benchmarks. The asm vs compiler-generated ratio is pretty much as expected, but if I'm reading this correctly the PGO versions are not faster (even slightly slower!?), which suggests PGO is not working properly. You need to run the PGO-instrumented build through an encoding and then recompile so that the compiler can use the generated runtime data. As I recall there is a semi-automated framework for this in x264; I'll see if I can find some proper instructions and redo the PGO tests myself (unless you would like to). Even with all assembly optimizations enabled, using PGO gave another 5% total performance increase according to 'Dark Shikari', so PGO isn't working in your tests.
    I talked with Dark Shikari on #x264 about the results. He said A.) PGO would help more with the hand asm, since it apparently does not benefit what the pure C build spends most of its time doing (DSP functions), and B.) I screwed something up with the PGO, because it should be ~1% faster. I did the build correctly from what I can tell (make fprofiled VIDS="videohere.y4m"), but I couldn't be bothered to recompile, retest, etc. to confirm a ~1% performance increase.

  • XorEaxEax
    replied
    Thanks for the benchmarks. The asm vs compiler-generated ratio is pretty much as expected, but if I'm reading this correctly the PGO versions are not faster (even slightly slower!?), which suggests PGO is not working properly. You need to run the PGO-instrumented build through an encoding and then recompile so that the compiler can use the generated runtime data. As I recall there is a semi-automated framework for this in x264; I'll see if I can find some proper instructions and redo the PGO tests myself (unless you would like to). Even with all assembly optimizations enabled, using PGO gave another 5% total performance increase according to 'Dark Shikari', so PGO isn't working in your tests.

  • Ranguvar
    replied
    Testing done.

    Here is the summary: http://ix.io/1h1

    And here are the logfiles: http://ompldr.org/vNmJicA/parkrun_benchmark_logs.tar.gz

    In conclusion, x264's hand-written assembly increases encoding speed by 2.4x-5.8x, with the larger improvements coming from more complex encoding settings.

    There's your evidence. Feel free to perform your own tests.

  • XorEaxEax
    replied
    Please do make that test, Ranguvar, since I'm interested in seeing the difference in performance. Given that x264 has a 'fprofile' option to compile it with PGO, is there any chance you would do a test with that and '--disable-asm', to see how much it differs from a standard compile with '--disable-asm'?

  • Ranguvar
    replied
    Originally posted by energyman
    so you have no evidence at all?

    anecdotal evidence does not count.

    Besides, which CPU manufactured in the last 12 years does not have MMX?
    I know that does not count as 'real' evidence. But I don't see you jumping to do a test. Hell, you know what -- you've got me motivated. I'll post back tomorrow or the next day with the results. I'll use the 'parkrun' clip (http://media.xiph.org/video/derf/), constant quality mode with two or three different --preset options, with and without --disable-asm. I use a Q6600 CPU with 6GiB of RAM on Arch Linux, GCC 4.5.1. If that is not to your satisfaction, let me know.

    ARM CPUs, perhaps (which they are slowly adding some assembly for)? I do not see the point of that comment.

  • smitty3268
    replied
    There's no question hand tuned assembly speeds up codecs a lot. All you have to do to test this if you don't believe it is to run the test yourself. I remember a while ago Ubuntu shipped with a mis-configured xvid library, with all the assembly code disabled, and it ran at about 1/3rd the normal speed. Every codec will be different, of course, but given how much work has gone into x264 I would imagine the difference there would be even greater.

    Compilers still aren't very good at utilizing SSE instruction sets automatically, and even when they are, they tend to target only one specific instruction set, while hand-tuned code can target SSE4 and still provide fallback code for older CPUs.
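
The "target a newer instruction set but keep a fallback" idea can be sketched in C roughly as follows. This is only an illustration, not x264's or xvid's actual code: the function names (`sad_c`, `sad_sse2`, `pick_sad`) are made up, SSE2 stands in for SSE4 to keep the example short, and `__builtin_cpu_supports` is a GCC/Clang built-in for x86, so other compilers would need a different CPU check.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_X86 1
#endif

/* Portable fallback: sum of absolute differences over n bytes. */
static unsigned sad_c(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    return sum;
}

#ifdef HAVE_X86
/* SSE2 variant: handles 16 bytes per iteration with PSADBW. */
__attribute__((target("sse2")))
static unsigned sad_sse2(const uint8_t *a, const uint8_t *b, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* PSADBW leaves one partial sum in each 64-bit half of the register. */
    unsigned sum = (unsigned)_mm_cvtsi128_si32(acc)
                 + (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
    return sum + sad_c(a + i, b + i, n - i);  /* scalar tail */
}
#endif

typedef unsigned (*sad_fn)(const uint8_t *, const uint8_t *, size_t);

/* Pick the best implementation the running CPU supports. */
static sad_fn pick_sad(void)
{
#ifdef HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return sad_sse2;
#endif
    return sad_c;
}
```

A caller would do `sad_fn sad = pick_sad();` once at startup and then call `sad(a, b, n)` in the hot loop, so the same binary runs on old and new CPUs.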

  • XorEaxEax
    replied
    Well, although I don't have any actual data to back it up with right now, from experience I will side with Ranguvar on this. Hand-optimized assembly done by an expert will generally beat that of a compiler, particularly when it comes to newer CPU extensions like SSE. The x264 devs didn't rewrite a ton of code in assembly just for fun; they benchmarked their assembly code versus compiler output and found that their hand-optimized assembly performed a lot better.

    However, compilers are getting better all the time, and while I think expert hand-tuned assembly will always equal or better that which is compiler generated, there will come a time when the difference is so small that it ends up being a waste of time doing it manually.

    The general problem for a compiler when doing optimization is knowledge about the program. A skillful human programmer knows exactly what he is trying to achieve and will be able to make the best possible optimization decisions based upon that. The compiler does not possess that deep knowledge and also can't make assumptions. Some of this can be alleviated through the use of compiler extensions that allow you to give the compiler more detailed instructions than the language normally permits, and also PGO (profile guided optimization), which gives the compiler a ton of runtime data with which it can 'understand' the program better and thus perform better optimizations.
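
As a rough illustration of the hints mentioned above, here is a small C sketch. The function `scale_add` and the `UNLIKELY` macro are made-up examples, not code from x264. `restrict` is standard C99, while `__builtin_expect` is a GCC/Clang extension. With GCC, PGO collects similar branch information from a real run instead of from hand-written hints: build with `-fprofile-generate`, run a representative workload, then rebuild with `-fprofile-use`.

```c
#include <stddef.h>

/* GCC/Clang extension: tell the compiler this condition is rarely true. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/*
 * restrict promises that dst and src never overlap, which the compiler
 * cannot prove on its own and which makes vectorizing the loop easier.
 * Returns how many results had to be clipped to 255.
 */
static long scale_add(int *restrict dst, const int *restrict src,
                      size_t n, int k)
{
    long clipped = 0;
    for (size_t i = 0; i < n; i++) {
        int v = dst[i] + k * src[i];
        if (UNLIKELY(v > 255)) {  /* assumed rare: keep it off the hot path */
            v = 255;
            clipped++;
        }
        dst[i] = v;
    }
    return clipped;
}
```

Whether the clipping branch really is rare depends on the data, which is exactly the kind of guess that PGO replaces with measurements.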
