Intel Core i7 AVX GCC Compiler Tuning Results

  • #11
    Originally posted by WorBlux View Post
    Did you try profile guided optimization? My guess is that it may be the easiest way to get the best binary without resorting to potentially dangerous flags.
    PGO is great. Gives me an easy 5-10% performance boost when optimising wined3d (can't profile the entire wine app because of a bug). PGO works great with Dolphin as well.
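
For anyone who has not tried PGO with GCC, the basic cycle is: build with -fprofile-generate, run a representative workload, then rebuild with -fprofile-use. The sketch below only assembles those three command lines; it is not the wined3d setup mentioned in this post, and the names pgo_commands, bench.c and bench are made up for illustration.

```python
def pgo_commands(sources, output, workload_args=(), cc="gcc", base_flags=("-O2",)):
    """Return the three command lines of a basic GCC PGO cycle.

    Hypothetical helper: it only builds argument lists, it runs nothing.
    """
    # Step 1: instrumented build that records profile data when executed.
    instrumented = [cc, *base_flags, "-fprofile-generate", *sources, "-o", output]
    # Step 2: training run; it should resemble real usage as closely as possible.
    training_run = [f"./{output}", *workload_args]
    # Step 3: rebuild, letting GCC use the recorded profile.
    optimized = [cc, *base_flags, "-fprofile-use", *sources, "-o", output]
    return [instrumented, training_run, optimized]
```

Each returned list can be handed to subprocess.run(cmd, check=True) in order. How much PGO helps depends heavily on how representative the training run is.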


    • #12
      Smallpt didn't use any SIMD instructions

      The real advantage of AVX is the width of its registers: they are 256 bits wide, and Smallpt doesn't use SIMD.
      So, for a really good review, you have to include some programs that use that SIMD power.
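
To put a number on the register width: the lanes available per instruction are just the register width divided by the element width. A minimal sketch of that arithmetic follows (simd_lanes is a made-up helper name; 128 bits is the SSE register width, 256 bits the AVX one).

```python
def simd_lanes(register_bits, element_bits):
    """Number of elements that fit in one SIMD register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


# 128-bit SSE registers vs. 256-bit AVX registers
for name, bits in (("SSE", 128), ("AVX", 256)):
    print(name, "floats:", simd_lanes(bits, 32), "doubles:", simd_lanes(bits, 64))
```

Code that never issues packed operations sees none of that doubling, which is the point being made about the benchmark selection.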


      • #13
        @ Do I have the results? No. The experiments were back during the Gentoo 1.4 days (GCC 3.X, ICC7, probably back in the 2007-ish time frame?). You might find partial results on the Gentoo forums if they have posts from back then.
        @ Do I remember which binaries? FLAC, FAAC, ImageMagick, LAME and MEncoder (with supporting libraries).
        @ Do I remember which flags worked best? No, and even if I did, they would probably not apply on modern systems.
        @ Did you use the super duper new tuner/profiler? No, as I do not believe that it existed at the time. If it did, I was completely unaware of it.

        What I did was nothing special. I created an array of cflags and then walked through the combinations, running a timed benchmark each iteration. It was honestly 3 lines of bash for-loop-foo per target app and a single file containing comma-delimited cflags. Gentoo made it easy as the build system was already set up.
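
The original was a bash loop that is long gone, so what follows is only a rough Python sketch of the same idea, not a reconstruction. It assumes "the combinations" means every non-empty subset of the flag list; the helper names and the example flags are made up for illustration.

```python
import itertools
import subprocess
import time


def parse_flags(text):
    """Split a comma-delimited flag list, dropping blanks and whitespace."""
    return [flag.strip() for flag in text.split(",") if flag.strip()]


def flag_combinations(flags):
    """Yield every non-empty subset of flags, smallest subsets first."""
    for size in range(1, len(flags) + 1):
        yield from itertools.combinations(flags, size)


def time_command(argv):
    """Run a command to completion and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


# Hypothetical flag file contents; a real run would read them from disk.
for combo in flag_combinations(parse_flags("-O2, -funroll-loops")):
    print(" ".join(combo))
```

A real loop would rebuild the target with each combination and then call time_command on its benchmark. The number of subsets doubles with every extra flag, so the list has to stay short.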

        The biggest reason why I scratched it was that I would end up with a working FLAC binary, but random apps that linked to libflac.so would bomb. At that point, it seemed that it really wasn't important enough to me to invest additional time writing automated tests for every app that linked each library. In addition, I had already found an alternate solution to all of my issues.

        F


        • #14
          Originally posted by AnonymousCoward View Post
          PGO is great. Gives me an easy 5-10% performance boost when optimising wined3d (can't profile the entire wine app because of a bug). PGO works great with Dolphin as well.
          Hi,

          How did you get this going with wined3d only?
          Do you have some kind of howto?

          I'm very interested in a 5-10% perf increase in the wine d3d area.
          These days I'm compiling wine with -march=native, but that doesn't give much of an fps increase.


          Many thanks,
          Christian


          • #15
            The article mentioned:

            The -march=corei7-avx option is most appropriate for Sandy Bridge since it enables the Advanced Vector Extensions support as well as the AES and PCLMUL instruction sets for Sandy Bridge. Here's the overview from the GCC i386/x86_64 options page:
            `core2'
            Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
            `corei7'
            Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 instruction set support.
            `corei7-avx'
            Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.
            What about -mtune? Was that used as well?
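
The three option descriptions quoted above nest inside each other, which is easy to see as set differences. A small sketch, with the feature lists copied from the quoted text (MARCH_FEATURES and added_features are names invented for this example):

```python
# Feature lists transcribed from the quoted GCC option descriptions.
CORE2 = {"MMX", "SSE", "SSE2", "SSE3", "SSSE3"}
COREI7 = CORE2 | {"SSE4.1", "SSE4.2"}
COREI7_AVX = COREI7 | {"AVX", "AES", "PCLMUL"}

MARCH_FEATURES = {
    "core2": CORE2,
    "corei7": COREI7,
    "corei7-avx": COREI7_AVX,
}


def added_features(newer, older):
    """Instruction sets listed for `newer` but not for `older`, sorted by name."""
    return sorted(MARCH_FEATURES[newer] - MARCH_FEATURES[older])


print(added_features("corei7-avx", "corei7"))
```

So going from corei7 to corei7-avx adds exactly the AVX, AES and PCLMUL sets named in the article.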


            • #16
              -march implies -mtune. With march set to your cpu, it's pointless to add mtune.


              • #17
                Basically, only Gentoo/Arch users who compile all day can use different compilers and settings to improve speed. But they will never gain back the time they spent compiling the sources on their own system(s). If you use a more generic distro, all packages have to be compiled to work on all supported systems. I don't think a 5-10% gain is worth creating a specific binary; that only matters when the base speed is low, which is very unlikely if you own a new system. It's a completely different thing when you have your own code and want to run it as fast as possible, but then you have to do your own tests, as no compiler comparison will be accurate for custom code. I usually compile xbmc from source, but the reason is not speed: the app gets so many updates in such a short time that a release binary already feels outdated by the time it is tagged.


                • #18
                  Originally posted by curaga View Post
                  -march implies -mtune. With march set to your cpu, it's pointless to add mtune.
                  but I read:

                  -mtune=cpu-type
                  Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. While picking a specific cpu-type schedules things appropriately for that particular chip, the compiler does not generate any code that cannot run on the default machine type unless you use a -march=cpu-type option. For example, if GCC is configured for i686-pc-linux-gnu then -mtune=pentium4 generates code that is tuned for Pentium 4 but still runs on i686 machines.
                  The choices for cpu-type are the same as for -march.
                  OK, I guess I got it backwards. march does more than mtune.
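
One way to check this on your own machine is to ask GCC which target options are in effect for a given -march, using -Q --help=target. The sketch below assumes the output has one option per line followed by its value; the exact layout differs between GCC versions, so treat the parser as a rough guess. All function names are made up for the example.

```python
import subprocess


def target_query_command(march, cc="gcc"):
    """Command line asking GCC to list the target options in effect for -march."""
    return [cc, f"-march={march}", "-Q", "--help=target"]


def parse_target_options(text):
    """Collect 'option= value' pairs such as '-mtune=  corei7-avx' from the listing."""
    options = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines of the form "-mSOMETHING=  value"; skip on/off switches.
        if len(parts) == 2 and parts[0].startswith("-m") and parts[0].endswith("="):
            options[parts[0]] = parts[1]
    return options


def effective_tuning(march, cc="gcc"):
    """Return the (-march, -mtune) values GCC reports; requires gcc to be installed."""
    result = subprocess.run(
        target_query_command(march, cc), capture_output=True, text=True, check=True
    )
    options = parse_target_options(result.stdout)
    return options.get("-march="), options.get("-mtune=")
```

Calling effective_tuning("corei7-avx") without passing any -mtune should show whether the tuning target follows the architecture on your GCC build. Newer releases may report the CPU under a different name.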


                  • #19
                    Originally posted by AnonymousCoward View Post
                    PGO is great. Gives me an easy 5-10% performance boost when optimising wined3d (can't profile the entire wine app because of a bug). PGO works great with Dolphin as well.
                    How did you do this?
