@ Do I have the results? No. The experiments were run back in the Gentoo 1.4 days (GCC 3.x, ICC 7, probably back in the 2007-ish time frame?). You might find partial results on the Gentoo forums if they still have posts from back then.
@ Do I remember which binaries? FLAC, FAAC, ImageMagick, LAME and MEncoder (with supporting libraries).
@ Do I remember which flags worked best? No, and even if I did, they would probably not apply on modern systems.
@ Did you use the super duper new tuner/profiler? No, as I do not believe that it existed at the time. If it did, I was completely unaware of it.
What I did was nothing special. I created an array of CFLAGS and then walked through the combinations, running a timed benchmark on each iteration. It was honestly three lines of bash for-loop-foo per target app, plus a single file containing comma-delimited CFLAGS. Gentoo made it easy, as the build system was already set up.
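For anyone wanting to try something similar, the loop described above might be sketched roughly as follows. This is not the original script: the function names `flag_combinations` and `benchmark_combinations`, the file name `cflags.txt` and the build/benchmark commands in the usage comment are all hypothetical. It uses `date +%s` instead of `time` so the result can be printed next to the flags, which limits it to whole-second resolution.

```shell
#!/bin/sh
# Sketch only: helper names, file name and commands are hypothetical.

# Print every non-empty combination of the comma-delimited flags in $1,
# one combination per line.
flag_combinations() {
    old_ifs=$IFS
    IFS=','
    set -f                      # no globbing while splitting
    set -- $1                   # flags become positional parameters
    set +f
    IFS=$old_ifs
    total=$(( (1 << $#) - 1 ))  # 2^n - 1 non-empty subsets
    mask=1
    while [ "$mask" -le "$total" ]; do
        combo=""
        bit=1
        for flag in "$@"; do
            if [ $(( mask & bit )) -ne 0 ]; then
                combo="${combo:+$combo }$flag"
            fi
            bit=$(( bit << 1 ))
        done
        printf '%s\n' "$combo"
        mask=$(( mask + 1 ))
    done
}

# Run the given command once per combination with CFLAGS set in its
# environment, printing elapsed whole seconds and the flags used.
benchmark_combinations() {
    flag_list=$1
    shift
    flag_combinations "$flag_list" | while IFS= read -r cflags; do
        start=$(date +%s)
        CFLAGS="$cflags" "$@" > /dev/null 2>&1 < /dev/null
        end=$(date +%s)
        printf '%s\t%s\n' "$(( end - start ))" "$cflags"
    done
}

# Usage (hypothetical file and commands):
#   benchmark_combinations "$(cat cflags.txt)" sh -c 'make clean all && ./run_benchmark'
```

Note that n flags give 2^n - 1 combinations, so the flag file has to stay short for this to finish in reasonable time.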
The biggest reason why I scratched it was that I would end up with a working FLAC binary, but random apps that linked to libflac.so would bomb. At that point, it seemed that it really wasn't important enough to me to invest additional time writing automated tests for every app that linked each library. In addition, I had already found an alternate solution to all of my issues.
The -march=corei7-avx option is the most appropriate for Sandy Bridge, since it enables Advanced Vector Extensions (AVX) support as well as the AES and PCLMUL instruction sets. Here's the overview from the GCC i386/x86-64 options page:
core2: Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
corei7: Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 instruction set support.
corei7-avx: Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.
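If you want to check this against your own compiler instead of the docs, one option is to dump GCC's predefined macros for each CPU type and look at the instruction-set ones. A minimal sketch, assuming `gcc` is on the PATH; `march_macros` is a made-up helper name, and the macro list in the pattern only covers the instruction sets named above.

```shell
#!/bin/sh
# Print the predefined macros for the instruction sets listed above that a
# given -march value switches on. CC defaults to gcc (assumed to be on PATH).
march_macros() {
    "${CC:-gcc}" "-march=$1" -dM -E - < /dev/null |
        grep -E '^#define __(SSE3|SSSE3|SSE4_1|SSE4_2|AVX|AES|PCLMUL)__ ' |
        sort
}

# Usage: compare the three CPU types from the list.
#   for cpu in core2 corei7 corei7-avx; do echo "== $cpu"; march_macros "$cpu"; done
```

Each extra line that shows up when moving from one CPU type to the next is an instruction set the compiler is then allowed to use.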
Basically, only Gentoo/Arch users who compile all day long can use different compilers and settings to improve speed, but they will never win back the time they spent compiling the sources on their own system(s). A more generic distro has to compile every package so that it works on all supported systems. I don't think a 5%-10% gain is worth building a specific binary; that only matters when the base speed is low, which is very unlikely if you own a new system. It is a completely different matter when you have your own code and want it to run as fast as possible, but then you have to do your own tests, as no compiler comparison will be accurate for custom code. I usually compile XBMC from source, but the reason is not speed: the app gets so many updates in such a short time that a release binary already feels outdated by the time it is tagged.
-march implies -mtune. With march set to your cpu, it's pointless to add mtune.
but I read:
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. While picking a specific cpu-type schedules things appropriately for that particular chip, the compiler does not generate any code that cannot run on the default machine type unless you use a -march=cpu-type option. For example, if GCC is configured for i686-pc-linux-gnu then -mtune=pentium4 generates code that is tuned for Pentium 4 but still runs on i686 machines.
The choices for cpu-type are the same as for -march.
OK, I guess I got it backwards. march does more than mtune.
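One way to settle this on a given machine is to ask GCC what it actually resolves the two options to. A minimal sketch, assuming `gcc` is on the PATH; `effective_target` is a made-up helper name, and the exact layout of the `-Q --help=target` listing can differ between GCC versions.

```shell
#!/bin/sh
# Show the -march= and -mtune= values GCC settles on for the given options.
# CC defaults to gcc (assumed to be on PATH).
effective_target() {
    "${CC:-gcc}" "$@" -Q --help=target | grep -E '^[[:space:]]*-m(arch|tune)='
}

# Usage:
#   effective_target -march=corei7-avx   # does -mtune= follow -march?
#   effective_target -mtune=corei7-avx   # does -march= stay at the default?
```

Comparing the two runs shows whether setting only -march also changes the tuning target, and whether setting only -mtune leaves the instruction set alone.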