AMD Piledriver/Trinity A10-5800K Compiler Tuning
With the initial Linux results for the AMD A10-5800K Trinity APU now out of the way, along with the Radeon HD 7660D graphics performance, this article provides benchmarks looking at the impact of compiler tuning on the Piledriver cores, using the common GCC compiler and testing different CPU micro-architecture targets.
Does anyone know how many of the programs tested in the article use inline assembly? I'm not terribly familiar with any of them. That would surely taint the meaningfulness of testing different compiler switches when the core of the code is an 'as given' assembly blob.
I would like to see a -march=native flag in there. It would show whether GCC is correctly detecting the CPU, and would also show whether the additional information about cache size and layout helped.
It would be good if developers tried harder to help the compiler auto-vectorise, rather than putting in their own assembly. That way the code would automatically benefit on new architectures. http://locklessinc.com/articles/vectorize/ has some examples of hints that can be given.
Is there any actual software that can positively use FMA3? AFAIK scientific software can possibly use FMA3, but I haven't seen any real-world example.
Any time you do A = A + B*C, you can benefit from FMA3. That's pretty common in any matrix math, which is used heavily in graphics as well. All FFTs can benefit from FMA3 too. I wonder whether any of these programs link against libraries that could benefit from FMA3; we might not be seeing the full effect of these different compiler settings if the libraries aren't making use of it as well.
Originally Posted by mayankleoboy1
Someone correct me if I'm wrong, but the SSE XMM registers are the lower halves of the AVX YMM registers, right? So any code mixing legacy SSE instructions (say, from a library) with VEX-encoded AVX (from the calling program) will hit a state-transition penalty.
As a simple user, how can I relate what I see in this benchmark to a common distribution, e.g. the latest Ubuntu? Are the Ubuntu binaries built with any of the benchmarked CPU targets? Since I'm planning to buy an A10-5800 the day it arrives in my town, it would be interesting to know this, to understand what difference it would make if I could compile the binaries for my (future) CPU.
Ubuntu binaries are compiled to run on most processors, pretty generic stuff, with generic optimisations. If you want system-wide improvements, you'll have to compile your own system, Gentoo or Arch style, but in reality, the practical gains of doing this are moderate.
What you CAN do is compile specific software that you need to optimize, like your scientific software, or video encoder or something similar that's processor-intensive. This is really worth doing.
Most binary distros are quite conservative with build options, so they won't turn on most of these optimisations. The 64-bit editions will generally run on any x86-64 CPU, so the most you can assume is SSE2. For Debian-based systems there is apt-build, which can rebuild packages.
Originally Posted by geamandura
If you are interested in rebuilding lots of packages, you might want to look at Gentoo (or a derivative). But be aware that Gentoo stable currently has GCC 4.5, with only 4.6 in unstable; GCC 4.7 is hard masked ( http://packages.gentoo.org/package/sys-devel/gcc ).
Good news for Gentoo users I think.
And it is interesting to see that it helps quite a bit in most scenarios, while some others don't seem to be influenced.
@Michael: How about comparing these results with a run using -mtune=generic, which is what standard distributions use? That way one might get a glimpse of how well Bulldozer will perform there.