AMD Piledriver/Trinity A10-5800K Compiler Tuning
With the initial Linux results for the AMD A10-5800K Trinity APU and its Radeon HD 7660D graphics performance now out of the way, this article looks at the impact of compiler tuning for the Piledriver cores, using the common GCC compiler and testing different CPU micro-architecture targets.
A variety of popular open-source computational benchmarks from the Phoronix Test Suite were built with the -march=k8, -march=k8-sse3, -march=barcelona, -march=bdver1, and -march=bdver2 compiler flags. The -march=bdver2 target is the native Piledriver ("Bulldozer 2") CPU target for the AMD A10-5800K Trinity APU. Below are descriptions of the x86_64 instruction set extensions enabled by each "march" target.
K8 - Processors based on the AMD K8 core with x86-64 instruction set support, including the AMD Opteron, Athlon 64, and Athlon 64 FX processors. (This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow! and 64-bit instruction set extensions.)
K8-SSE3 - Improved versions of AMD K8 cores with SSE3 instruction set support.
BARCELONA - CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit instruction set extensions.)
BDVER1 - CPUs based on AMD Family 15h cores with x86-64 instruction set support. (This supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
BDVER2 - AMD Family 15h core based CPUs with x86-64 instruction set support. (This supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
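To make the differences between these targets easier to see, here is a minimal Python sketch that stores the extension lists quoted above as sets and reports what one target adds over another. The names MARCH_EXTENSIONS and added_extensions are made up for this illustration, and the k8-sse3 entry is assumed to be simply the K8 list plus SSE3, since the description above only mentions SSE3.

```python
# Extension lists transcribed from the -march descriptions above.
# The dictionary and function names are hypothetical, chosen for this sketch.
K8 = {"MMX", "SSE", "SSE2", "3DNow!", "enhanced 3DNow!"}

MARCH_EXTENSIONS = {
    "k8": K8,
    # Assumption: k8-sse3 is the K8 list plus SSE3.
    "k8-sse3": K8 | {"SSE3"},
    "barcelona": {"MMX", "SSE", "SSE2", "SSE3", "SSE4A", "3DNow!",
                  "enhanced 3DNow!", "ABM"},
    "bdver1": {"FMA4", "AVX", "XOP", "LWP", "AES", "PCL_MUL", "CX16", "MMX",
               "SSE", "SSE2", "SSE3", "SSE4A", "SSSE3", "SSE4.1", "SSE4.2",
               "ABM"},
    "bdver2": {"BMI", "TBM", "F16C", "FMA", "AVX", "XOP", "LWP", "AES",
               "PCL_MUL", "CX16", "MMX", "SSE", "SSE2", "SSE3", "SSE4A",
               "SSSE3", "SSE4.1", "SSE4.2", "ABM"},
}


def added_extensions(older, newer):
    """Return the extensions listed for `newer` but not for `older`, sorted."""
    return sorted(MARCH_EXTENSIONS[newer] - MARCH_EXTENSIONS[older])


if __name__ == "__main__":
    print("bdver1 -> bdver2 adds:", added_extensions("bdver1", "bdver2"))
    print("barcelona -> bdver1 adds:", added_extensions("barcelona", "bdver1"))
```

The comparison only reflects the lists as quoted in the descriptions; it does not query the compiler or the CPU.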
This testing is similar to the earlier AMD Bulldozer compiler tuning article, but now with the "bdver2" Piledriver target being usable thanks to the A10-5800K. Bulldozer 2 adds support for the BMI, TBM, and F16C instruction set extensions. Piledriver also adds the FMA3 form of Fused Multiply-Add, whereas the original Bulldozer offered only the FMA4 variant.
FMA3 is the three-operand variant of Fused Multiply-Add (the one being pushed by Intel with Haswell) rather than the four-operand version. F16C provides instructions for converting between 32-bit and 16-bit floating-point values, so that data can be stored in 16 bits. TBM is Trailing Bit Manipulation, and BMI is Bit Manipulation Instructions.
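As a rough illustration of what these extensions compute, the Python sketch below emulates a few of the operations in software. It is not how GCC emits them and says nothing about performance. All function names are invented for this example; the bit-manipulation helpers are named after the BMI and TBM instructions they imitate and assume 32-bit operands. The fused multiply-add is emulated with exact rational arithmetic and a single final rounding, which is the property FMA3 and FMA4 share (they differ in operand encoding, not in the numeric result).

```python
import struct
from fractions import Fraction

MASK32 = 0xFFFFFFFF  # assumption: emulate 32-bit registers


def fused_multiply_add(a, b, c):
    """a*b + c with one rounding at the end (emulated via exact fractions)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def to_half_and_back(x):
    """Round x to float32, then to 16-bit half precision, and widen again,
    mimicking the precision loss of storing a value through F16C."""
    as_single = struct.unpack("<f", struct.pack("<f", x))[0]
    return struct.unpack("<e", struct.pack("<e", as_single))[0]


def blsi(x):
    """BMI: isolate the lowest set bit."""
    return x & -x & MASK32


def blsr(x):
    """BMI: clear the lowest set bit."""
    return x & (x - 1) & MASK32


def blcfill(x):
    """TBM: clear the run of trailing one bits (x & (x + 1))."""
    return x & (x + 1) & MASK32


def tzmsk(x):
    """TBM: mask covering the trailing zero bits (~x & (x - 1))."""
    return ~x & (x - 1) & MASK32


if __name__ == "__main__":
    a, b, c = 1 + 2.0**-30, 1 - 2.0**-30, -1.0
    print("separate multiply and add:", a * b + c)
    print("fused multiply-add:       ", fused_multiply_add(a, b, c))
    print("0.1 stored as half:       ", to_half_and_back(0.1))
    print("blsr(0b10110000) =", bin(blsr(0b10110000)))
    print("tzmsk(0b10110000) =", bin(tzmsk(0b10110000)))
```

In the fused case the tiny product term survives because the intermediate result is never rounded, while the separate multiply and add lose it. The half-precision round trip shows why F16C is a storage format trade-off rather than a free size reduction.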
Initial support for AMD Piledriver / Bulldozer-2 "bdver2" processors was introduced back in March with the GCC 4.7 compiler release. GCC 4.7.2 was the compiler used for this round of Linux compiler benchmarking. As mentioned recently, AMD has already provided Steamroller "bdver3" compiler support for GCC 4.8, which is due to be released next year.