With this week's unveiling of the FX-8350 eight-core processor, which is based on AMD's new Piledriver architecture, this article looks at benchmarks of the Piledriver "bdver2" optimizations within AMD's own Open64 compiler.
AMD released the AMD Open64 4.5.2 compiler back in August, introducing support for Family 15h Piledriver cores. With the Piledriver support came work within AMD's Open64 compiler fork for handling the AVX, XOP, FMA3, FMA4, BMI, TBM, and F16C instruction sets.
From the features list for the Open64 4.5.2 release, "You can enable this new instructions in Piledriver core using the -march=bdver2. Alternately you can pick and choose the ISA to be enabled using -mfma (for FMA3), -mfma4, -mbmi, -mtbm flags." The "bdver2" target name is the same one used by GCC and LLVM/Clang for supporting the second-generation Bulldozer (a.k.a. Piledriver) AMD processors.
Earlier this month I ran some GCC compiler tuning tests on the Piledriver-based AMD A10-5800K Trinity APU. When comparing the bdver2 micro-architecture target against bdver1 and other earlier AMD CPU targets under GCC, there wasn't much improvement from using "-march=bdver2" when compiling the test application / benchmark binaries.
In the GCC Piledriver tuning tests from last week with GCC 4.7.2, I went over what the bdver2 target adds: BMI, TBM, F16C, and FMA3. FMA3 is the three-operand variant of Fused Multiply-Add (the one being pushed by Intel with Haswell) rather than the four-operand version; F16C provides conversions between 32-bit floating-point values and a 16-bit half-precision storage format; TBM is Trailing Bit Manipulation; and BMI is Bit Manipulation Instructions.
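For readers who want a feel for what these instruction groups compute, here is a rough Python model of a few representative operations. It is only a sketch of the arithmetic, not how a compiler emits or uses the instructions: the function names are my own labels borrowed from the instruction mnemonics, registers are modeled as 32-bit integers, and Python floats are 64-bit doubles, so the F16C helper only approximates the 32-bit to 16-bit conversion done in hardware.

```python
import struct
from fractions import Fraction

MASK32 = 0xFFFFFFFF  # assumption: model 32-bit general-purpose registers


def fma(a, b, c):
    """Fused multiply-add: a*b + c computed exactly, then rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


def to_half_and_back(x):
    """F16C-style round trip: store x as a 16-bit half float, then widen it."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# --- BMI (Bit Manipulation Instructions) ---
def andn(a, b):
    """ANDN: bitwise (NOT a) AND b."""
    return (~a & b) & MASK32


def blsr(x):
    """BLSR: clear the lowest set bit."""
    return (x & (x - 1)) & MASK32


def blsi(x):
    """BLSI: isolate the lowest set bit."""
    return (x & -x) & MASK32


def tzcnt(x):
    """TZCNT: count trailing zero bits (operand width, 32, when x is zero)."""
    x &= MASK32
    return 32 if x == 0 else (x & -x).bit_length() - 1


# --- TBM (Trailing Bit Manipulation) ---
def blcfill(x):
    """BLCFILL: clear the run of trailing one bits, x & (x + 1)."""
    return (x & (x + 1)) & MASK32


def tzmsk(x):
    """TZMSK: mask covering the trailing zero bits, ~x & (x - 1)."""
    return (~x & (x - 1)) & MASK32


if __name__ == "__main__":
    a, b, c = 1.0 + 2.0**-27, 1.0 - 2.0**-27, -1.0
    print("unfused:", unfused(a, b, c), "fused:", fma(a, b, c))
    print("0.1 via half precision:", to_half_and_back(0.1))
    print("tzcnt(0b101000):", tzcnt(0b101000))
```

The FMA pair shows why a compiler cannot always substitute one for the other freely: with a single rounding step the fused result can differ from the separate multiply and add, so enabling FMA may change floating-point results slightly as well as performance.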
AMD's official 64-bit Open64 4.5.2 compiler was tested on the AMD FX-8350 Eight-Core "Vishera" setup running Ubuntu 12.10 with the Linux 3.5 kernel. The compiler built the Phoronix Test Suite collection of tests four times, each time with a different "-march=" value: k8, barcelona, bdver1, and then bdver2.
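As a rough illustration of that build matrix, the Python sketch below generates one compiler environment per "-march=" target. Only the four target names come from this article. The "opencc" and "openCC" driver names, the "-O3" optimization level, passing the flags through CC/CXX/CFLAGS/CXXFLAGS, and the "bench.c" source file are all assumptions made for the example and not the exact configuration used for these results.

```python
# The four -march= targets named in the article, oldest first.
MARCH_TARGETS = ["k8", "barcelona", "bdver1", "bdver2"]


def build_env(march, opt_level="-O3"):
    """Return environment variables for one build pass.

    Assumptions: Open64's C/C++ drivers are invoked as opencc/openCC, and
    the build scripts honor CC, CXX, CFLAGS, and CXXFLAGS.
    """
    flags = f"{opt_level} -march={march}"
    return {"CC": "opencc", "CXX": "openCC", "CFLAGS": flags, "CXXFLAGS": flags}


def compile_command(march, source="bench.c"):
    """Build an argument list compiling a hypothetical source file for one target."""
    env = build_env(march)
    output = f"bench-{march}"  # one binary per target so results stay separate
    return [env["CC"], *env["CFLAGS"].split(), "-o", output, source]


if __name__ == "__main__":
    for march in MARCH_TARGETS:
        print(" ".join(compile_command(march)))
```

Keeping everything except the "-march=" value constant across passes is what makes the comparison meaningful: any difference between the runs can then be attributed to the target selection alone.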