1. Computers
  2. Display Drivers
  3. Graphics Cards
  4. Memory
  5. Motherboards
  6. Processors
  7. Software
  8. Storage
  9. Operating Systems


Facebook RSS Twitter Twitter Google Plus


Phoronix Test Suite

OpenBenchmarking.org

AMD Drops Steamroller "bdver3" Compiler Support

AMD

Published on 11 October 2012 10:56 AM EDT
Written by Michael Larabel in AMD
8 Comments

Second-generation Bulldozer processors only started appearing recently in the form of the Trinity APUs with Piledriver cores. The next Bulldozer-2 wave will come when AMD releases their Piledriver-bearing "Vishera" FX-Series desktop processors. While this hardware has yet to publicly arrive, AMD is are already working on compiler support for their third-generation Bulldozer -- a.k.a. "Steamroller" -- micro-architecture.

As first spotted via my Anzwix system, Ganesh Gopalasubramanian of AMD published the first patch work concerning "bdver3" enablement to gcc-patches on Thursday morning. "The attached patch (Patch.txt) enables the next version of AMD's bulldozer core. A new file (bdver3.md) is also attached which describes the pipelines."

As with previous generations, this compiler tuning/optimization support for next year's AMD hardware is being called "bdver3" in a similar naming convention to the original Bulldozer CPUs ("bdver1") and the new Piledriver / Bulldozer 2 support ("bdver3"). The bdver3 GCC patch in its early form copies most of the tuning work from bdver2 but the pipelines have already been modelled after the new Steamroller core.

The bdver2 support within this leading open-source code compiler super-set the following instruction set extensions: BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, and ABM. The current bdver3 family support is also setting these same instruction set extensions without anything new for the moment, aside from the pipeline remodelling.

Here's some code comments from within the patch that explain a little about how the Bulldozer v3 scheduling is done:
The bdver3 contains three pipelined FP units and two integer units. Fetching and decoding logic is different from previous fam15 processors. Fetching is done every two cycles rather than every cycle and two decode units are available. The decode units therefore decode four instructions in two cycles.

Three DirectPath instructions decoders and only one VectorPath decoder is available. They can decode three DirectPath instructions or one VectorPath instruction per cycle.

The load/store queue unit is not attached to the schedulers but communicates with all the execution units separately instead.

bdver3 belong to fam15 processors. We use the same insn attribute that was used for bdver3 decoding scheme.
The only other potentially interesting comments out of this early Steamroller compiler patch is "New AMD processors never drop prefetches; if they cannot be performed immediately, they are queued. We set number of simultaneous prefetches to a large constant to reflect this (it probably is not a good idea not to limit number of prefetches at all, as their execution also takes some time)." Additionally, "BDVER3 has optimized REP instruction for medium sized blocks, but for very small blocks it is better to use loop."

The way that the new compiler code is determining a "bdver3" processor rather than a previous-generation Bulldozer is based upon the AMD APU/CPU having xsaveopt. The xsaveopt instruction is part of AVX (Advanced Vector Extensions) and is an optimized extended state save instruction similar to xsave. With bdver3 is apparently the first time this xsaveopt instruction is being supported by AMD processors.

Gopalasubramanian is trying to push the initial bdver3 work already upstream, which means the first AMD Steamroller support would make it into the GCC 4.8 release. GCC 4.8 should be released in H1'2013 and hopefully by then we will see more mature bdver3 compiler optimization support. Patches for LLVM/Clang for bdver3 will also likely soon emerge.

Last week was when the binutils support for bdver3 appeared on that project's list. "This patch adds basic support for AMD's bdver3 core. I have tested on x86_64-unknown-linux-gnu and noted no regressions. Please review it and let me know if it is OK to commit to trunk."

The next-generation Bulldozer is expected for release in 2013. Steamroller-based products are said to be manufactured on a 28nm process and will focus upon greater parallelism compared to current AMD products. Hopefully it will not be too far out until we begin seeing performance benchmarks of AMD Steamroller engineering samples on OpenBenchmarking.org via the Phoronix Test Suite automated benchmarking software. With our open-source benchmarking platform, we happened to spot results almost one year early with Trinity, several months early with Interlagos, and other next-generation Intel/AMD/VIA CPUs from those early test engineers that are overly-excited to publicly share their data or are just too careless for uploading the data to a public window.

Intel's 2013 micro-architecture, Haswell, has already been seeing compiler tuning and optimization work going back many months with the GCC compiler stack. The Haswell support is now in great shape for the compiler as well as the open-source Linux drivers.

For those wondering about the performance impact of generating binaries specifically for "bdver2" rather than generic x86_64, I do have benchmarks that I recently did from the A10-5800K Trinity that look at the compiler tuning and optimization impact. There will also be AMD Vishera Linux benchmarks and other analysis whenever those processors surface. The A10-5800K bdver2 compiler tuning results will be published on Phoronix in the coming days. Also forthcoming is a bdver2 compiler comparison between various releases of GCC, LLVM/Clang, and AMD's Open64 compiler.

About The Author
Michael Larabel is the principal author of Phoronix.com and founded the web-site in 2004 with a focus on enriching the Linux hardware experience and being the largest web-site devoted to Linux hardware reviews, particularly for products relevant to Linux gamers and enthusiasts but also commonly reviewing servers/workstations and embedded Linux devices. Michael has written more than 10,000 articles covering the state of Linux hardware support, Linux performance, graphics hardware drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated testing software. He can be followed via and or contacted via .
Latest Linux Hardware Reviews
  1. Trying The Configurable 45 Watt TDP With AMD's A10-7800 / A6-7400K
  2. Sumo's Omni Gets Reloaded
  3. AMD A10-7800 & A6-7400K APUs Run Great On Linux
  4. Radeon Gallium3D Is Running Increasingly Well Against AMD's Catalyst Driver
Latest Linux Articles
  1. AMD's RadeonSI Driver Sped Up A Lot This Summer
  2. Intel's Latest Linux Graphics Code Competes Against OS X 10.9
  3. Intel Sandy Bridge Gets A Surprise Boost From Linux 3.17
  4. Open-Source Radeon Graphics Have Some Improvements On Linux 3.17
Latest Linux News
  1. Intel Bay Trail Performance With Linux 3.16/3.17 & Mesa 10.3
  2. EFL Sees A Ton Of Work Following Recent v1.11 Release
  3. ARM Talks Up Wayland For Mali
  4. GNOME/GTK+ Human Interface Guidelines Updated
  5. Robocraft Is Rolling Over To Linux
  6. The Widely-Criticized New Commercial Linux Distro Is Now On Kickstarter
  7. Wayland & Weston 1.6 Alpha Released
  8. A New First-Person Mystery Game Might Be Coming To Linux
  9. Patch By Patch, LLVM Clang Gets Better At Building The Linux Kernel
  10. VC4 Gallium3D Driver Now Handles X With GLAMOR
Latest Forum Discussions
  1. OSS radeon driver for A10-7850K (Kaveri)
  2. Btrfs Gets Talked Up, Googler Encourages You To Try Btrfs
  3. Systemd 216 Piles On More Features, Aims For New User-Space VT
  4. American Citizens running AMOK for food stamps
  5. What Linux Distribution Should Be Benchmarked The Most?
  6. Company I work for is looking to contribute to Open Source projects... but wrongly?
  7. Microsoft vs. Campaign
  8. Updated and Optimized Ubuntu Free Graphics Drivers