AMD Drops Steamroller "bdver3" Compiler Support


Published on 11 October 2012 10:56 AM EDT
Written by Michael Larabel in AMD

Second-generation Bulldozer processors only started appearing recently in the form of the Trinity APUs with Piledriver cores. The next Bulldozer-2 wave will come when AMD releases its Piledriver-based "Vishera" FX-Series desktop processors. While this hardware has yet to publicly arrive, AMD is already working on compiler support for its third-generation Bulldozer -- a.k.a. "Steamroller" -- micro-architecture.

As first spotted via my Anzwix system, Ganesh Gopalasubramanian of AMD published the first patch work concerning "bdver3" enablement to gcc-patches on Thursday morning. "The attached patch (Patch.txt) enables the next version of AMD's bulldozer core. A new file (bdver3.md) is also attached which describes the pipelines."

As with previous generations, this compiler tuning/optimization support for next year's AMD hardware is being called "bdver3", following the naming convention of the original Bulldozer CPUs ("bdver1") and the new Piledriver / Bulldozer 2 support ("bdver2"). The bdver3 GCC patch in its early form copies most of the tuning work from bdver2, but the pipelines have already been modelled after the new Steamroller core.

The bdver2 support within this leading open-source compiler enables the following instruction set extensions: BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, and ABM. The current bdver3 support enables the same instruction set extensions, with nothing new for the moment aside from the pipeline remodelling.
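To make the extension list above a little more concrete, here is a minimal Python sketch (my own illustration, not code from the patch) that turns the listed names into the individual GCC `-m<name>` switches. The helper names `to_gcc_flag` and `explicit_flags` are hypothetical, and treating the list as roughly what `-march=bdver2` implies is an assumption on my part; the -march option also selects tuning, which these switches do not.

```python
# Extension names exactly as listed in the article for bdver2.
BDVER2_EXTENSIONS = [
    "BMI", "TBM", "F16C", "FMA", "AVX", "XOP", "LWP", "AES", "PCL_MUL",
    "CX16", "MMX", "SSE", "SSE2", "SSE3", "SSE4A", "SSSE3", "SSE4.1",
    "SSE4.2", "ABM",
]

# Per the article, the early bdver3 patch enables the same set for now.
BDVER3_EXTENSIONS = list(BDVER2_EXTENSIONS)


def to_gcc_flag(extension):
    """Map an extension name to GCC's -m<name> spelling (PCL_MUL -> -mpclmul)."""
    return "-m" + extension.lower().replace("_", "")


def explicit_flags(extensions):
    """Return the per-extension GCC switches for a list of extension names."""
    return [to_gcc_flag(name) for name in extensions]


if __name__ == "__main__":
    print(" ".join(explicit_flags(BDVER3_EXTENSIONS)))
    # Extensions present for bdver3 but not bdver2 (none in the early patch).
    print(sorted(set(BDVER3_EXTENSIONS) - set(BDVER2_EXTENSIONS)))
```

Comparing the two lists as sets is a quick way to see when a later revision of the patch starts enabling something that bdver2 lacks.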

Here are some code comments from within the patch that explain a little about how the Bulldozer v3 scheduling is done:
The bdver3 contains three pipelined FP units and two integer units. Fetching and decoding logic is different from previous fam15 processors. Fetching is done every two cycles rather than every cycle and two decode units are available. The decode units therefore decode four instructions in two cycles.

Three DirectPath instructions decoders and only one VectorPath decoder is available. They can decode three DirectPath instructions or one VectorPath instruction per cycle.

The load/store queue unit is not attached to the schedulers but communicates with all the execution units separately instead.

bdver3 belong to fam15 processors. We use the same insn attribute that was used for bdver1 decoding scheme.

The only other potentially interesting comments in this early Steamroller compiler patch are "New AMD processors never drop prefetches; if they cannot be performed immediately, they are queued. We set number of simultaneous prefetches to a large constant to reflect this (it probably is not a good idea not to limit number of prefetches at all, as their execution also takes some time)." Additionally, "BDVER3 has optimized REP instruction for medium sized blocks, but for very small blocks it is better to use loop."

The new compiler code distinguishes a "bdver3" processor from a previous-generation Bulldozer by checking whether the AMD APU/CPU supports xsaveopt. The xsaveopt instruction, which Intel introduced alongside AVX (Advanced Vector Extensions), is an optimized variant of the xsave extended state save instruction. bdver3 is apparently the first time xsaveopt is being supported by AMD processors.
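For a rough idea of what such a check looks like from user space, here is a small Python sketch that applies the same rule to /proc/cpuinfo-style text on Linux. This is only an illustration under stated assumptions: GCC's own detection queries CPUID directly rather than reading /proc/cpuinfo, the function names `parse_cpuinfo` and `guess_bulldozer_target` are hypothetical, and the sketch does not attempt to tell bdver1 from bdver2 since the patch notes quoted here do not describe that.

```python
def parse_cpuinfo(text):
    """Return the fields of the first processor entry in /proc/cpuinfo-style text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def guess_bulldozer_target(text):
    """Guess a GCC target name for an AMD family 15h CPU, or None otherwise."""
    fields = parse_cpuinfo(text)
    if fields.get("vendor_id") != "AuthenticAMD":
        return None
    if fields.get("cpu family") != "21":  # 21 decimal is family 15h
        return None
    flags = set(fields.get("flags", "").split())
    # Rule from the article: xsaveopt marks bdver3 rather than an earlier core.
    return "bdver3" if "xsaveopt" in flags else "bdver1/bdver2"


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/cpuinfo exists.
    with open("/proc/cpuinfo") as handle:
        print(guess_bulldozer_target(handle.read()))
```

On a non-AMD or non-family-15h machine the function returns None, which keeps the sketch from mislabeling Intel CPUs that also report xsaveopt.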

Gopalasubramanian is already trying to push the initial bdver3 work upstream, which means the first AMD Steamroller support would make it into the GCC 4.8 release. GCC 4.8 should be released in H1'2013, and hopefully by then we will see more mature bdver3 compiler optimization support. Patches for LLVM/Clang for bdver3 will likely emerge soon as well.

Last week, the binutils support for bdver3 appeared on that project's list. "This patch adds basic support for AMD's bdver3 core. I have tested on x86_64-unknown-linux-gnu and noted no regressions. Please review it and let me know if it is OK to commit to trunk."

The next-generation Bulldozer is expected for release in 2013. Steamroller-based products are said to be manufactured on a 28nm process and will focus upon greater parallelism compared to current AMD products. Hopefully it will not be too long until we begin seeing performance benchmarks of AMD Steamroller engineering samples on OpenBenchmarking.org via the Phoronix Test Suite automated benchmarking software. With our open-source benchmarking platform, we have spotted results almost one year early for Trinity, several months early for Interlagos, and early for other next-generation Intel/AMD/VIA CPUs, thanks to test engineers who are either overly excited to share their data publicly or simply careless about uploading it where anyone can see it.

Intel's 2013 micro-architecture, Haswell, has already been seeing compiler tuning and optimization work going back many months with the GCC compiler stack. The Haswell support is now in great shape for the compiler as well as the open-source Linux drivers.

For those wondering about the performance impact of generating binaries specifically for "bdver2" rather than generic x86_64, I recently ran benchmarks on the A10-5800K Trinity that look at the impact of compiler tuning and optimization. The A10-5800K bdver2 compiler tuning results will be published on Phoronix in the coming days. There will also be AMD Vishera Linux benchmarks and other analysis whenever those processors surface. Also forthcoming is a bdver2 compiler comparison between various releases of GCC, LLVM/Clang, and AMD's Open64 compiler.

About The Author
Michael Larabel is the principal author of Phoronix.com and founded the web-site in 2004 with a focus on enriching the Linux hardware experience and being the largest web-site devoted to Linux hardware reviews, particularly for products relevant to Linux gamers and enthusiasts but also commonly reviewing servers/workstations and embedded Linux devices. Michael has written more than 10,000 articles covering the state of Linux hardware support, Linux performance, graphics hardware drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated testing software.