AMD Drops Steamroller "bdver3" Compiler Support


Published on 11 October 2012 10:56 AM EDT
Written by Michael Larabel in AMD

Second-generation Bulldozer processors only started appearing recently in the form of the Trinity APUs with Piledriver cores. The next Bulldozer-2 wave will come when AMD releases their Piledriver-bearing "Vishera" FX-Series desktop processors. While this hardware has yet to publicly arrive, AMD is already working on compiler support for their third-generation Bulldozer -- a.k.a. "Steamroller" -- micro-architecture.

As first spotted via my Anzwix system, Ganesh Gopalasubramanian of AMD posted the first patch for "bdver3" enablement to gcc-patches on Thursday morning. "The attached patch (Patch.txt) enables the next version of AMD's bulldozer core. A new file (bdver3.md) is also attached which describes the pipelines."

As with previous generations, this compiler tuning/optimization support for next year's AMD hardware is being called "bdver3", following the naming convention of the original Bulldozer CPUs ("bdver1") and the new Piledriver / Bulldozer 2 support ("bdver2"). The bdver3 GCC patch in its early form copies most of the tuning work from bdver2, but the pipelines have already been modelled after the new Steamroller core.

The bdver2 support within this leading open-source code compiler sets the following instruction set extensions: BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, and ABM. The current bdver3 support sets these same instruction set extensions, with nothing new for the moment aside from the pipeline remodelling.
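
To see whether a given Linux machine exposes these extensions, one could compare the list against the "flags" line of /proc/cpuinfo. The following is only a sketch and is not part of the GCC patch: the function name missing_extensions is hypothetical, and the mapping from the names above to the kernel's flag spellings (for example "pni" for SSE3, "pclmulqdq" for PCL_MUL, "bmi1" for BMI) is an assumption of this example.

```python
# Illustrative only: compare the bdver2/bdver3 extension list against a
# /proc/cpuinfo-style "flags" line. The flag spellings on the right follow
# Linux kernel naming and are an assumption of this sketch.
BDVER_EXTENSIONS = {
    "BMI": "bmi1",
    "TBM": "tbm",
    "F16C": "f16c",
    "FMA": "fma",
    "AVX": "avx",
    "XOP": "xop",
    "LWP": "lwp",
    "AES": "aes",
    "PCL_MUL": "pclmulqdq",
    "CX16": "cx16",
    "MMX": "mmx",
    "SSE": "sse",
    "SSE2": "sse2",
    "SSE3": "pni",
    "SSE4A": "sse4a",
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "ABM": "abm",
}


def missing_extensions(flags_line):
    """Return the extension names whose cpuinfo flag is absent, sorted."""
    present = set(flags_line.split())
    return sorted(
        name for name, flag in BDVER_EXTENSIONS.items() if flag not in present
    )
```

The flags line would be passed in as a plain string, so the check can also be run against text copied from another machine.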

Here are some code comments from within the patch that explain a little about how the Bulldozer v3 scheduling is done:
The bdver3 contains three pipelined FP units and two integer units. Fetching and decoding logic is different from previous fam15 processors. Fetching is done every two cycles rather than every cycle and two decode units are available. The decode units therefore decode four instructions in two cycles.

Three DirectPath instructions decoders and only one VectorPath decoder is available. They can decode three DirectPath instructions or one VectorPath instruction per cycle.

The load/store queue unit is not attached to the schedulers but communicates with all the execution units separately instead.

bdver3 belong to fam15 processors. We use the same insn attribute that was used for bdver1 decoding scheme.
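
The DirectPath/VectorPath rule in those comments can be pictured with a toy model. This is a loose sketch, not the automaton from bdver3.md: it assumes strictly in-order decoding, ignores the two-cycle fetch cadence, and the function name decode_cycles and the "D"/"V" encoding are made up for illustration.

```python
# Toy model of the quoted decode rule: each cycle decodes either up to
# three consecutive DirectPath instructions ("D") or exactly one
# VectorPath instruction ("V"). A simplification for illustration only.
def decode_cycles(stream):
    """Count decode cycles for a string of 'D' and 'V' instruction kinds."""
    cycles = 0
    i = 0
    while i < len(stream):
        if stream[i] == "V":
            # A VectorPath instruction occupies the whole cycle.
            i += 1
        elif stream[i] == "D":
            # Take up to three DirectPath instructions in this cycle.
            group = 0
            while i < len(stream) and stream[i] == "D" and group < 3:
                i += 1
                group += 1
        else:
            raise ValueError("unknown instruction kind: %r" % stream[i])
        cycles += 1
    return cycles
```

Under this model a single VectorPath instruction in the middle of a DirectPath run splits the run into separate decode cycles, which is why instruction selection and ordering matter to the scheduler description.
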

The only other potentially interesting comments in this early Steamroller compiler patch are "New AMD processors never drop prefetches; if they cannot be performed immediately, they are queued. We set number of simultaneous prefetches to a large constant to reflect this (it probably is not a good idea not to limit number of prefetches at all, as their execution also takes some time)." Additionally, "BDVER3 has optimized REP instruction for medium sized blocks, but for very small blocks it is better to use loop."
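
The REP-versus-loop remark amounts to choosing a block copy strategy by size. A minimal sketch of that idea follows; the byte thresholds, the strategy names, and the function pick_block_copy_strategy are all hypothetical, and the "libcall" fallback for large blocks is an assumption of this example rather than something stated in the quoted comment. The patch's actual cost tables are not reproduced here.

```python
# Hypothetical thresholds, chosen only to illustrate the idea. They are
# NOT taken from the bdver3 patch.
SMALL_BLOCK_MAX = 8        # bytes (assumed)
MEDIUM_BLOCK_MAX = 8192    # bytes (assumed)


def pick_block_copy_strategy(nbytes):
    """Pick a copy strategy name for a block of nbytes bytes."""
    if nbytes < 0:
        raise ValueError("block size must be non-negative")
    if nbytes <= SMALL_BLOCK_MAX:
        return "loop"      # very small blocks: a plain loop
    if nbytes <= MEDIUM_BLOCK_MAX:
        return "rep"       # medium blocks: the optimized REP instruction
    return "libcall"       # large blocks: assumed fallback to the C library
```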

The new compiler code identifies a "bdver3" processor, as opposed to a previous-generation Bulldozer, by checking whether the AMD APU/CPU supports xsaveopt. The xsaveopt instruction arrived alongside AVX (Advanced Vector Extensions) and is an optimized extended state save instruction similar to xsave. bdver3 is apparently the first time this xsaveopt instruction is being supported by AMD processors.
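
That detection rule can be sketched as a small classifier over CPU feature flags. This is not the code from the patch: the function name classify_bulldozer is hypothetical, the flag spellings follow /proc/cpuinfo, the caller is assumed to have already established that the CPU is an AMD Family 15h part, and using BMI to separate bdver2 from bdver1 is an assumption of this example that the article does not confirm.

```python
def classify_bulldozer(flags_line):
    """Guess the GCC tuning name for a Family 15h CPU from its flags.

    flags_line is a whitespace-separated string of feature flags, as in
    the "flags" line of /proc/cpuinfo.
    """
    flags = set(flags_line.split())
    if "xsaveopt" in flags:
        # Per the article: xsaveopt marks a bdver3 (Steamroller) part.
        return "bdver3"
    if "bmi1" in flags:
        # Assumption: BMI separates Piledriver from original Bulldozer.
        return "bdver2"
    return "bdver1"
```

When GCC is built with this support, the same decision would be what -march=native relies on to select the tuning target on the build machine.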

Gopalasubramanian is already trying to get the initial bdver3 work merged upstream, which means the first AMD Steamroller support would make it into the GCC 4.8 release. GCC 4.8 should be released in the first half of 2013, and hopefully by then we will see more mature bdver3 compiler optimization support. Patches for LLVM/Clang for bdver3 will likely emerge soon as well.

The binutils support for bdver3 appeared on that project's mailing list last week. "This patch adds basic support for AMD's bdver3 core. I have tested on x86_64-unknown-linux-gnu and noted no regressions. Please review it and let me know if it is OK to commit to trunk."

The next-generation Bulldozer is expected for release in 2013. Steamroller-based products are said to be manufactured on a 28nm process and will focus upon greater parallelism compared to current AMD products. Hopefully it will not be long before we begin seeing performance benchmarks of AMD Steamroller engineering samples on OpenBenchmarking.org via the Phoronix Test Suite automated benchmarking software. With our open-source benchmarking platform, we have spotted results almost one year early for Trinity, several months early for Interlagos, and early for other next-generation Intel/AMD/VIA CPUs, thanks to test engineers who are overly excited to share their data publicly or simply careless about uploading it where the public can see it.

Intel's 2013 micro-architecture, Haswell, has already been seeing compiler tuning and optimization work going back many months with the GCC compiler stack. The Haswell support is now in great shape for the compiler as well as the open-source Linux drivers.

For those wondering about the performance impact of generating binaries specifically for "bdver2" rather than generic x86_64, I recently ran benchmarks on the A10-5800K Trinity that look at the compiler tuning and optimization impact. There will also be AMD Vishera Linux benchmarks and other analysis whenever those processors surface. The A10-5800K bdver2 compiler tuning results will be published on Phoronix in the coming days. Also forthcoming is a bdver2 compiler comparison between various releases of GCC, LLVM/Clang, and AMD's Open64 compiler.

About The Author
Michael Larabel is the principal author of Phoronix.com and founded the web-site in 2004 with a focus on enriching the Linux hardware experience and being the largest web-site devoted to Linux hardware reviews, particularly for products relevant to Linux gamers and enthusiasts but also commonly reviewing servers/workstations and embedded Linux devices. Michael has written more than 10,000 articles covering the state of Linux hardware support, Linux performance, graphics hardware drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated testing software.