1. Computers
  2. Display Drivers
  3. Graphics Cards
  4. Memory
  5. Motherboards
  6. Processors
  7. Software
  8. Storage
  9. Operating Systems

Facebook RSS Twitter Twitter Google Plus

Phoronix Test Suite

OpenBenchmarking Benchmarking Platform
Phoromatic Test Orchestration

AMD Drops Steamroller "bdver3" Compiler Support


Published on 11 October 2012 10:56 AM EDT
Written by Michael Larabel in AMD

Second-generation Bulldozer processors only started appearing recently in the form of the Trinity APUs with Piledriver cores. The next Bulldozer-2 wave will come when AMD releases their Piledriver-bearing "Vishera" FX-Series desktop processors. While this hardware has yet to publicly arrive, AMD is are already working on compiler support for their third-generation Bulldozer -- a.k.a. "Steamroller" -- micro-architecture.

As first spotted via my Anzwix system, Ganesh Gopalasubramanian of AMD published the first patch work concerning "bdver3" enablement to gcc-patches on Thursday morning. "The attached patch (Patch.txt) enables the next version of AMD's bulldozer core. A new file (bdver3.md) is also attached which describes the pipelines."

As with previous generations, this compiler tuning/optimization support for next year's AMD hardware is being called "bdver3" in a similar naming convention to the original Bulldozer CPUs ("bdver1") and the new Piledriver / Bulldozer 2 support ("bdver3"). The bdver3 GCC patch in its early form copies most of the tuning work from bdver2 but the pipelines have already been modelled after the new Steamroller core.

The bdver2 support within this leading open-source code compiler super-set the following instruction set extensions: BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, and ABM. The current bdver3 family support is also setting these same instruction set extensions without anything new for the moment, aside from the pipeline remodelling.

Here's some code comments from within the patch that explain a little about how the Bulldozer v3 scheduling is done:
The bdver3 contains three pipelined FP units and two integer units. Fetching and decoding logic is different from previous fam15 processors. Fetching is done every two cycles rather than every cycle and two decode units are available. The decode units therefore decode four instructions in two cycles.

Three DirectPath instructions decoders and only one VectorPath decoder is available. They can decode three DirectPath instructions or one VectorPath instruction per cycle.

The load/store queue unit is not attached to the schedulers but communicates with all the execution units separately instead.

bdver3 belong to fam15 processors. We use the same insn attribute that was used for bdver3 decoding scheme.
The only other potentially interesting comments out of this early Steamroller compiler patch is "New AMD processors never drop prefetches; if they cannot be performed immediately, they are queued. We set number of simultaneous prefetches to a large constant to reflect this (it probably is not a good idea not to limit number of prefetches at all, as their execution also takes some time)." Additionally, "BDVER3 has optimized REP instruction for medium sized blocks, but for very small blocks it is better to use loop."

The way that the new compiler code is determining a "bdver3" processor rather than a previous-generation Bulldozer is based upon the AMD APU/CPU having xsaveopt. The xsaveopt instruction is part of AVX (Advanced Vector Extensions) and is an optimized extended state save instruction similar to xsave. With bdver3 is apparently the first time this xsaveopt instruction is being supported by AMD processors.

Gopalasubramanian is trying to push the initial bdver3 work already upstream, which means the first AMD Steamroller support would make it into the GCC 4.8 release. GCC 4.8 should be released in H1'2013 and hopefully by then we will see more mature bdver3 compiler optimization support. Patches for LLVM/Clang for bdver3 will also likely soon emerge.

Last week was when the binutils support for bdver3 appeared on that project's list. "This patch adds basic support for AMD's bdver3 core. I have tested on x86_64-unknown-linux-gnu and noted no regressions. Please review it and let me know if it is OK to commit to trunk."

The next-generation Bulldozer is expected for release in 2013. Steamroller-based products are said to be manufactured on a 28nm process and will focus upon greater parallelism compared to current AMD products. Hopefully it will not be too far out until we begin seeing performance benchmarks of AMD Steamroller engineering samples on OpenBenchmarking.org via the Phoronix Test Suite automated benchmarking software. With our open-source benchmarking platform, we happened to spot results almost one year early with Trinity, several months early with Interlagos, and other next-generation Intel/AMD/VIA CPUs from those early test engineers that are overly-excited to publicly share their data or are just too careless for uploading the data to a public window.

Intel's 2013 micro-architecture, Haswell, has already been seeing compiler tuning and optimization work going back many months with the GCC compiler stack. The Haswell support is now in great shape for the compiler as well as the open-source Linux drivers.

For those wondering about the performance impact of generating binaries specifically for "bdver2" rather than generic x86_64, I do have benchmarks that I recently did from the A10-5800K Trinity that look at the compiler tuning and optimization impact. There will also be AMD Vishera Linux benchmarks and other analysis whenever those processors surface. The A10-5800K bdver2 compiler tuning results will be published on Phoronix in the coming days. Also forthcoming is a bdver2 compiler comparison between various releases of GCC, LLVM/Clang, and AMD's Open64 compiler.

About The Author
Michael Larabel is the principal author of Phoronix.com and founded the web-site in 2004 with a focus on enriching the Linux hardware experience and being the largest web-site devoted to Linux hardware reviews, particularly for products relevant to Linux gamers and enthusiasts but also commonly reviewing servers/workstations and embedded Linux devices. Michael has written more than 10,000 articles covering the state of Linux hardware support, Linux performance, graphics hardware drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated testing software. He can be followed via and or contacted via .
Latest Linux News
  1. Linux 4.1-rc5 Kernel Released
  2. Mesa 10.5.6 Brings Fixes All Over The Place
  3. NVIDIA's Proprietary Driver Is Moving Closer With Kernel Mode-Setting
  4. The Latest Linux Kernel Git Code Fixes The EXT4 RAID0 Corruption Problem
  5. Features Added To Mesa 10.6 For Open-Source GPU Drivers
  6. Ubuntu's LXD vs. KVM For The Linux Cloud
  7. Fedora Server 22 Benchmarks With XFS & The Linux 4.0 Kernel
  8. GCC 6 Gets Support For The IBM z13 Mainframe Server
  9. Fedora 22 Is Being Released Next Tuesday
  10. OpenWRT 15.05 Preparing Improved Security & Better Networking
Latest Articles & Reviews
  1. The Latest Features For Linux Performance Management + Benchmark Monitoring
  2. Noctua NH-U12DX i4 + NF-F12
  3. Btrfs RAID 0/1 Benchmarks On The Linux 4.1 Kernel
  4. The State Of Various Firefox Features
Most Viewed News This Week
  1. The Linux 4.0 Kernel Currently Has An EXT4 Corruption Issue
  2. The Linux 4.0 EXT4 RAID Corruption Bug Has Been Uncovered
  3. Microsoft Open-Sources The Windows Communication Foundation
  4. Systemd 220 Has Finally Been Released
  5. Another HTTPS Vulnerability Rattles The Internet
  6. LibreOffice 5.0 Open-Source Office Suite Has Been Branched
  7. LibreOffice 5.0 Beta 1 Released
  8. Will Ubuntu Linux Hit 200 Million Users This Year?