
AMD Drops Steamroller "bdver3" Compiler Support

Published on 11 October 2012 10:56 AM EDT
Written by Michael Larabel in AMD

Second-generation Bulldozer processors have only recently started appearing, in the form of the Trinity APUs with Piledriver cores. The next Bulldozer-2 wave will come when AMD releases its Piledriver-based "Vishera" FX-Series desktop processors. While that hardware has yet to publicly arrive, AMD is already working on compiler support for its third-generation Bulldozer micro-architecture, a.k.a. "Steamroller".

As first spotted via my Anzwix system, Ganesh Gopalasubramanian of AMD posted the first patch for "bdver3" enablement to the gcc-patches mailing list on Thursday morning: "The attached patch (Patch.txt) enables the next version of AMD's bulldozer core. A new file (bdver3.md) is also attached which describes the pipelines."

As with previous generations, this compiler tuning/optimization support for next year's AMD hardware is called "bdver3", following the naming convention of the original Bulldozer CPUs ("bdver1") and the new Piledriver / Bulldozer 2 support ("bdver2"). In its early form, the bdver3 GCC patch copies most of the tuning work from bdver2, but the pipelines have already been modelled after the new Steamroller core.

The bdver2 support in GCC, this leading open-source compiler, enables the following instruction set extensions: BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, and ABM. The bdver3 support currently enables exactly the same extensions; for the moment, the only change is the remodelled pipeline description.
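
To make the "same extensions, new pipeline model" point concrete, here is a minimal C sketch that treats each listed extension as one bit in a mask. The `ISA_*` flags, the `BDVER2_ISA`/`BDVER3_ISA` macros and the helper functions are hypothetical names for illustration, not GCC's internal representation; the only facts encoded are the bdver2 list above and the early patch enabling the same set for bdver3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit flags, one per extension in the bdver2 list.
 * Illustrative names only; these are not GCC's internal constants. */
enum isa_flag {
    ISA_MMX    = 1 << 0,  ISA_SSE    = 1 << 1,  ISA_SSE2   = 1 << 2,
    ISA_SSE3   = 1 << 3,  ISA_SSSE3  = 1 << 4,  ISA_SSE4A  = 1 << 5,
    ISA_SSE4_1 = 1 << 6,  ISA_SSE4_2 = 1 << 7,  ISA_ABM    = 1 << 8,
    ISA_CX16   = 1 << 9,  ISA_AES    = 1 << 10, ISA_PCLMUL = 1 << 11,
    ISA_AVX    = 1 << 12, ISA_XOP    = 1 << 13, ISA_LWP    = 1 << 14,
    ISA_FMA    = 1 << 15, ISA_F16C   = 1 << 16, ISA_TBM    = 1 << 17,
    ISA_BMI    = 1 << 18
};

/* The bdver2 set exactly as listed in the article. */
#define BDVER2_ISA ((uint32_t)(ISA_MMX | ISA_SSE | ISA_SSE2 | ISA_SSE3 |   \
    ISA_SSSE3 | ISA_SSE4A | ISA_SSE4_1 | ISA_SSE4_2 | ISA_ABM | ISA_CX16 | \
    ISA_AES | ISA_PCLMUL | ISA_AVX | ISA_XOP | ISA_LWP | ISA_FMA |         \
    ISA_F16C | ISA_TBM | ISA_BMI))

/* The early bdver3 patch enables the same set; only the pipeline model differs. */
#define BDVER3_ISA BDVER2_ISA

/* True if `set` contains the single extension `flag`. */
static int isa_has(uint32_t set, uint32_t flag) {
    assert(flag != 0 && (flag & (flag - 1)) == 0); /* exactly one bit set */
    return (set & flag) != 0;
}

/* True if every extension a binary requires is available on the target. */
static int isa_subset(uint32_t required, uint32_t available) {
    return (required & ~available) == 0;
}
```

Because `BDVER3_ISA` equals `BDVER2_ISA` in this sketch, `isa_subset` holds in both directions between the two; that would change as soon as the bdver3 support starts enabling extensions of its own.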

Here are some code comments from the patch that explain a little about how Bulldozer v3 scheduling is modelled:
The bdver3 contains three pipelined FP units and two integer units. Fetching and decoding logic is different from previous fam15 processors. Fetching is done every two cycles rather than every cycle and two decode units are available. The decode units therefore decode four instructions in two cycles.

Three DirectPath instructions decoders and only one VectorPath decoder is available. They can decode three DirectPath instructions or one VectorPath instruction per cycle.

The load/store queue unit is not attached to the schedulers but communicates with all the execution units separately instead.

bdver3 belong to fam15 processors. We use the same insn attribute that was used for bdver1 decoding scheme.
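
As a rough illustration of the DirectPath/VectorPath rule in that comment, the C sketch below counts decode cycles for an in-order instruction stream, assuming each cycle handles either up to three DirectPath instructions or a single VectorPath instruction. It deliberately ignores the two-cycle fetch, queues, and everything else GCC's real pipeline description models; `enum decode_path` and `decode_cycles` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of how an instruction is decoded. */
enum decode_path { DIRECT_PATH, VECTOR_PATH };

/* Simplified model, not GCC's automaton: in program order, each cycle
 * decodes either up to three DirectPath instructions or exactly one
 * VectorPath instruction. Returns the number of cycles used. */
static unsigned decode_cycles(const enum decode_path *insns, size_t n) {
    assert(n == 0 || insns != NULL);
    unsigned cycles = 0;
    size_t i = 0;
    while (i < n) {
        cycles++;
        if (insns[i] == VECTOR_PATH) {
            i++; /* a VectorPath instruction occupies the whole cycle */
            continue;
        }
        /* Take up to three consecutive DirectPath instructions; a following
         * VectorPath instruction has to wait for the next cycle. */
        for (int slot = 0; slot < 3 && i < n && insns[i] == DIRECT_PATH; slot++)
            i++;
    }
    return cycles;
}
```

Under these assumptions, four DirectPath instructions take two cycles, while a VectorPath instruction sandwiched between two DirectPath instructions costs three, since it cannot share a cycle with either neighbour.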

The only other potentially interesting comments in this early Steamroller compiler patch are "New AMD processors never drop prefetches; if they cannot be performed immediately, they are queued. We set number of simultaneous prefetches to a large constant to reflect this (it probably is not a good idea not to limit number of prefetches at all, as their execution also takes some time)." and "BDVER3 has optimized REP instruction for medium sized blocks, but for very small blocks it is better to use loop."
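
The REP-versus-loop comment describes a size-based choice of block-copy strategy. The C sketch below shows that idea under stated assumptions: the 16-byte cut-off is a made-up value rather than the patch's real tuning, `memcpy` stands in for an inline REP MOVS sequence that portable C cannot emit, and `pick_strategy`/`copy_block` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two hypothetical strategies, mirroring the comment quoted above. */
enum copy_strategy { COPY_LOOP, COPY_REP };

/* Hypothetical cut-off for a "very small" block; not taken from GCC. */
#define SMALL_BLOCK_MAX 16u

/* Very small blocks use a loop; anything larger takes the REP-style path.
 * Larger sizes are not distinguished any further in this sketch. */
static enum copy_strategy pick_strategy(size_t nbytes) {
    return nbytes <= SMALL_BLOCK_MAX ? COPY_LOOP : COPY_REP;
}

static void copy_block(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    if (pick_strategy(n) == COPY_LOOP) {
        /* Very small block: a plain byte loop (presumably cheaper than
         * paying REP's start-up cost for a handful of bytes). */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        /* Medium block: memcpy stands in for an inline REP MOVS sequence. */
        memcpy(dst, src, n);
    }
}
```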

The new compiler code distinguishes a "bdver3" processor from a previous-generation Bulldozer by checking whether the AMD APU/CPU supports xsaveopt. The xsaveopt instruction belongs to the XSAVE family of processor state management instructions (which, among other things, save AVX register state) and is an optimized variant of xsave. bdver3 apparently marks the first time xsaveopt is supported by AMD processors.
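
That detection order can be sketched in C as "check the newest feature first". Only the xsaveopt-means-bdver3 rule comes from the patch; the BMI and XOP checks for bdver2 and bdver1 are assumptions for illustration, and the struct, enum, and function names are hypothetical. On real hardware the flags would come from CPUID (XSAVEOPT is reported in leaf 0Dh, sub-leaf 1, EAX bit 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the family 15h targets discussed in the article. */
enum fam15_arch { ARCH_UNKNOWN, ARCH_BDVER1, ARCH_BDVER2, ARCH_BDVER3 };

/* Hypothetical summary of CPUID results for an AMD family 15h part. */
struct fam15_features {
    bool has_xop;
    bool has_bmi;
    bool has_xsaveopt;
};

/* Check the newest feature first, so a newer core is not misreported as an
 * older one that shares most of its extensions. */
static enum fam15_arch classify_fam15(struct fam15_features f) {
    if (f.has_xsaveopt)
        return ARCH_BDVER3; /* the rule described for the bdver3 patch */
    if (f.has_bmi)
        return ARCH_BDVER2; /* assumption: BMI arrived with Piledriver */
    if (f.has_xop)
        return ARCH_BDVER1; /* assumption: XOP marks the original Bulldozer */
    return ARCH_UNKNOWN;
}

static const char *fam15_arch_name(enum fam15_arch a) {
    static const char *const names[] = {"unknown", "bdver1", "bdver2", "bdver3"};
    assert((size_t)a < sizeof names / sizeof names[0]);
    return names[a];
}
```

Checking in the opposite order would classify every Steamroller part as bdver1, because it also reports XOP.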

Gopalasubramanian is already trying to get the initial bdver3 work upstream, which would mean the first AMD Steamroller support makes it into the GCC 4.8 release. GCC 4.8 should be released in H1 2013, and hopefully by then bdver3 compiler optimization support will be more mature. Patches adding bdver3 support to LLVM/Clang will also likely emerge soon.

Binutils support for bdver3 appeared on that project's mailing list last week: "This patch adds basic support for AMD's bdver3 core. I have tested on x86_64-unknown-linux-gnu and noted no regressions. Please review it and let me know if it is OK to commit to trunk."

The next-generation Bulldozer is expected for release in 2013. Steamroller-based products are said to be manufactured on a 28nm process and will focus on greater parallelism compared to current AMD products. Hopefully it won't be long before we begin seeing performance benchmarks of AMD Steamroller engineering samples on OpenBenchmarking.org via the Phoronix Test Suite automated benchmarking software. Through our open-source benchmarking platform we have spotted results almost a year early for Trinity, several months early for Interlagos, and early for other next-generation Intel/AMD/VIA CPUs, uploaded by test engineers who are either overly excited to share their data publicly or careless enough to upload it to a public site.

Intel's 2013 micro-architecture, Haswell, has already been receiving compiler tuning and optimization work in GCC going back many months. Haswell support is now in great shape in the compiler as well as in the open-source Linux drivers.

For those wondering about the performance impact of generating binaries specifically for "bdver2" rather than generic x86_64, I recently ran benchmarks on the A10-5800K Trinity APU that look at the impact of compiler tuning and optimization. Those A10-5800K bdver2 compiler tuning results will be published on Phoronix in the coming days. There will also be AMD Vishera Linux benchmarks and other analysis whenever those processors surface. Also forthcoming is a bdver2 compiler comparison between various releases of GCC, LLVM/Clang, and AMD's Open64 compiler.
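
For anyone who wants to confirm which target a binary was actually built for, here is a small C sketch that inspects the macros GCC predefines for the selected `-march` value (assumed here to be `__bdver1__` and `__bdver2__`). The function names are hypothetical, and the sketch says nothing about performance; it only reports how the file was compiled.

```c
#include <assert.h>
#include <string.h>

/* Report the Bulldozer-family -march target this file was compiled for,
 * based on the macros GCC is assumed to predefine for -march=bdver1 and
 * -march=bdver2; anything else is reported as "generic". */
static const char *compiled_march(void) {
#if defined(__bdver2__)
    return "bdver2";
#elif defined(__bdver1__)
    return "bdver1";
#else
    return "generic";
#endif
}

/* True if this build targeted `name` (for example "bdver2"). */
static int built_for(const char *name) {
    assert(name != NULL);
    return strcmp(compiled_march(), name) == 0;
}
```

Compiled with `gcc -O2 -march=bdver2`, `built_for("bdver2")` should return true, while a build without `-march` typically reports `"generic"`.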
