Boosting The Performance-Per-Watt Of GPUs

Posted by Michael Larabel on October 11, 2012

Several prominent open-source graphics driver developers published a paper at last week's USENIX HotPower 2012 covering the Power and Performance Analysis of GPU-Accelerated Systems.

Yuki Abe, Hiroshi Sasaki, Martin Peres, Koji Inoue, Kazuaki Murakami, and Shinpei Kato wrote the paper and carried out the work on the power and performance of GPU-based systems. The two names you should recognize in particular are Martin Peres and Shinpei Kato.

Martin Peres is the developer who has been working on power management and re-clocking support for the open-source Nouveau driver. He has also talked about open-source graphics security and other topics.

Shinpei Kato, meanwhile, has worked on PathScale's open-source NVIDIA compute driver, proposed GPU command scheduling, and most recently has been working on Gdev, an open-source NVIDIA CUDA run-time.

Here's their abstract on the USENIX HotPower paper:
Graphics processing units (GPUs) provide significant improvements in performance and performance-per-watt as compared to traditional multicore CPUs. This energy-efficiency of GPUs has facilitated use of GPUs in many application domains. Albeit energy efficient, GPUs still consume non-trivial power independently of CPUs. It is desired to analyze the power and performance characteristic of GPUs and their causal relation with CPUs. In this paper, we provide a power and performance analysis of GPU-accelerated systems for better understandings of these implications. Our analysis discloses that system energy could be reduced by about 28% retaining a decrease in performance within 1%. Specifically, we identify that energy saving is particularly significant when (i) reducing the GPU memory clock for compute-intensive workload and (ii) reducing the GPU core clock for memory-intensive workload. We also demonstrate that voltage and frequency scaling of CPUs is trivial and even should not be applied in GPU-accelerated systems. We believe that these findings are useful to develop dynamic voltage and frequency scaling (DVFS) algorithms for GPU-accelerated systems.
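To put the headline figure in perspective, here is a rough back-of-the-envelope sketch, not something taken from the paper. It assumes energy is simply average power multiplied by run time, and it reads "a decrease in performance within 1%" as a run time that is at most 1% longer. The function name and its parameters are made up for illustration.

```python
def relative_average_power(energy_saving: float, slowdown: float) -> float:
    """Average system power relative to baseline, assuming energy = power * time.

    energy_saving: fraction of energy saved (0.28 for the quoted 28%).
    slowdown: fractional increase in run time (0.01 is our reading of "within 1%").
    """
    relative_energy = 1.0 - energy_saving   # energy used vs. baseline
    relative_time = 1.0 + slowdown          # run time vs. baseline
    return relative_energy / relative_time


ratio = relative_average_power(energy_saving=0.28, slowdown=0.01)
print(f"Average power is about {ratio:.3f}x the baseline")
```

Under those assumptions the average system power drops to roughly 0.71 times the baseline, so nearly all of the energy saving would come from lower power draw rather than from any change in run time.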
And their conclusion:
We have presented a power and performance analysis of GPU-accelerated systems based on the NVIDIA Fermi architecture. Our findings include that the CPU is a weak factor for energy savings of GPU-accelerated systems unless power gating is supported by the GPU. In contrast, voltage and frequency scaling of the GPU is significant to reduce energy consumption. Our experimental results demonstrated that system energy could be reduced by about 28% retaining a decrease in performance within 1%, if the performance level of the GPU can be scaled effectively.

In future work, we plan to develop DVFS algorithms for GPU-accelerated systems, using the characteristic identified in this paper. We basically consider such an approach that controls the GPU core clock for memory-intensive workload while controls the GPU memory clock for compute-intensive workload. To this end, we integrate PTX code analysis into DVFS algorithms so that energy optimization can be provided at runtime. We also consider a further dynamic scheme that scales the performance level of the GPU during the execution of GPU code, whereas we restricted a scaling point at the boundary of GPU code in this paper.
A PDF of their GPU power/performance paper that was published last week can be found at USENIX.org.
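The rule of thumb from the abstract and conclusion (lower the memory clock for compute-intensive work, lower the core clock for memory-intensive work) can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' algorithm: the `Workload` and `ClockLevels` types, the `choose_clocks` function, and the clock values are all hypothetical placeholders, and a real implementation would still need some way to classify the workload (the authors mention PTX code analysis) and a driver interface to actually re-clock the GPU.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    COMPUTE_INTENSIVE = "compute"
    MEMORY_INTENSIVE = "memory"


@dataclass(frozen=True)
class ClockLevels:
    core_mhz: int
    memory_mhz: int


def choose_clocks(workload: Workload, high: ClockLevels, low: ClockLevels) -> ClockLevels:
    """Pick clocks following the paper's stated finding (hypothetical policy).

    Compute-intensive: keep the core clock high, drop the memory clock.
    Memory-intensive: keep the memory clock high, drop the core clock.
    """
    if workload is Workload.COMPUTE_INTENSIVE:
        return ClockLevels(core_mhz=high.core_mhz, memory_mhz=low.memory_mhz)
    return ClockLevels(core_mhz=low.core_mhz, memory_mhz=high.memory_mhz)


# Placeholder clock values for illustration only, not taken from the paper.
HIGH = ClockLevels(core_mhz=700, memory_mhz=1800)
LOW = ClockLevels(core_mhz=400, memory_mhz=300)

for kind in Workload:
    print(kind.name, choose_clocks(kind, HIGH, LOW))
```

The sketch makes one decision per workload, which matches the paper's restriction of scaling points to the boundary of GPU code; the finer-grained scaling during execution that the authors list as future work would need something more dynamic.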
