Improving The Linux Kernel's Memory Performance

Published on 16 August 2011 11:32 AM EDT
Written by Michael Larabel in Linux Kernel

Over the past few days there has been an active discussion on the Linux kernel mailing list about the performance of memory copying (the memcpy function, which copies blocks of memory) within the kernel. In particular, an application vendor claims to have boosted the performance of their application (a video recorder) by about 12% by implementing an "optimized" memory copy function that takes advantage of SSE3.

The vendor hasn't yet published the patches for this "optimized" memcpy, which is meant to replace what the developer calls the "suboptimal" implementation currently in the Linux kernel, but the patches are being cleaned up and should be released after that. Besides a 12.2% boost in the application's frame rate on Atom Z5xx hardware, C0 residency dropped from 75% to 67%, which means lower power consumption too.

Andi Kleen, a long-time contributor to the Linux kernel and an Intel employee, immediately responded to say, "SSE3 in the kernel memcpy would be incredible expensive, it would need a full FPU saving for every call and preemption disabled."

Well-known kernel contributor Ingo Molnar responded as well, saying that the memcpy() work sounds interesting and that it would be nice to see performance profiles. Ingo, though, was also concerned about the cost of using the Streaming SIMD Extensions (SSE) for the memory copy function. "The thing is, we obviously want to achieve those gains of 12.2% fps and while we probably do not want to switch the kernel's memcpy to SSE right now (the save/restore costs are significant), we could certainly try to optimize the specific codepath that your video playback path is hitting. If it's some bulk memcpy in a key video driver then we could offer a bulk-optimized x86 memcpy variant which could be called from that driver - and that could use SSE3 as well."
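The trade-off behind the save/restore concern can be illustrated with a small cost model. The following is only a sketch: the struct, the function names, and any cycle figures fed into it are hypothetical and do not come from the mailing list thread. It assumes each copy path has a fixed per-call overhead (such as saving and restoring FPU state) plus a per-byte cost, and computes the buffer size at which the path with the higher fixed cost starts to pay off.

```c
#include <assert.h>

/* Hypothetical cost model; no figure used with it comes from the LKML thread. */
struct copy_cost {
    double fixed_cycles;    /* per-call overhead, e.g. FPU state save/restore */
    double cycles_per_byte; /* steady-state cost of moving one byte */
};

/* Estimated total cycles to copy `bytes` bytes with the given path. */
static double copy_cycles(struct copy_cost c, double bytes)
{
    assert(bytes >= 0.0);
    return c.fixed_cycles + c.cycles_per_byte * bytes;
}

/*
 * Size at which the SIMD path costs the same as the plain path.
 * Above it, the SIMD path is cheaper. Returns -1 if the SIMD path
 * has no per-byte advantage and therefore never catches up.
 */
static double break_even_bytes(struct copy_cost plain, struct copy_cost simd)
{
    double per_byte_gain = plain.cycles_per_byte - simd.cycles_per_byte;
    if (per_byte_gain <= 0.0)
        return -1.0;
    return (simd.fixed_cycles - plain.fixed_cycles) / per_byte_gain;
}
```

For example, with made-up inputs of 0 fixed cycles and 0.50 cycles per byte for the plain path, and 400 fixed cycles and 0.25 cycles per byte for the SIMD path, the break-even size is (400 - 0) / (0.50 - 0.25) = 1600 bytes. Real values would have to be measured on the target hardware.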

At that point another developer, Borislav Petkov of AMD, mentioned that he too had been exploring the possibility of using the SSE extensions for the memory block copy call. Borislav had looked at the memcpy buffer sizes seen while building the Linux kernel and found a large number of small chunks, as expected, but also a number of larger memory copies. This led him to believe that if the buffer being copied is large enough, the context save/restore cost of using SSE is worthwhile. He has written a patch that uses his SSE memory copy for larger buffers, and it has led to a 6% improvement in the time required to build the Linux kernel.
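The size-based dispatch described above can be sketched in user-space C. This is not Petkov's patch, which the article does not reproduce: the function names and the 512-byte cutoff are hypothetical, and the sketch uses SSE2 unaligned load/store intrinsics rather than SSE3 so that it builds on any x86-64 compiler (it falls back to plain memcpy elsewhere). In kernel code the SIMD section would additionally have to be bracketed by an FPU state save/restore, for which the kernel provides the kernel_fpu_begin()/kernel_fpu_end() pair; that cost is what the threshold is meant to amortize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical cutoff; the article does not state the size used by the patch. */
#define BULK_COPY_THRESHOLD 512

/* Copy n bytes, 16 at a time where SSE2 is available, then finish the tail. */
static void *bulk_copy(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 16) {
        /* Unaligned 128-bit load and store, so no alignment is required. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        d += 16;
        s += 16;
        n -= 16;
    }
    memcpy(d, s, n); /* remaining 0..15 bytes */
    return dst;
#else
    return memcpy(dst, src, n); /* no SSE2: plain copy */
#endif
}

/* Small buffers take the ordinary path; only large ones use the SIMD path. */
static void *threshold_copy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n < BULK_COPY_THRESHOLD)
        return memcpy(dst, src, n);
    return bulk_copy(dst, src, n);
}
```

A caller would use threshold_copy() exactly like memcpy(); the buffers must not overlap. The simple 16-byte loop is only meant to show the dispatch structure and is unlikely to outperform a tuned library memcpy.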

The SSE3 instruction set has been supported by Intel since 2004 with the Pentium 4 "Prescott" CPUs, while AMD adopted Streaming SIMD Extensions 3 with their Athlon 64 "Venice" and "San Diego" CPUs beginning in 2005. If these patches perform as well as their authors report, there may be some nice memory performance improvements coming to the Linux kernel.

With the Linux 3.1 merge window having passed, the earliest that such work could possibly be incorporated would be the Linux 3.2 kernel.

The kernel mailing list discussion regarding memcpy performance can be found on LKML.org.
