Faster Raspberry Pi X.Org Desktop Performance With NEON

Written by Michael Larabel in X.Org on 16 January 2017 at 06:49 AM EST.
Broadcom developer Eric Anholt has begun writing code within the VC4 open-source driver stack to make use of NEON in its acceleration code-paths.

NEON is, of course, ARM's Advanced SIMD extension. By making use of NEON, the VC4 driver is seeing faster X performance.

Eric commented on some of the performance impact:
"It seems to be intended for stack loads/stores, but we can also use it to get 64 bytes of data in from memory untouched into NEON registers, and then I can use 4 (32bpp) or 8 (8 or 16bpp) VST1s to store it to the CPU side. With this, we get a 208.256% +/- 7.07029% (n=10) improvement to GetTexImage performance at 1024x1024. Doing the same NEON code for stores gave a 41.2371% +/- 3.52799% (n=10) improvement, probably mostly due to not calling into memcpy and having it go through its size/alignment-based memcpy path choosing process.

I'm not yet hitting full memory bandwidth, but this should be a noticeable improvement to X, and it'll probably help my piglit test suite runtime as well."
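To make the idea in Eric's comment more concrete, here is a minimal sketch of the 32bpp case: 64 contiguous bytes are pulled into NEON registers and then written out as four 16-byte rows into a destination with its own stride. This is not the VC4 driver's code. The function name, the constants, and the use of NEON intrinsics (`vld1q_u8`/`vst1q_u8`) in place of hand-written assembly are all assumptions made for illustration, and the sketch uses four separate loads rather than a single 64-byte load. On non-NEON builds it falls back to plain `memcpy` so it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical layout for illustration: a 64-byte block holding
 * 4 rows of 16 bytes (4 pixels at 32bpp per row). */
enum { BLOCK_ROWS = 4, BLOCK_ROW_BYTES = 16 };

/* Copy one contiguous 64-byte block from src into 4 rows of dst,
 * where consecutive rows of dst are dst_stride bytes apart. */
void copy_block_to_rows_32bpp(uint8_t *dst, size_t dst_stride,
                              const uint8_t *src)
{
    assert(dst_stride >= BLOCK_ROW_BYTES);

#if defined(__ARM_NEON)
    /* Load all 64 bytes into four 128-bit NEON registers first... */
    uint8x16_t r0 = vld1q_u8(src + 0 * BLOCK_ROW_BYTES);
    uint8x16_t r1 = vld1q_u8(src + 1 * BLOCK_ROW_BYTES);
    uint8x16_t r2 = vld1q_u8(src + 2 * BLOCK_ROW_BYTES);
    uint8x16_t r3 = vld1q_u8(src + 3 * BLOCK_ROW_BYTES);

    /* ...then issue one 16-byte store per destination row. */
    vst1q_u8(dst + 0 * dst_stride, r0);
    vst1q_u8(dst + 1 * dst_stride, r1);
    vst1q_u8(dst + 2 * dst_stride, r2);
    vst1q_u8(dst + 3 * dst_stride, r3);
#else
    /* Portable fallback with the same result, one memcpy per row. */
    for (int row = 0; row < BLOCK_ROWS; row++) {
        memcpy(dst + (size_t)row * dst_stride,
               src + (size_t)row * BLOCK_ROW_BYTES,
               BLOCK_ROW_BYTES);
    }
#endif
}
```

Because the block size and row width are fixed at compile time, the NEON path has no size or alignment checks to run per call, which is the kind of overhead Eric attributes to going through memcpy.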

Those interested in learning more can read this week's VC4 status update, where Eric talks about the latest work on improving the speed of this graphics driver for the Raspberry Pi.