Faster Raspberry Pi X.Org Desktop Performance With NEON

NEON is, of course, ARM's Advanced SIMD extension. By making use of NEON, the VC4 driver is seeing faster X.Org performance.
Eric commented on some of the performance impact:
"It seems to be intended for stack loads/stores, but we can also use it to get 64 bytes of data in from memory untouched into NEON registers, and then I can use 4 (32bpp) or 8 (8 or 16bpp) VST1s to store it to the CPU side. With this, we get a 208.256% +/- 7.07029% (n=10) improvement to GetTexImage performance at 1024x1024. Doing the same NEON code for stores gave a 41.2371% +/- 3.52799% (n=10) improvement, probably mostly due to not calling into memcpy and having it go through its size/alignment-based memcpy path choosing process.
I'm not yet hitting full memory bandwidth, but this should be a noticeable improvement to X, and it'll probably help my piglit test suite runtime as well."
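To make the idea in the quote more concrete, here is a minimal sketch of the load-then-store pattern: pull a whole 64-byte tile into NEON registers first, then write it out row by row to a linear, CPU-side image. This is not the driver's code. The function name load_utile, its parameters, and the assumed tile layout (64 contiguous bytes, split into rows of either 16 or 8 bytes) are assumptions made for illustration. It also uses the standard arm_neon.h intrinsics instead of the specific load instruction the quote refers to, and falls back to memcpy when built for a non-NEON target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

enum { UTILE_BYTES = 64 }; /* tile size mentioned in the quote */

/* Hypothetical helper: copy one 64-byte tile from contiguous `src` into a
 * linear image at `dst`, whose rows are `dst_stride` bytes apart.
 * `row_bytes` is assumed to be 16 (four rows) or 8 (eight rows). */
static void
load_utile(uint8_t *dst, size_t dst_stride, const uint8_t *src,
           size_t row_bytes)
{
    assert(row_bytes == 8 || row_bytes == 16);

#if defined(__ARM_NEON)
    if (row_bytes == 16) {
        /* Read the whole tile into four 128-bit registers... */
        uint8x16_t r0 = vld1q_u8(src + 0);
        uint8x16_t r1 = vld1q_u8(src + 16);
        uint8x16_t r2 = vld1q_u8(src + 32);
        uint8x16_t r3 = vld1q_u8(src + 48);

        /* ...then issue four 16-byte stores, one per destination row. */
        vst1q_u8(dst + 0 * dst_stride, r0);
        vst1q_u8(dst + 1 * dst_stride, r1);
        vst1q_u8(dst + 2 * dst_stride, r2);
        vst1q_u8(dst + 3 * dst_stride, r3);
    } else {
        /* Same idea with eight 64-bit registers and eight 8-byte stores. */
        uint8x8_t rows[8];
        for (int i = 0; i < 8; i++)
            rows[i] = vld1_u8(src + 8 * i);
        for (int i = 0; i < 8; i++)
            vst1_u8(dst + (size_t)i * dst_stride, rows[i]);
    }
#else
    /* Portable fallback: one memcpy per row. */
    for (size_t off = 0; off < UTILE_BYTES; off += row_bytes)
        memcpy(dst + (off / row_bytes) * dst_stride, src + off, row_bytes);
#endif
}
```

Because the row width is fixed, each row becomes a single store with no per-call size or alignment dispatch, which is the overhead the quote attributes to going through memcpy.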
Those interested in learning more can read this week's VC4 status update, in which Eric talks about the latest work on improving the speed of this graphics driver for the Raspberry Pi.