Siarhei Siamashka wrote today on GitHub about this new xf86-video-sunxifb driver he's been developing. The Allwinner A10/A13 SoCs bear ARM Mali 400 graphics and there is the xf86-video-mali DDX that's open-source and then the closed-source Mali library for OpenGL ES / 3D support.
Siamashka's work isn't about any 3D driver writing or anything to compete with the reverse-engineered Lima graphics driver project for ARM's Mali hardware. The xf86-video-mali driver that ARM puts out is "provided as a basis for creating your own X display driver", at least in theory, so he's decided to extend the Allwinner A10/A13 coverage with his xf86-video-sunxifb fork.
The results shown on his blog seem fairly positive and can beat the upstream Mali DDX, but for his 2D acceleration he largely based it around the generic xf86-video-fbdev driver. So he's actually not going too much about using GPU acceleration itself but rather letting the Pixman library generate ARM NEON-optimized code for doing the grunt of the 2D work. From his Git repository, it appears most of what he has done was adding support to the driver for using the display controller hardware overlays for fully visible windows to avoid extra memory copies and support for the hardware cursor atop this fbdev driver fork.
He then goes on to explain in his posting:
So then what is wrong with the xf86-video-mali? It suffers from the same problem as many other X11 drivers for ARM hardware. DRI2 extension (the thing which is used for the integration of GLES acceleration) needs some hardware-specific buffers allocation (UMP in the case of xf86-video-mali). And EXA framework (a convenience layer for adding 2D acceleration hooks) supports overriding pixmap buffers allocation as part of its functionality. So the guys apparently decided that it's a good idea to override the allocation of absolutely all pixmaps without exception and not just the ones needed for DRI2. This was a total 2D performance disaster for the SGX PVR driver driver. It is also killing performance for xf86-video-mali. But because xf86-video-mali is an open source driver, I could run one more test. UMP also allows allocation of cached buffers, so with a small tweak xf86-video-mali can be changed to do cached allocations instead (let's just ignore the potential cache coherency issues for the buffers shared with Mali hadrware via DRI2). The benchmark results for cached UMP allocations are shown as green bars on the chart above. In some cases (t-firefox-fishtank), the performance for cached UMP allocations managed to catch up with xf86-video-sunxifb (and naturally xf86-video-fbdev). But many other cases are still slow. It's not just uncached memory killing performance, the UMP allocations themselves also require expensive ioctls and have very heavy overhead.In terms of what's next for his xf86-video-sunxifb driver, he hope to add support for X-Video and RandR (the X Resize and Rotate extension). The driver code can be found on GitHub.