A set of more than two dozen patches for the Linux kernel's AIO (asynchronous I/O) support was published recently, and it delivers notable performance improvements. From the patch-set's author on the AIO mailing list, "The results in my testing are pretty impressive, particularly when an ioctx is being shared between multiple threads. In my crappy synthetic benchmark, with 4 threads submitting and one thread reaping completions, I saw overhead in the aio code go from ~50% (mostly ioctx lock contention) to low single digits. Performance with ioctx per thread improved too, but I'd have to rerun those benchmarks. The reason I've been focused on performance when the ioctx is shared is that for a fair number of real world completions, userspace needs the completions aggregated somehow - in practice people just end up implementing this aggregation in userspace today, but if it's done right we can do it much more efficiently in the kernel."
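For readers unfamiliar with the setup described in the quote, here is a minimal sketch of one ioctx shared by four submitting threads and a single reaping thread. It is not the author's benchmark: the function name `run_shared_ioctx_demo`, the scratch file, and the sizes are assumptions made for illustration, and it uses the raw native AIO system calls because glibc does not wrap them.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Illustrative sizes only; none of these come from the patch set. */
enum { SUBMITTERS = 4, READS_PER_THREAD = 8, BLOCK = 512,
       TOTAL = SUBMITTERS * READS_PER_THREAD };

static aio_context_t shared_ctx;   /* the single ioctx every thread uses */
static int data_fd;
static char bufs[SUBMITTERS][READS_PER_THREAD][BLOCK];

/* Each submitter queues its reads one at a time on the shared context. */
static void *submitter(void *arg)
{
    long id = (long)(intptr_t)arg;
    for (int i = 0; i < READS_PER_THREAD; i++) {
        struct iocb cb;
        struct iocb *list[1] = { &cb };
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = data_fd;
        cb.aio_lio_opcode = IOCB_CMD_PREAD;
        cb.aio_buf = (uint64_t)(uintptr_t)bufs[id][i];
        cb.aio_nbytes = BLOCK;
        cb.aio_offset = 0;
        cb.aio_data = (uint64_t)id;          /* tag: which thread submitted */
        if (syscall(SYS_io_submit, shared_ctx, 1L, list) != 1)
            break;
    }
    return NULL;
}

/* Returns the number of successful completions reaped, or -1 on setup failure. */
int run_shared_ioctx_demo(void)
{
    char path[] = "/tmp/aio_demo_XXXXXX";
    char block[BLOCK];
    pthread_t tids[SUBMITTERS];
    struct io_event events[8];
    int started = 0, seen = 0, ok = 0;

    /* Scratch file holding one block for the reads to target. */
    data_fd = mkstemp(path);
    if (data_fd < 0)
        return -1;
    unlink(path);
    memset(block, 'x', sizeof block);
    if (write(data_fd, block, sizeof block) != (ssize_t)sizeof block)
        return -1;

    shared_ctx = 0;                          /* must be zero before io_setup */
    if (syscall(SYS_io_setup, (unsigned)TOTAL, &shared_ctx) < 0)
        return -1;

    for (long i = 0; i < SUBMITTERS; i++)
        if (pthread_create(&tids[i], NULL, submitter, (void *)(intptr_t)i) == 0)
            started++;

    /* This thread is the lone reaper: it drains completions for everyone. */
    while (seen < TOTAL) {
        struct timespec timeout = { 5, 0 };  /* give up rather than hang */
        long n = syscall(SYS_io_getevents, shared_ctx, 1L, 8L, events, &timeout);
        if (n <= 0)
            break;
        for (long j = 0; j < n; j++)
            if (events[j].res == BLOCK)
                ok++;
        seen += (int)n;
    }

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    syscall(SYS_io_destroy, shared_ctx);
    close(data_fd);
    return ok;
}
```

Calling `run_shared_ioctx_demo()` from a `main` and building with `cc -pthread` should be enough to try it on a Linux system with native AIO enabled. On a buffered file like this one the reads may complete synchronously inside `io_submit`, so the sketch shows the sharing pattern only and says nothing about the lock contention the patches address.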
A second version of these performance-enhancing kernel AIO patches was posted yesterday. It's possible the asynchronous I/O performance improvements will be merged into the Linux 3.8 kernel later this month, provided no further revisions are deemed necessary that would push the merge back to Linux 3.9. The AIO changes affect around one thousand lines of kernel code.