For those interested in Non-Uniform Memory Access (NUMA) performance under Linux, here's an independent performance comparison that pits the mainline kernel against three other NUMA kernels.
The three non-mainline kernel configurations are the Balance NUMA v10 tree, the Auto NUMA v28 tree, and the Unified NUMA v3 tree. The focus of the tests, done by Ingo Molnar, is to gauge how well each of these three competing kernel implementations optimizes for NUMA.
The tests look at convergence latency, workload bandwidth, workload spread, and other Linux memory test scenarios.
The full set of NUMA benchmark results is available on the Linux kernel mailing list.