Dillon has been making a number of changes to DragonFlyBSD's memory handling and to locking within the kernel. These changes are improving DragonFly's performance on multi-socket systems. He shared, "As an example of what we get from this, the dual Xeon system was topping out at 1.5-2M zero-fill page faults a second running a test program on all 32 threads before the changes. After the changes (and all the other work), the same system is now pushing 5.6 MILLION zero-fill page faults/sec across 32 threads. Our four-socket opteron system also saw major improvements and can now achieve something like 4.7M zero-fill page faults a second across the 48-cores."
DragonFlyBSD's NUMA awareness is also further increasing performance and reducing memory stalls. If you want to learn more about these changes going into DragonFlyBSD, read this mailing list post for all the details as well as some more performance numbers.
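
For readers unfamiliar with the metric in the quote: a zero-fill page fault happens when a program first touches a page of freshly mapped anonymous memory and the kernel has to supply a zeroed page. The sketch below is not Dillon's test program, which is not shown in the article. It is a hypothetical single-threaded illustration in Python of the access pattern such a test would exercise; the function names `touch_pages` and `faults_per_second` are made up for this example. Interpreter overhead dominates the timing, and the number of faults the kernel actually takes per write depends on the operating system, so the result is not comparable to the figures quoted above.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the running system


def touch_pages(num_pages):
    """Map fresh anonymous memory and write one byte into each page.

    Each first write to an untouched page is the kind of access that
    can trigger a zero-fill page fault. Returns the number of pages
    written to (an assumption-laden proxy, not a measured fault count).
    """
    region = mmap.mmap(-1, num_pages * PAGE_SIZE)  # -1 = anonymous mapping
    try:
        for offset in range(0, num_pages * PAGE_SIZE, PAGE_SIZE):
            region[offset] = 1  # first touch of this page
        return num_pages
    finally:
        region.close()


def faults_per_second(num_pages=4096):
    """Rough first-touch writes per second for a single thread."""
    start = time.perf_counter()
    touched = touch_pages(num_pages)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on coarse timers.
    return touched / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(f"{faults_per_second():.0f} first-touch page writes/sec")
```

A test like the one described in the quote would presumably run this kind of loop concurrently on every hardware thread, which is where kernel lock contention and NUMA placement start to limit the rate.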