Netflix's FreeBSD Network Stack Optimizations More Than Doubled AMD EPYC Performance
Netflix has long been known to use FreeBSD in its data centers, particularly where network performance is concerned. The goal of delivering 200Gb/s of throughput from individual servers led them to make NUMA optimizations to the FreeBSD network stack. Among those optimizations were allocating NUMA-local memory for kernel TLS crypto buffers and for the files backing sendfile, along with changes to network connection handling and to how incoming connections are directed to Nginx.
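The connection-steering idea can be illustrated with a small toy model: keep a worker pool per NUMA node, note which node each NIC hangs off of, and hand each incoming connection to a worker on the NIC's own node so TLS crypto buffers and sendfile pages stay node-local. This is a minimal sketch for illustration only; the node counts, NIC names, worker names, and round-robin scheme are assumptions, not Netflix's actual in-kernel FreeBSD implementation.

```python
# Toy model of NUMA-aware connection steering (illustrative sketch,
# not Netflix's FreeBSD code).

NUM_NODES = 2  # assumed: two NUMA domains, as on a 2-socket Xeon box

# Assumed: each NIC sits behind the PCIe root complex of one NUMA node.
nic_to_node = {"nic0": 0, "nic1": 1}

# Assumed: one pool of worker threads pinned to each NUMA node.
workers_by_node = {0: ["w0", "w1"], 1: ["w2", "w3"]}

def steer(nic: str, conn_id: int) -> str:
    """Pick a worker on the same NUMA node as the receiving NIC,
    so buffers for this connection are allocated node-locally
    and never cross the inter-socket fabric."""
    node = nic_to_node[nic]
    pool = workers_by_node[node]
    return pool[conn_id % len(pool)]  # simple round-robin within the node
```

The payoff of keeping work on the NIC's node is exactly what the numbers below reflect: less traffic over the NUMA interconnect and higher usable throughput per server.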
For those just wanting the end result: Netflix's NUMA optimizations to FreeBSD took their Intel Xeon servers from 105Gb/s to 191Gb/s, while NUMA fabric utilization dropped from 40% to 13%.
The AMD EPYC gains are even more impressive, going from 68Gb/s to 194Gb/s. So while EPYC started out much slower than Xeon, Netflix's AMD EPYC servers are now closer than the Intel ones to achieving 200Gb/s.
Not only is EPYC faster, but thanks to its 128 PCIe lanes per socket, Netflix can do with a single socket what otherwise requires two Intel Xeon CPUs. One area where AMD was critiqued is that Netflix was unable to monitor Infinity Fabric saturation, as "AMD's tools are lacking (even on Linux)."
In the end, they are now effectively at 200Gb/s of encrypted video streaming per FreeBSD server. More details via this interesting slide deck.