Linux 5.6 Crypto Getting AVX/AVX2/AVX-512 Optimized Poly1305 - Helps WireGuard
Now that WireGuard lead developer Jason Donenfeld has managed to get this secure VPN tunnel technology queued for introduction in Linux 5.6 mainline, he's begun optimizing other areas of the kernel for optimal WireGuard performance.
Poly1305 is used by WireGuard as its message authentication code, and that's the latest bit being optimized in mainline to benefit not only WireGuard but other crypto users as well. Donenfeld has provided x86_64 vectorized implementations of Poly1305 for AVX, AVX2, and AVX-512F. These AVX/AVX2/AVX-512 optimized versions are proving to be clearly faster, though the AVX-512 path is only enabled for Cannonlake/Icelake and newer: on Skylake, AVX-512 down-clocking causes the performance to come up short.
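For readers unfamiliar with what is being vectorized, here is a minimal Python sketch of the scalar Poly1305 arithmetic as specified in RFC 8439. It is not the kernel code and has nothing to do with the AVX implementations themselves; the function name `poly1305_mac` is a made-up name for illustration, and the check at the bottom uses the test vector from RFC 8439 section 2.5.2.

```python
def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    """Reference-style Poly1305 (RFC 8439); hypothetical helper, not kernel code."""
    if len(key) != 32:
        raise ValueError("Poly1305 requires a 32-byte one-time key")

    # First 16 bytes: r, clamped as the RFC requires. Last 16 bytes: s.
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(key[16:], "little")
    p = (1 << 130) - 5  # the prime 2^130 - 5 that gives Poly1305 its name

    acc = 0
    for i in range(0, len(msg), 16):
        # Each block (up to 16 bytes) gets a 0x01 byte appended, then is
        # read as a little-endian integer.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % p

    # Add s and keep the low 128 bits as the tag.
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")


# Test vector from RFC 8439, section 2.5.2.
key = bytes.fromhex(
    "85d6be7857556d337f4452fe42d506a8"
    "0103808afb0db2fd4abff6af4149f51b"
)
tag = poly1305_mac(b"Cryptographic Forum Research Group", key)
print(tag.hex())
```

The per-block multiply-and-reduce loop is the hot path; vectorized implementations generally speed it up by processing several blocks in parallel, which is why the gains show up most on large messages.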
For the AVX2 implementation, testing on a Core i7 6700HQ with large message sizes showed a drop from 1052 cycles to 720. The AVX-512 version similarly dropped from 1058 cycles to 690, while still providing significant performance improvements for all sizes tested.
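To put those cycle counts in relative terms, here is a short Python calculation using only the figures quoted above; the helper name `cycle_reduction` is just for illustration.

```python
def cycle_reduction(before: int, after: int) -> float:
    """Fraction of cycles saved when going from `before` to `after`."""
    return (before - after) / before


# Cycle counts quoted in the article for large message sizes.
results = [("AVX2", 1052, 720), ("AVX-512", 1058, 690)]

for name, before, after in results:
    saved = cycle_reduction(before, after)
    speedup = before / after
    print(f"{name}: {saved:.1%} fewer cycles, {speedup:.2f}x speedup")
```

That works out to roughly 31.6% fewer cycles (about 1.46x) for AVX2 and roughly 34.8% fewer cycles (about 1.53x) for AVX-512.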
More details on this AVX/AVX2/AVX-512 optimized Poly1305 implementation can be found via this commit in the crypto development code ahead of the Linux 5.6 merge window.