AV1 Decoder dav1d Lands 10-bit AVX2 Assembly For Big Speed-Up, Thanks Facebook + Netflix
For those making use of 10-bit AV1 content and using dav1d as the performant CPU-based decoder, the performance on modern Intel and AMD processors is about to be a heck of a lot better.
dav1d has already enjoyed speedy 10-bit decoding on AArch64 hardware thanks to hand-written Assembly, and now it is finally seeing AVX2-optimized 10-bit decode as well. Facebook and Netflix both provided the funding to make the AVX2-optimized 10-bit decode happen for dav1d.
As of today, that AVX2-optimized Assembly code for high bit depths has been merged.
I'm told the performance is a "big" improvement. The dav1d developers are also planning to issue their next feature release, with this code included, within the next week or so.
When that next release happens, I'll surely be running benchmarks. For now you can find current CPU performance figures with different AV1 inputs for dav1d via this OpenBenchmarking.org composite listing.