AVX2 Tuning Paying Off Big Time For Dav1d 10b/12b Video Decode

Following this weekend's release of dav1d 0.9, I immediately set off to benchmark the updated CPU-based AV1 video decoder. Google Chrome, Mozilla Firefox, and other software rely on dav1d for processor-based AV1 decoding, since all but the very latest hardware does not yet offer GPU-accelerated AV1 handling.
On x86_64, dav1d 0.9 performance is largely unchanged until getting to high bit depth content...
For 10-bit AV1 videos, dav1d is running multiple times faster thanks to hand-written AVX2 assembly funded by Netflix and Facebook. A big win for dav1d 0.9 here... As a reminder, AVX2 is found on Intel CPUs going back to Haswell and on AMD processors since Excavator.
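For anyone unsure whether their own system has AVX2, here is a minimal sketch for checking on Linux. It assumes the usual x86 /proc/cpuinfo layout with a "flags" line listing CPU features; the has_avx2 helper is a name made up for this illustration and is not part of dav1d, which does its own CPU feature detection at runtime.

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2.

    Hypothetical helper for illustration only; assumes the x86 Linux
    format where each feature is a whitespace-separated token.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the exact token so names like "avx2foo" do not count.
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only path
    if cpuinfo.exists():
        print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo on this system; cannot check this way.")
```

If this reports no AVX2, the system falls outside the processors that benefit from the new AVX2 code paths.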
More dav1d 0.9 benchmarks incoming at OpenBenchmarking.org.