Dav1d 0.9.2 Released With More SSSE3, SSE4, AVX2, NEON Optimizations

Written by Michael Larabel in Multimedia on 5 September 2021 at 02:00 AM EDT.

Dav1d 0.9.1 was released at the start of August for this high-performance, CPU-based open-source AV1 video decoder, and now another point release is available with yet more optimizations.

Dav1d has been the fastest AV1 decoder I've been aware of for some time, but with each release it keeps managing to squeeze out more performance. Dav1d 0.9.2 is out now with yet more Intel/AMD and Arm CPU optimizations. The particular highlights for v0.9.2 are:

- x86: SSE4 optimizations of inverse transforms for 10-bit for all sizes
- x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
- x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
- ARM NEON optimizations for FilmGrain Gen_grain functions
- Optimizations for splat_mv in SSE2/AVX2 and NEON
- x86: SGR improvements for SSSE3 CPUs
- x86: AVX2 optimizations for cfl_ac

I've been running some benchmarks this weekend, and on modern, higher-end hardware there are generally some slight gains out of dav1d 0.9.2.

In fresh local tests of v0.9.1 against v0.9.2 on a few different systems, the 10-bit decode performance seemed to benefit the most compared to the prior release.

For other decode tests, there were some slight gains in some of the configurations.

Dav1d 0.9.2 is available for download from VideoLAN.org. There are also more dav1d benchmarks on OpenBenchmarking.org from trying out the new release on an assortment of other local systems.
