Dav1d 0.5 Released With AVX2, SSSE3 & ARM64 Performance Improvements - Benchmarks
Friday marked the release of dav1d 0.5, the newest version of this speedy open-source AV1 video decoder. Dav1d 0.5 brings optimizations that most prominently help the SSSE3 code path but also benefit AVX2 and ARM64 processors. Here are some initial benchmarks of the new release on Linux.
The SSSE3 code path for dav1d is now upwards of 40% faster with the v0.5 release. There are also single-digit percentage improvements for the AVX2 code path and up to 10% better performance on 64-bit ARM. This release also includes VSX, SSE2, and SSE4 optimizations as well as some decoder fixes. Dav1d 0.5 can be found at VideoLAN.org.
Dav1d 0.5 has already been added to the Phoronix Test Suite / OpenBenchmarking.org, so over the past day I began firing off some benchmarks on different systems. More tests will come in future articles, but for now I ran some quick benchmarks with the old and new test profiles to compare dav1d 0.4 and 0.5 on the same systems.
With the Chimera 1080p clip, dav1d saw a few FPS of improvement on the newer CPUs. The old Core i7 3770K Ivy Bridge was one of the biggest winners, with its performance now 23% higher than with dav1d 0.4. (Yes, dav1d doesn't scale greatly, particularly with 1080p content, and that's why the Ryzen 9 comes out ahead of the Threadripper, the 7980XE lands behind the 9400F, etc.: the workload favors higher clock speeds.)
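For anyone comparing their own runs, the "X% higher" figures quoted here are just the relative change in frames per second between the two versions. A minimal sketch of that arithmetic follows; the function name and the FPS values are hypothetical placeholders for illustration, not results from these tests.

```python
def percent_change(old_fps: float, new_fps: float) -> float:
    """Return the relative change from old_fps to new_fps, in percent."""
    if old_fps <= 0:
        raise ValueError("old_fps must be positive")
    return (new_fps - old_fps) / old_fps * 100.0


# Hypothetical placeholder values, NOT measurements from this article.
example_old_fps = 100.0  # e.g. a dav1d 0.4 result
example_new_fps = 123.0  # e.g. a dav1d 0.5 result on the same system

print(f"{percent_change(example_old_fps, example_new_fps):.1f}% change in FPS")
```

A positive result means the newer version decoded more frames per second; the comparison is only meaningful when both numbers come from the same system and the same clip.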
With 10-bit 1080p content, the Ivy Bridge performance was unchanged, but the newer CPUs were slightly faster.
The alternative "summer nature" scene was showing some nice improvements on the few tested CPUs with Dav1d 0.5.
4K decoding with dav1d 0.5 also came out slightly faster. More data is available on OpenBenchmarking.org. With the Phoronix Test Suite, simply run phoronix-test-suite benchmark 1910123-PTS-DAV1DVID82 to compare your own system(s) between dav1d 0.4 and 0.5 as well as against the systems shown in this article.