Google Releases AOM-AV1 3.5 With More Speedups & Memory Optimizations

Written by Michael Larabel in Multimedia on 22 September 2022 at 05:33 AM EDT.
Google engineers on Wednesday released AOM-AV1 3.5 as the newest version of their open-source AV1 video encoder. AOM-AV1 3.5 brings further performance improvements as well as memory optimizations.

First up, AOM-AV1 3.5 supports frame parallel encoding for a larger number of threads. This is the "--fp-mt" option that was added in the prior release but at the time required a special build-time option. The FP-MT option is now available without that build-time option, and frame parallel multi-threading should greatly help this AV1 encoder scale across threads.
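
As a rough illustration of how the option might be used, the Python sketch below assembles an aomenc command line with frame parallel multi-threading requested. It only builds and prints the command; it does not run the encoder. The helper name and the file names are hypothetical, and the flag spellings (including the assumption that --tile-rows and --tile-columns take log2 values) reflect my understanding of aomenc, so check them against `aomenc --help` for your build.

```python
import shlex


def build_aomenc_command(input_path, output_path, threads=16,
                         tile_rows_log2=0, tile_cols_log2=0, cpu_used=4):
    """Return an aomenc argument list with frame parallel multi-threading on.

    Hypothetical helper for illustration only. Tile counts are passed as
    log2 values (assumption: 0 means 1 tile, 1 means 2 tiles, 2 means 4).
    """
    return [
        "aomenc",
        "--good",                            # good quality mode
        f"--cpu-used={cpu_used}",            # speed preset
        f"--threads={threads}",
        f"--tile-rows={tile_rows_log2}",
        f"--tile-columns={tile_cols_log2}",
        "--fp-mt=1",                         # frame parallel multi-threading
        "--ivf",                             # write an IVF container
        "-o", output_path,
        input_path,
    ]


# 1080p-style setup from the article: 16 threads, 1x1 tile configuration.
cmd = build_aomenc_command("input.y4m", "output.ivf")
print(shlex.join(cmd))
```

For the 2x4 tile configuration (tile_rows x tile_columns) mentioned in the release notes, the same helper would be called with tile_rows_log2=1 and tile_cols_log2=2 under the log2 assumption above.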

A great deal of performance optimization work also went into the v3.5 release:
   * Speed up multithreaded encoding for good quality mode for a larger number of threads through frame parallel encoding:
     - 30-34% encode time reduction for 1080p, 16 threads, 1x1 tile configuration (tile_rows x tile_columns)
     - 18-28% encode time reduction for 1080p, 16 threads, 2x4 tile configuration
     - 18-20% encode time reduction for 2160p, 32 threads, 2x4 tile configuration
   * 16-20% speed-up for speed=6 to 8 in still-picture encoding mode
   * 5-6% heap memory reduction for speed=6 to 10 in real-time encoding mode
   * Improvements to the speed for speed=7, 8 in real-time encoding mode
   * Improvements to the speed for speed=9, 10 in real-time screen encoding mode
   * Optimizations to improve multi-thread efficiency in real-time encoding mode
   * 10-15% speed up for SVC with temporal layers
   * SIMD optimizations:
     - Improve av1_quantize_fp_32x32_neon() 1.05x to 1.24x faster
     - Add aom_highbd_quantize_b{,_32x32,_64x64}_adaptive_neon() 3.15x to 5.6x faster than "C"
     - Improve av1_quantize_fp_64x64_neon() 1.17x to 1.66x faster
     - Add aom_quantize_b_avx2() 1.4x to 1.7x faster than aom_quantize_b_avx()
     - Add aom_quantize_b_32x32_avx2() 1.4x to 2.3x faster than aom_quantize_b_32x32_avx()
     - Add aom_quantize_b_64x64_avx2() 2.0x to 2.4x faster than aom_quantize_b_64x64_ssse3()
     - Add aom_highbd_quantize_b_32x32_avx2() 9.0x to 10.5x faster than aom_highbd_quantize_b_32x32_c()
     - Add aom_highbd_quantize_b_64x64_avx2() 7.3x to 9.7x faster than aom_highbd_quantize_b_64x64_c()
     - Improve aom_highbd_quantize_b_avx2() 1.07x to 1.20x faster
     - Improve av1_quantize_fp_avx2() 1.13x to 1.49x faster
     - Improve av1_quantize_fp_32x32_avx2() 1.07x to 1.54x faster
     - Improve av1_quantize_fp_64x64_avx2() 1.03x to 1.25x faster
     - Improve av1_quantize_lp_avx2() 1.07x to 1.16x faster
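
The frame parallel figures above are quoted as encode time reductions, while the SIMD entries are quoted as "x times faster" factors, so the two are not directly comparable. As a small worked conversion (my own arithmetic, not part of the release notes), a time reduction of p percent corresponds to a throughput factor of 1 / (1 - p/100). The function name below is hypothetical.

```python
def speedup_from_time_reduction(percent):
    """Convert an encode-time reduction in percent to a throughput factor."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


# Encode time reductions quoted for frame parallel encoding in v3.5.
cases = [
    ("1080p, 16 threads, 1x1 tiles", 30, 34),
    ("1080p, 16 threads, 2x4 tiles", 18, 28),
    ("2160p, 32 threads, 2x4 tiles", 18, 20),
]

for label, low, high in cases:
    low_x = speedup_from_time_reduction(low)
    high_x = speedup_from_time_reduction(high)
    print(f"{label}: {low_x:.2f}x to {high_x:.2f}x")

# 1080p, 16 threads, 1x1 tiles: 1.43x to 1.52x
# 1080p, 16 threads, 2x4 tiles: 1.22x to 1.39x
# 2160p, 32 threads, 2x4 tiles: 1.22x to 1.25x
```

So a 30-34% reduction in encode time means the encoder gets through roughly 1.43 to 1.52 times as much work in the same wall-clock time.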

Add in bug fixes and other improvements, and AOM-AV1 3.5 makes for quite an exciting release. The list of v3.5 changes can be found via this Git commit. I'll be working on some updated AV1 encode CPU benchmarks shortly.