Google Releases AOM-AV1 3.5 With More Speedups & Memory Optimizations

First up, AOM-AV1 3.5 supports frame parallel encoding for larger numbers of threads. This is the "--fp-mt" option that was added in the prior release but at the time required a special build-time option. The FP-MT option is now available by default, and this frame parallel multi-threading should greatly help this AV1 encoder scale across threads.
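For anyone wanting to try the option, here is a minimal sketch that assembles an aomenc command line with frame parallel multi-threading turned on. The `--fp-mt=1` value form, the file names, and the helper name `aomenc_fp_mt_command` are assumptions for illustration and are not taken from the release notes, so check `aomenc --help` on your build. aomenc takes tile counts as log2 values, so a 2x4 tile configuration becomes `--tile-rows=1 --tile-columns=2`.

```python
import shlex


def aomenc_fp_mt_command(src, dst, threads=16, tile_rows=2, tile_cols=4, cpu_used=4):
    """Build (but do not run) an aomenc command with frame parallel MT enabled.

    Hypothetical helper: the argument names and defaults are illustrative only.
    """
    for count in (tile_rows, tile_cols):
        # aomenc expects log2 tile counts, so only powers of two make sense here.
        if count < 1 or count & (count - 1):
            raise ValueError("tile counts must be powers of two")

    return [
        "aomenc",
        "--good",                                     # good quality mode
        f"--cpu-used={cpu_used}",                     # speed preset
        f"--threads={threads}",
        "--fp-mt=1",                                  # assumed value form for the --fp-mt option
        f"--tile-rows={tile_rows.bit_length() - 1}",  # log2(tile_rows)
        f"--tile-columns={tile_cols.bit_length() - 1}",  # log2(tile_cols)
        "--ivf",                                      # write an IVF container
        "-o", dst,
        src,
    ]


if __name__ == "__main__":
    # Placeholder file names; print the command rather than executing it.
    print(shlex.join(aomenc_fp_mt_command("input.y4m", "output.ivf")))
```

The function only returns the argument list, so it can be passed to `subprocess.run()` once the flags have been confirmed against the installed encoder.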
And there is a whole lot of performance optimization work that went into the v3.5 release:
* Speed-up of multithreaded encoding in good quality mode for larger numbers of threads through frame parallel encoding:
- 30-34% encode time reduction for 1080p, 16 threads, 1x1 tile configuration (tile_rows x tile_columns)
- 18-28% encode time reduction for 1080p, 16 threads, 2x4 tile configuration
- 18-20% encode time reduction for 2160p, 32 threads, 2x4 tile configuration
* 16-20% speed-up for speed=6 to 8 in still-picture encoding mode
* 5-6% heap memory reduction for speed=6 to 10 in real-time encoding mode
* Improvements to the speed for speed=7, 8 in real-time encoding mode
* Improvements to the speed for speed=9, 10 in real-time screen encoding mode
* Optimizations to improve multi-thread efficiency in real-time encoding mode
* 10-15% speed-up for SVC with temporal layers
* SIMD optimizations:
- Improve av1_quantize_fp_32x32_neon() 1.05x to 1.24x faster
- Add aom_highbd_quantize_b{,_32x32,_64x64}_adaptive_neon() 3.15x to 5.6x faster than "C"
- Improve av1_quantize_fp_64x64_neon() 1.17x to 1.66x faster
- Add aom_quantize_b_avx2() 1.4x to 1.7x faster than aom_quantize_b_avx()
- Add aom_quantize_b_32x32_avx2() 1.4x to 2.3x faster than aom_quantize_b_32x32_avx()
- Add aom_quantize_b_64x64_avx2() 2.0x to 2.4x faster than aom_quantize_b_64x64_ssse3()
- Add aom_highbd_quantize_b_32x32_avx2() 9.0x to 10.5x faster than aom_highbd_quantize_b_32x32_c()
- Add aom_highbd_quantize_b_64x64_avx2() 7.3x to 9.7x faster than aom_highbd_quantize_b_64x64_c()
- Improve aom_highbd_quantize_b_avx2() 1.07x to 1.20x faster
- Improve av1_quantize_fp_avx2() 1.13x to 1.49x faster
- Improve av1_quantize_fp_32x32_avx2() 1.07x to 1.54x faster
- Improve av1_quantize_fp_64x64_avx2() 1.03x to 1.25x faster
- Improve av1_quantize_lp_avx2() 1.07x to 1.16x faster
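The list above mixes two kinds of numbers: "encode time reduction" percentages for the frame parallel work and "Nx faster" factors for the SIMD routines. As a rough aid for comparing them, this sketch converts a time reduction into the equivalent speedup factor using speedup = 1 / (1 - reduction). The function name is made up for illustration, and the conversion assumes the percentage is measured against the old encode time.

```python
def speedup_from_time_reduction(reduction_pct):
    """Convert a percentage reduction in run time into an 'Nx faster' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    # New time is (1 - r) of the old time, so throughput rises by 1 / (1 - r).
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # Frame parallel encode-time reductions quoted in the release notes.
    cases = [
        ("1080p, 16 threads, 1x1 tiles", 30, 34),
        ("1080p, 16 threads, 2x4 tiles", 18, 28),
        ("2160p, 32 threads, 2x4 tiles", 18, 20),
    ]
    for label, low, high in cases:
        low_x = speedup_from_time_reduction(low)
        high_x = speedup_from_time_reduction(high)
        print(f"{label}: {low_x:.2f}x to {high_x:.2f}x faster")
```

By this measure, the 30-34% reduction for the 1x1 tile case works out to roughly 1.43x to 1.52x faster encoding.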
Bug fixes and other improvements round out what looks to be quite an exciting AOM-AV1 3.5 release. The list of v3.5 changes can be found via this Git commit. I'll be working on some updated AV1 encode CPU benchmarks shortly.