Google Releases AOM-AV1 3.5 With More Speedups & Memory Optimizations
Google engineers on Wednesday released AOM-AV1 3.5 as the newest version of their open-source AV1 video encoder. AOM-AV1 3.5 brings yet more performance improvements as well as memory optimizations.
First up, AOM-AV1 3.5 supports frame parallel encoding for a larger number of threads. This is the "--fp-mt" option that was added in the prior release but at the time required a special build-time option to enable. The FP-MT option is now available by default, and this frame parallel multi-threading should greatly help this AV1 encoder scale across more threads.
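For illustration, here is a minimal Python sketch of how an aomenc command line with frame parallel multi-threading enabled might be assembled. The file names, thread count, and speed setting are hypothetical, and the exact flag spellings (in particular "--fp-mt=1") are assumptions that should be checked against `aomenc --help` for the build in use. Nothing is executed; the script only prints the command.

```python
import shlex


def build_aomenc_command(src, dst, threads=16, speed=4, fp_mt=True):
    """Return an argv list for aomenc; the command is built, not run.

    All values are illustrative. Verify the flags against
    `aomenc --help` for your libaom build before using them.
    """
    return [
        "aomenc",
        "--good",                        # good quality mode
        f"--cpu-used={speed}",           # encoder speed preset
        f"--threads={threads}",          # worker thread count
        f"--fp-mt={1 if fp_mt else 0}",  # frame parallel multi-threading (assumed syntax)
        "-o", dst,
        src,
    ]


cmd = build_aomenc_command("input.y4m", "output.ivf")
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to hand to `subprocess.run` later without any shell quoting concerns.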
And there is a whole lot of performance optimization work that went into the v3.5 release:
* Speed-up multithreaded encoding for good quality mode for larger number of threads through frame parallel encoding:
- 30-34% encode time reduction for 1080p, 16 threads, 1x1 tile configuration (tile_rows x tile_columns)
- 18-28% encode time reduction for 1080p, 16 threads, 2x4 tile configuration
- 18-20% encode time reduction for 2160p, 32 threads, 2x4 tile configuration
* 16-20% speed-up for speed=6 to 8 in still-picture encoding mode
* 5-6% heap memory reduction for speed=6 to 10 in real-time encoding mode
* Improvements to the speed for speed=7, 8 in real-time encoding mode
* Improvements to the speed for speed=9, 10 in real-time screen encoding mode
* Optimizations to improve multi-thread efficiency in real-time encoding mode
* 10-15% speed up for SVC with temporal layers
* SIMD optimizations:
- Improve av1_quantize_fp_32x32_neon() 1.05x to 1.24x faster
- Add aom_highbd_quantize_b{,_32x32,_64x64}_adaptive_neon() 3.15x to 5.6x faster than "C"
- Improve av1_quantize_fp_64x64_neon() 1.17x to 1.66x faster
- Add aom_quantize_b_avx2() 1.4x to 1.7x faster than aom_quantize_b_avx()
- Add aom_quantize_b_32x32_avx2() 1.4x to 2.3x faster than aom_quantize_b_32x32_avx()
- Add aom_quantize_b_64x64_avx2() 2.0x to 2.4x faster than aom_quantize_b_64x64_ssse3()
- Add aom_highbd_quantize_b_32x32_avx2() 9.0x to 10.5x faster than aom_highbd_quantize_b_32x32_c()
- Add aom_highbd_quantize_b_64x64_avx2() 7.3x to 9.7x faster than aom_highbd_quantize_b_64x64_c()
- Improve aom_highbd_quantize_b_avx2() 1.07x to 1.20x faster
- Improve av1_quantize_fp_avx2() 1.13x to 1.49x faster
- Improve av1_quantize_fp_32x32_avx2() 1.07x to 1.54x faster
- Improve av1_quantize_fp_64x64_avx2() 1.03x to 1.25x faster
- Improve av1_quantize_lp_avx2() 1.07x to 1.16x faster
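The list above mixes two kinds of figures: "encode time reduction" percentages for the frame parallel work and "Nx faster" multipliers for the SIMD routines. As a rough aid to comparing them, this small Python sketch converts a time reduction into the equivalent speed-up factor, assuming "N% encode time reduction" means the new encode takes (100 - N)% of the old time. The helper name is hypothetical; the percentages are the ones quoted in the list.

```python
def reduction_to_speedup(reduction_pct):
    """Convert a percent reduction in run time into an 'Nx faster' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


# Encode time reductions quoted in the release notes (low, high), in percent.
reported = {
    "1080p, 16 threads, 1x1 tiles": (30, 34),
    "1080p, 16 threads, 2x4 tiles": (18, 28),
    "2160p, 32 threads, 2x4 tiles": (18, 20),
}

for label, (low, high) in reported.items():
    print(f"{label}: {reduction_to_speedup(low):.2f}x "
          f"to {reduction_to_speedup(high):.2f}x faster")
```

Under that assumption, a 30% encode time reduction works out to roughly a 1.43x speed-up, which puts the frame parallel gains in the same range as several of the individual AVX2 quantizer improvements.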
Bug fixes and other improvements round out what is a quite exciting AOM-AV1 3.5 release. The list of v3.5 changes can be found via this Git commit. I'll be working on some updated AV1 encode CPU benchmarks shortly.