AVX2 & AVX-512 Optimized Versions Of ARIA Cipher Coming With Linux 6.3
The ARIA block cipher, devised by South Korean researchers, is gaining AVX2 and AVX-512 optimized implementations in the Linux kernel.
Queued this week into Herbert Xu's "cryptodev" Git code ahead of the Linux 6.3 cycle are the new optimized ARIA implementations.
The AVX2-optimized version of ARIA also depends upon AES-NI and GFNI instructions and can handle 32-way parallel processing of blocks. The results of the ARIA-AVX2 implementation are promising compared with the original AVX code: encrypting 1024 bytes drops from 2,701 to 2,003 cycles, and encrypting 4096 bytes drops from 11,876 to 7,295 cycles. Decryption benefits as well, with 4096 bytes dropping from 11,954 to 7,564 cycles.
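To put those cycle counts in perspective, here is a minimal Python sketch that turns the figures quoted above into percentage savings and cycles per byte. The variable and function names are purely illustrative and are not part of the kernel code; only the cycle counts come from the article.

```python
# Cycle counts as quoted in the article; all names here are illustrative.
AVX_ENC = {1024: 2701, 4096: 11876}   # original AVX code, encryption
AVX2_ENC = {1024: 2003, 4096: 7295}   # new AVX2 code, encryption
AVX_DEC_4096 = 11954                  # original AVX code, 4096-byte decryption
AVX2_DEC_4096 = 7564                  # new AVX2 code, 4096-byte decryption


def reduction_percent(before: int, after: int) -> float:
    """Percentage of cycles saved when going from `before` to `after`."""
    return (before - after) / before * 100.0


def cycles_per_byte(cycles: int, nbytes: int) -> float:
    """Average number of cycles spent per byte of input."""
    return cycles / nbytes


for size in (1024, 4096):
    saved = reduction_percent(AVX_ENC[size], AVX2_ENC[size])
    cpb = cycles_per_byte(AVX2_ENC[size], size)
    print(f"encrypt {size} bytes: {saved:.1f}% fewer cycles, {cpb:.2f} cycles/byte")

dec_saved = reduction_percent(AVX_DEC_4096, AVX2_DEC_4096)
print(f"decrypt 4096 bytes: {dec_saved:.1f}% fewer cycles")
```

By this arithmetic the AVX2 code saves roughly 26% of the cycles for 1024-byte encryption, about 39% for 4096-byte encryption, and about 37% for 4096-byte decryption.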
The ARIA-AVX2 implementation can be found here while it awaits upstreaming from cryptodev in the next kernel cycle.
Meanwhile, the aria-avx512 implementation, which uses AVX-512 and GFNI, is very promising and supports 64-way parallel processing. It improves on the AVX2 implementation in turn, dropping from 2,003 cycles to 1,504 cycles for 1024-byte encryption and from 7,295 cycles to roughly 5,500 cycles for encrypting 4096 bytes. There are also significant savings on the decryption side with this AVX-512 implementation.
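As a rough illustration of what 32-way and 64-way parallel processing mean in terms of data, the sketch below computes how many bytes one parallel batch covers, given ARIA's 128-bit (16-byte) block size. The helper names are hypothetical, and how the kernel code actually schedules batches or handles leftover blocks is not covered here.

```python
ARIA_BLOCK_BYTES = 16  # ARIA operates on 128-bit blocks


def batch_bytes(ways: int) -> int:
    """Bytes covered by one parallel batch of `ways` blocks."""
    return ways * ARIA_BLOCK_BYTES


def full_batches(nbytes: int, ways: int) -> int:
    """Number of complete `ways`-block batches that fit in `nbytes`."""
    return nbytes // batch_bytes(ways)


for name, ways in (("AVX2", 32), ("AVX-512", 64)):
    print(f"{name}: {ways} blocks = {batch_bytes(ways)} bytes per batch, "
          f"{full_batches(4096, ways)} full batches in 4096 bytes")

# Relative saving for the 1024-byte encryption figures quoted in the article.
avx2_cycles, avx512_cycles = 2003, 1504
saving = (avx2_cycles - avx512_cycles) / avx2_cycles * 100.0
print(f"AVX-512 vs AVX2, 1024-byte encryption: {saving:.1f}% fewer cycles")
```

Under these assumptions a 64-way batch spans exactly 1024 bytes, so the 1024-byte benchmark corresponds to a single full AVX-512 batch or two full AVX2 batches.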
It was just a few months ago that the AVX implementation of ARIA was added to the Linux kernel. Now the AVX2 and AVX-512 versions are ready for Linux 6.3, thanks to this work by Taehee Yoo on speeding up the cipher for modern AMD and Intel processors.