The AVX-512 Performance Advantage With AMD EPYC Bergamo

Written by Michael Larabel in Processors on 26 July 2023 at 03:30 PM EDT. Page 3 of 5.
OpenVKL benchmark with settings of Benchmark: vklBenchmark ISPC. AVX512 On was the fastest.
OSPRay benchmark with settings of Benchmark: gravity_spheres_volume/dim_512/ao/real_time. AVX512 On was the fastest.
OSPRay benchmark with settings of Benchmark: gravity_spheres_volume/dim_512/scivis/real_time. AVX512 On was the fastest.
OSPRay benchmark with settings of Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time. AVX512 On was the fastest.
oneDNN benchmark with settings of Harness: Recurrent Neural Network Training, Data Type: bf16bf16bf16, Engine: CPU. AVX512 On was the fastest.

As with AMD EPYC Genoa(X), AVX-512 on Bergamo proved beneficial for HPC and other workloads optimized for AVX-512.
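Software that is "optimized for AVX-512" generally checks at runtime whether the CPU supports the relevant instruction-set extensions before it selects a vectorized code path. The following is a minimal sketch, assuming a Linux system where the kernel lists CPU feature flags in /proc/cpuinfo. It parses those flags and reports which AVX-512 subsets are present. The helper names `parse_cpu_flags` and `avx512_subsets` are hypothetical. Production libraries typically query CPUID directly instead of reading /proc/cpuinfo.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # x86 Linux kernels list features on a line like "flags\t\t: fpu ... avx512f ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_subsets(flags: set[str]) -> list[str]:
    """Return the sorted AVX-512 related flags (e.g. avx512f, avx512_bf16)."""
    return sorted(f for f in flags if f.startswith("avx512"))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        if "avx512f" in flags:
            # avx512f is the AVX-512 foundation subset; the others build on it
            print("AVX-512 available:", ", ".join(avx512_subsets(flags)))
        else:
            print("AVX-512 foundation (avx512f) not reported")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

On a Zen 4 or Zen 4c system, you would expect subsets such as `avx512_bf16` to appear in this output. That subset is relevant to the bf16 oneDNN result above.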

Cpuminer-Opt benchmark with settings of Algorithm: x25x. AVX512 On was the fastest.
Cpuminer-Opt benchmark with settings of Algorithm: scrypt. AVX512 On was the fastest.
Cpuminer-Opt benchmark with settings of Algorithm: Blake-2 S. AVX512 On was the fastest.
Cpuminer-Opt benchmark with settings of Algorithm: LBC, LBRY Credits. AVX512 On was the fastest.
Cpuminer-Opt benchmark with settings of Algorithm: Quad SHA-256, Pyrite. AVX512 On was the fastest.

Intel's Sierra Forest will offer up to 144 E-cores, while Bergamo tops out at 128 cores. Even so, with AVX-512 and the other Zen 4c features, Bergamo may prove more than capable of competing, assuming AVX10 doesn't make it into Sierra Forest.

Cpuminer-Opt benchmark with settings of Algorithm: Quad SHA-256, Pyrite. AVX512 On was the fastest.

With AVX-512 enabled, Cpuminer-Opt drove higher power consumption on the EPYC 9754, but the performance boost easily justified the extra power.
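Energy efficiency makes it possible to judge whether the extra power is "justified." If AVX-512 raises throughput by a larger factor than it raises average power draw, then the work done per joule goes up. The sketch below shows this calculation. The function names are hypothetical, and the numbers in the usage line are illustrative placeholders, not measurements from this article.

```python
def perf_per_watt(throughput: float, avg_watts: float) -> float:
    """Throughput per watt (e.g. hashes/s per W), i.e. work done per joule."""
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return throughput / avg_watts


def efficiency_gain(base_tput: float, base_watts: float,
                    new_tput: float, new_watts: float) -> float:
    """Ratio of new to baseline perf-per-watt; > 1.0 means the extra power paid off."""
    return perf_per_watt(new_tput, new_watts) / perf_per_watt(base_tput, base_watts)


# Hypothetical figures for illustration only (not results from this article):
# AVX-512 off: 100 units/s at 300 W; AVX-512 on: 150 units/s at 330 W.
gain = efficiency_gain(100.0, 300.0, 150.0, 330.0)
print(f"perf-per-watt ratio: {gain:.2f}")  # prints "perf-per-watt ratio: 1.36"
```

In this made-up case, a 1.5x speed-up costs only 1.1x the power, so efficiency improves by roughly 36%.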

TensorFlow benchmark with settings of Device: CPU, Batch Size: 16, Model: AlexNet. AVX512 On was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 512, Model: AlexNet. AVX512 On was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 16, Model: GoogLeNet. AVX512 On was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 32, Model: ResNet-50. AVX512 On was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 256, Model: ResNet-50. AVX512 On was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 512, Model: ResNet-50. AVX512 On was the fastest.

As we are used to seeing, AVX-512 absolutely pays off for TensorFlow.

TensorFlow benchmark with settings of Device: CPU, Batch Size: 512, Model: ResNet-50. AVX512 On was the fastest.

These big TensorFlow speed-ups with AVX-512 came with a significant increase in CPU power consumption.
