Since Ryzen had such great benchmarks on AES, I wonder what the new VAES set gives here? Is there a way to test that for Michael?
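Before benchmarking, it helps to confirm the CPU and kernel actually expose VAES. Below is a minimal sketch, assuming a Linux x86 system where `/proc/cpuinfo` lists the `vaes` feature flag. The helper names `cpu_flags` and `supports_vaes` are made up for illustration, and this only reports availability, not speed.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_vaes(cpuinfo_text):
    """True if the kernel reports the 'vaes' flag (assumption: Linux on x86)."""
    return "vaes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is unavailable: treat as "unknown / not reported".
        text = ""
    print("VAES reported by kernel:", supports_vaes(text))
```

If the flag is present, an actual throughput comparison would still need a crypto library build that uses the VAES code path.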
AMD Sends Out Patches Adding "Znver3" Support To GNU Binutils With New Instructions
-
Originally posted by carewolf:
Ideally twice the number of bits processed per clock cycle.
I suppose it won't be twice the bandwidth, but it would be interesting to see some numbers on it. Also, since vector instructions could temporarily disable SMP locally (is that still true?), it might only be truly useful in non-server applications. Multicore benchmarks warranted!
-
Originally posted by ms178:
AVX-512 is supposed to come with Zen 4, hopefully with a better implementation than Intel's.
If just one library call executes just one AVX-512 instruction, suddenly every SSE and AVX operation burns more power, because it always has to copy the upper 256 bits of each vector register. Of course, you could always terminate AVX-512 code blocks with VZEROUPPER, but that potentially limits its use in smaller functions.
ARM's SVE is a much better approach, if you really must have larger vectors. Better still would be to use a GPU or purpose-built AI accelerator.
-
Originally posted by zxy_thf:
Zen 4's AVX-512 support might be like Zen 1's AVX2 support, i.e., emulating 512-bit operations with 256-bit ALUs.
However, due to the tremendous cost of AVX-512 in die area, this approach might also be another "worse is better" solution.
-
Originally posted by carewolf:
Ideally twice the number of bits processed per clock cycle.
While I was writing the post comparing the new Qualcomm server chip, Centriq, to our current stock of Intel Skylake-based Xeons, I noticed a disturbing phenomenon.
If you do not require AVX-512 for some specific high performance tasks, I suggest you disable AVX-512 execution on your server or desktop, to avoid accidental AVX-512 throttling.
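To put rough numbers on the throttling trade-off, here is a back-of-the-envelope sketch. It assumes a simple Amdahl-style model in which the whole core runs at the reduced clock all the time (a pessimistic simplification), and every figure in it is hypothetical, not a measurement. The function names are made up for illustration.

```python
def relative_throughput(crypto_fraction, vector_speedup, clock_ratio):
    """Throughput with wide vectors relative to the baseline (1.0 = no change).

    crypto_fraction: share of baseline run time spent in vectorizable crypto code (0..1)
    vector_speedup:  per-clock speedup of that code (2.0 = the ideal "twice the bits")
    clock_ratio:     throttled clock / baseline clock (hypothetical, e.g. 0.85)
    """
    # Work per request in clock cycles, normalized to the baseline.
    new_cycles = (1.0 - crypto_fraction) + crypto_fraction / vector_speedup
    # Fewer cycles help, but each cycle now takes longer.
    return clock_ratio / new_cycles


def break_even_clock_ratio(crypto_fraction, vector_speedup):
    """Clock ratio below which wide vectors become a net loss in this model."""
    return (1.0 - crypto_fraction) + crypto_fraction / vector_speedup


if __name__ == "__main__":
    # Hypothetical server: 10% of time in crypto, ideal 2x speedup, 15% clock drop.
    print(relative_throughput(0.10, 2.0, 0.85))
    # Hypothetical crypto-heavy job: 90% of time in crypto, same speedup and clock drop.
    print(relative_throughput(0.90, 2.0, 0.85))
```

In this model the first case comes out below 1.0 (a net loss) and the second above 1.0, which matches the intuition that wide vectors only pay off when the vectorized code dominates the workload.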
-
Originally posted by coder:
At lower clock speeds, though! One developer found the impact on clock speed was so dramatic that using AVX-512 for crypto resulted in a net decrease of server throughput!
AMD already managed to make AES extremely fast in Zen 1, so who knows what they'll pull off here?
-
Originally posted by coder:
At lower clock speeds, though! One developer found the impact on clock speed was so dramatic that using AVX-512 for crypto resulted in a net decrease of server throughput!