Since Ryzen already posted such great AES benchmarks, I wonder what the new VAES instructions bring here. Is there a way for Michael to test that?
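For what it's worth, a minimal sketch of how one could at least verify that a CPU advertises VAES before trying to benchmark it (the flag also shows up as "vaes" in /proc/cpuinfo). This just reads CPUID leaf 7 through GCC/Clang's <cpuid.h>; whether a given crypto library actually has a VAES code path is a separate question.

/* Check whether the CPU advertises VAES (and AVX-512F).
 * CPUID leaf 7, sub-leaf 0: ECX bit 9 = VAES, EBX bit 16 = AVX-512F.
 * Build: cc -O2 check_vaes.c */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 7 not available");
        return 1;
    }
    printf("VAES:     %s\n", (ecx & (1u << 9))  ? "yes" : "no");
    printf("AVX-512F: %s\n", (ebx & (1u << 16)) ? "yes" : "no");
    return 0;
}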
AMD Sends Out Patches Adding "Znver3" Support To GNU Binutils With New Instructions
Originally posted by phoronix: Phoronix: AMD Sends Out Patches Adding "Znver3" Support To GNU Binutils With New Instructions
One of AMD's compiler experts this week sent out a patch wiring up Zen 3 support in the important GNU Binutils collection for Linux systems...
http://www.phoronix.com/scan.php?pag...nutils-Support
Originally posted by ms178: I guess you meant Bulldozer? As far as I know, Zen 1's AVX2 implementation was up to par with Intel's. If they implemented AVX-512 like that, it wouldn't be all that beneficial after all, would it? I am not an ISA expert, but aren't larger vector units and fewer cycles per instruction where the performance comes from? Looking at the past, that approach left them lagging behind in AVX performance quite a bit due to the implementation. It didn't matter too much back then, since AVX2 wasn't that important yet, but it might matter now if they want to go after Intel in AI and HPC workloads where AVX-512 is fully utilized. And with the x86-64-v4 target, it will probably see wider use soon, at least on Linux. (Does anyone know if these new baselines will translate over into the Windows world? I'd love to see such a Windows version.)
PS: Thanks for mentioning the new x86-64-v4 target.
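For anyone curious what those baselines mean in practice, a sketch (the file name is made up; the levels are the ones GCC 11+ and Clang 12+ accept):

/* One scalar loop, built against the new microarchitecture levels:
 *   cc -O3 -march=x86-64-v2 saxpy.c   # roughly the SSE4.2/POPCNT era
 *   cc -O3 -march=x86-64-v3 saxpy.c   # adds AVX, AVX2, FMA, BMI1/2
 *   cc -O3 -march=x86-64-v4 saxpy.c   # adds AVX-512F/BW/CD/DQ/VL
 * At -v4 the compiler is free to auto-vectorize this with 512-bit zmm registers. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}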
Originally posted by carewolf:
Ideally twice the number of bits processed per clock cycle.
I suppose it won't be twice the bandwidth, though it should still be interesting to see some numbers on it. Also, since vector instructions could temporarily disable SMP locally (is that still true?), it might only be truly useful in non-server applications. Multicore benchmarks warranted!
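To make "twice the bits per clock" concrete, the width difference is basically this (a sketch; build with -mavx2 -mavx512f so both functions compile):

#include <immintrin.h>

/* AVX2: 8 single-precision floats per add */
__m256 add8(__m256 a, __m256 b)
{
    return _mm256_add_ps(a, b);
}

/* AVX-512F: 16 single-precision floats per add, assuming the core can
 * actually issue full-width operations every cycle (the open question here) */
__m512 add16(__m512 a, __m512 b)
{
    return _mm512_add_ps(a, b);
}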
Originally posted by ms178: AVX-512 is supposed to come with Zen 4, hopefully with a better implementation than Intel's.
If just one library call executes just one AVX-512 instruction, suddenly every SSE and AVX operation burns more power, by virtue of always having to copy the upper 256 bits of each vector register. Of course, you could always terminate AVX-512 code blocks with VZEROUPPER, but that potentially limits its use in smaller functions.
ARM's SVE is a much better approach, if you really must have larger vectors. Better still would be to use a GPU or purpose-built AI accelerator.
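For illustration, here is roughly what that VZEROUPPER hygiene looks like at the source level (hypothetical function, build with -mavx; compilers normally insert the instruction at function boundaries on their own, the intrinsic just makes it explicit):

#include <immintrin.h>

void copy_wide(float *dst, const float *src)
{
    __m256 v = _mm256_loadu_ps(src);   /* leaves the upper ymm bits "dirty" */
    _mm256_storeu_ps(dst, v);
    _mm256_zeroupper();                /* emits vzeroupper: clears the upper state
                                          before any legacy SSE code runs again */
}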
Originally posted by zxy_thf: Zen 4's AVX-512 support might be like Zen 1's AVX2 support, i.e., emulating 512-bit operations with 256-bit ALUs.
However, given the tremendous die-area cost of AVX-512, this approach might also be another "worse is better" solution.
Originally posted by carewolf: Ideally twice the number of bits processed per clock cycle.
https://blog.cloudflare.com/on-the-d...uency-scaling/
If you do not require AVX-512 for some specific high performance tasks, I suggest you disable AVX-512 execution on your server or desktop, to avoid accidental AVX-512 throttling.
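There is no single knob for that, but at the build and dispatch level the usual GCC/Clang options cover most of it (a sketch; the function name is made up):

/* Keeping AVX-512 out of generated code at build time:
 *   cc -O3 -mno-avx512f ...                # never emit AVX-512 instructions
 *   cc -O3 -mprefer-vector-width=256 ...   # auto-vectorize with ymm, not zmm
 * And the runtime check you would combine with your own policy switch,
 * if you gate a hand-written AVX-512 path yourself: */
#include <stdbool.h>

bool cpu_has_avx512f(void)
{
    return __builtin_cpu_supports("avx512f");
}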
Originally posted by coder: At lower clock speeds, though! One developer found the impact on clock speed was so dramatic that using AVX-512 for crypto resulted in a net decrease of server throughput!
https://blog.cloudflare.com/on-the-d...uency-scaling/
AMD already managed to make AES extremely fast in Zen 1; who knows what they'll pull off here?