AVX / AVX2 / AVX-512 Performance + Power On Intel Rocket Lake
Last edited by carewolf; 08 April 2021, 02:57 AM.
- Likes 3
-
Originally posted by Etherman
I wonder how much silicon AVX-512 uses. Is it comparable to an extra core or two with AVX2?
However, more instructions have been added since then, so I expect the overhead has gone up somewhat. Still, I think die size is one of the lesser issues with it.
Last edited by coder; 08 April 2021, 08:49 AM.
- Likes 1
-
Originally posted by carewolf
Though the -march flags aren't ignored, they are overridden for the files with the special code. You can't use the intrinsics without the right archs.
I got sick of doing register allocation like 2 decades ago, however. Intrinsics are fine for me. I sometimes even check the compiler output (with intrinsics) and usually find it's as good or better than the asm I'd write by hand.
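To make the point about intrinsics and arch flags concrete, here is a minimal sketch, assuming GCC or Clang on x86. The target attribute plays the role of a per-file -mavx2: it enables AVX2 for one function in a file that is otherwise built for the baseline arch. The function names (sum_i32 and friends) are made up for illustration and don't come from any project discussed in this thread.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 is enabled for this one function only, much like compiling a
   single source file with -mavx2. */
__attribute__((target("avx2")))
static int sum_i32_avx2(const int *v, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;

    /* Add eight 32-bit lanes per iteration. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(v + i)));

    /* Reduce the eight partial sums, then handle the leftover elements. */
    int lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int total = 0;
    for (int k = 0; k < 8; k++)
        total += lanes[k];
    for (; i < n; i++)
        total += v[i];
    return total;
}
#endif

static int sum_i32_scalar(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Only call the AVX2 version when the running CPU reports AVX2. */
int sum_i32(const int *v, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        return sum_i32_avx2(v, n);
#endif
    return sum_i32_scalar(v, n);
}
```

Drop the attribute (and don't pass a matching -m flag for the file) and the compiler rejects the _mm256_* calls, which is the "can't use the intrinsics without the right archs" part.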
-
Originally posted by ddriver
I don't think it is that. The thing is, so far SIMD units have been fairly general purpose. Intel is cramming a bunch of highly purpose-specific stuff into AVX-512. His is not a problem with the width of execution or power efficiency, but with the support hell of continually introducing new niche-use instructions and having no uniformity of instruction set and features between platforms.
Here's his full statement: https://www.phoronix.com/scan.php?pa...lds-On-AVX-512
Originally posted by ddriver
Intel appears to have given up on improving general purpose performance
- Likes 2
-
Originally posted by mle86pho
At least for dav1d, I'm suspicious about whether the benchmark measured something meaningful.
If you look into the source code, you'll notice that there's a ton of handwritten assembler code (including AVX-512):
And there is code to directly decode the cpuid to determine the available vector instructions. I guess that setting the usual -march/.. compiler flags is pointless; the assembly paths are used anyway.
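For readers wondering what "directly decode the cpuid" amounts to, here is a rough sketch, assuming GCC or Clang and their <cpuid.h> helper __get_cpuid_count. It is not dav1d's code; the struct and function names are invented for illustration. The bit positions are the documented ones for CPUID leaf 7, subleaf 0: EBX bit 5 is AVX2 and EBX bit 16 is AVX-512 Foundation.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical feature record, for illustration only. */
struct simd_features {
    bool avx2;
    bool avx512f;
};

/* Decode the EBX register returned by CPUID leaf 7, subleaf 0. */
struct simd_features decode_leaf7_ebx(unsigned int ebx)
{
    struct simd_features f;
    f.avx2    = (ebx >> 5) & 1u;   /* bit 5: AVX2 */
    f.avx512f = (ebx >> 16) & 1u;  /* bit 16: AVX-512 Foundation */
    return f;
}

/* Ask the running CPU; reports nothing on non-x86 builds. */
struct simd_features query_simd_features(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        ebx = 0;  /* leaf 7 not available on this CPU */
#endif
    (void)eax; (void)ecx; (void)edx;
    return decode_leaf7_ebx(ebx);
}
```

A production check would also confirm that the OS saves the wider register state (the OSXSAVE bit plus XGETBV); that step is left out here to keep the sketch short. Since a check like this happens at run time, the -march value used for the build has no say in which assembly path gets picked.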
-
While it does drive up power consumption and in some cases can be detrimental to the performance due to the clock speed differences when engaging AVX-512
I have seen it on older 14 nm CPUs, for sure, but I don't see any of your benchmarks where AVX-512 actually hurts raw performance (only perf/W).
-
Originally posted by piotrj3
The biggest problem of AVX-512 is that you need a handcrafted program for it, and you need a processor with preferably at least 2 units of AVX-512.
This reminds me of another Intel innovation: Itanium. Crap performance, poor software ecosystem, if only we had better compilers and better software to take advantage of it, then it would be super duper awesome!!111 Mmmkay.
Last edited by torsionbar28; 07 April 2021, 10:47 PM.
- Likes 2
-
Originally posted by GPSnoopy
Intel AVX is not faster. It's really not. Intel MKL by default does not use AVX on AMD CPUs; it falls back to something like SSE2 or SSE4. Quite slow.
You need to binary patch Intel MKL binaries to be able to bypass this rather arbitrary limitation. See:
- https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html
- https://www.extremetech.com/computin...eadripper-cpus
- http://www.swallowtail.org/naughty-intel.shtml
Last edited by vegabook; 07 April 2021, 06:49 PM.
- Likes 1
-
Originally posted by torsionbar28
This is the oft-cited answer for what AVX-512 is useful for, yet nobody has any examples of this usage benefiting from AVX-512. Do you have any real world examples? Actual software products that use it, and specific workloads where there is a demonstrated benefit? These seem to always be missing from these AVX-512 discussions. IMO the benefit of AVX-512 seems far more theoretical than practical at this point.
What is the intended way? Can you quantify the benefits? Where can I go to see this benefit demonstrated?
Edit: AVX-512 feels like the CPU instruction equivalent of an herbal supplement, with promises of increased vitality, improved clarity, and stronger constitution. Not FDA approved. Not intended to treat or cure any disease. Consult your doctor before taking. Results not guaranteed. Advertisement contains paid actors. Batteries not included. Void where prohibited. Not for sale in ME, TX, CA, NY, or NJ.
With 256-bit AVX you mostly have 2 units in desktop chips, so you often end up in a situation where two 256-bit AVX instructions are done in one cycle versus one AVX-512 instruction. The gain therefore mostly happens only when two 256-bit instructions can't be issued at the same time, and when AVX-512 can reduce the complexity of the algorithm.
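The arithmetic behind that can be written down in a few lines. This is an idealised back-of-the-envelope model, not a measurement: it assumes one instruction per vector unit per cycle and ignores clock speed changes, port sharing and memory bandwidth. The function name is invented for the example.

```c
/* Idealised peak: 32-bit lanes processed per cycle. */
unsigned lanes_per_cycle(unsigned vector_units, unsigned width_bits)
{
    return vector_units * (width_bits / 32u);
}
```

Under those assumptions, lanes_per_cycle(2, 256) and lanes_per_cycle(1, 512) both come out at 16, while lanes_per_cycle(2, 512) gives 32, which is why a second AVX-512 unit matters for raw throughput.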
The biggest problem of AVX-512 is that you need a handcrafted program for it, and you need a processor with preferably at least 2 units of AVX-512.
Last edited by piotrj3; 07 April 2021, 05:48 PM.
- Likes 1
-
Originally posted by TemplarGR
Seems the benefit of AVX in general is too small. Really, really, small. Unless specifically written for it.
Now, if your code is in a vectorizable state, a recompile will show some nice gains. But for maximum performance, yeah, specific implementations are best, simply because the developer is the one who understands that code and can go further than the compiler's safe approach.
Think of it this way: regular C/C++ and co. are akin to OpenGL/DX11, and SIMD C/C++ and co. are akin to Vulkan/DX12.
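As a rough illustration of code "in a vectorizable state", here is a sketch of a loop that GCC and Clang can typically auto-vectorize when optimisation is turned up (for example -O3 together with a suitable -march or -mavx2): a plain counted loop with no cross-iteration dependency, and restrict to tell the compiler that the arrays don't overlap. The function is a textbook saxpy, used here only as an example.

```c
#include <stddef.h>

/* y[i] += a * x[i]; restrict promises that x and y do not overlap,
   so the compiler is free to process several elements at once. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

With GCC, -fopt-info-vec prints which loops were vectorized, which is a quick way to see whether a recompile is likely to change anything for a given piece of code.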
Also, SIMD, regardless of the platform/acronym, does a hell of a lot more than the simple math claimed by some other posts. Sure, BLAS and other software make use of the "math" side of SIMD, but it also greatly speeds up memory, cache, shifting, comparison, crypto (not only AES-NI does this, btw), etc. operations, and those can be used in any app/library as long as you understand what you are doing.
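A small example of the non-math side, as a sketch: counting how often a byte occurs in a buffer using SSE2 compares, 16 bytes at a time. It assumes GCC or Clang (for __builtin_popcount) and falls back to a plain loop where SSE2 isn't available. The function name is made up for the example.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

size_t count_byte(const unsigned char *buf, size_t n, unsigned char needle)
{
    size_t count = 0, i = 0;
#if defined(__SSE2__)
    const __m128i pattern = _mm_set1_epi8((char)needle);
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* 0xFF in every byte lane that matches, 0x00 elsewhere */
        __m128i eq = _mm_cmpeq_epi8(chunk, pattern);
        /* one mask bit per lane, so the popcount is the number of matches */
        count += (size_t)__builtin_popcount((unsigned)_mm_movemask_epi8(eq));
    }
#endif
    /* scalar tail (or the whole buffer when SSE2 is not available) */
    for (; i < n; i++)
        count += (buf[i] == needle);
    return count;
}
```

No floating-point math is involved: it is a load, a compare and a mask extraction per 16 bytes, the same pattern that string and memory routines tend to rely on.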