AMD Zen 4 AVX-512 Performance Analysis On The Ryzen 9 7950X
Originally posted by schmidtbag
Haha fair enough - half-baked was more negative than I intended it to be, but I guess my point was it wasn't a "complete" AVX 512 implementation.
It's not natively 512 bits wide, but so what? All that matters is how it performs. As bridgman pointed out, Zen 4 has 6 FP dispatch ports, including 2x add and 2x mul/mac. So that's still 512 bits of mul/macs and 512 bits of adds you can do per cycle. As long as you've got enough work to keep the pipelines fed, I think the throughput is still pretty competitive.
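To put rough numbers on that, here is a back-of-envelope sketch. It assumes each of the two mul/mac pipes and each of the two add pipes is 256 bits wide, which is what "not natively 512 bits wide" implies here. The constant and function names are my own illustration, not anything from AMD's documentation.

```python
# Back-of-envelope peak FP throughput per cycle.
# ASSUMPTION: each mul/mac pipe and each add pipe is 256 bits wide.
# The names and widths below are illustrative, not vendor-documented values.

PIPE_WIDTH_BITS = 256  # assumed width of one FP pipe
MUL_MAC_PIPES = 2      # "2x mul/mac" from the post above
ADD_PIPES = 2          # "2x add" from the post above


def bits_per_cycle(num_pipes: int, pipe_width_bits: int = PIPE_WIDTH_BITS) -> int:
    """Total vector bits a group of identical pipes can process per cycle."""
    return num_pipes * pipe_width_bits


mul_mac_bits = bits_per_cycle(MUL_MAC_PIPES)
add_bits = bits_per_cycle(ADD_PIPES)
fp32_mul_lanes = mul_mac_bits // 32  # 32-bit floats that fit in those bits

print(f"mul/mac: {mul_mac_bits} bits/cycle ({fp32_mul_lanes} FP32 lanes)")
print(f"add:     {add_bits} bits/cycle")
```

This only counts peak width. It says nothing about whether a real workload can keep all of those pipes busy.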
-
Originally posted by ms178
Right, but wasn't AVX-512 particularly better suited for more widespread use than other vector ISAs before it? I think the ispc creator blogged about that quite extensively, praising AVX-512 for its usefulness for general-purpose compute tasks and its performance-per-area advantages.
However, the degree of clock-throttling it caused in 14 nm Intel CPUs with the feature was very much at odds with using it for simple things like string-processing. This writeup captures the dilemma particularly well:
So it really doesn't matter how easy to use it might be. The clock-throttling effects greatly limited its use in workloads other than those which heavily utilize it.
Originally posted by ms178
Well, it took AMD more than half a decade to implement it
Originally posted by ms178
and even though AMD was touting GPUs as better suited for vector code, that effort hasn't materialized yet and I am still waiting for them to fulfill the promises they have been making since 2012.
Sure, ROCm was in the wilderness for a long time, but they had legacy, proprietary drivers available for much of that time. They deserve some criticism for that, but it's not as if they weren't working on it the whole time.
Last edited by coder; 26 September 2022, 10:45 PM.
-
Originally posted by bridgman
It's not half-baked, it's half-sized and perfectly baked
In fairness we do also have multiple FP execution ports/pipes (6 vs 3 for Golden Cove) so there can still be a lot of work happening in parallel.
-
Originally posted by coder
AVX-512 really only helps in the minority of workloads, which is the main reason AMD dragged its feet for so long on it.
-
Originally posted by chuckula
Seriously, turn off AVX-512 in Phoronix's main review and watch Zen 4 suddenly not look that great compared to a year-old Alder Lake.
AVX-512 really only helps in the minority of workloads, which is the main reason AMD dragged its feet for so long on it.
-
"there is significant performance uplift to enjoy while no negative impact in terms of reduced CPU clock speeds / higher power consumption"
oooooh boy i couldnt wait to load the comment section and enjoy the obligatory 30 pages of "omgoptimized trollolol!!!" autism tears. come on lads, yall is slippin. where the tears at? 🤔
-
Originally posted by Sin2x
You've been a fan of an instruction set? What's wrong with you?
Obligatory Linus quote: https://www.realworldtech.com/forum/...rpostid=193190
As for Linus's opinions on hardware, he's not a hardware engineer and 20 years ago when he was writing emulators at Transmeta -- which is the closest he ever got to hardware -- it didn't work out too well. The fact that some AVX-512 instructions aren't intended to swizzle bits inside the kernel doesn't mean they have no value, and the very article you should have read and comprehended proves that point abundantly.
-
Originally posted by coder
Calling it that implies there was an AVX3 and AVX4. No, it's called AVX-512. It's inconsistent, but the reality of AVX-512 is that it's a different family of ISA extensions, just like AVX/AVX2 are a different family than the SSE/SSE2/SSE3/SSE4x extensions. Each family differs from its predecessor in more ways than mere register width.
-
Originally posted by MadCatX
True, but AVX512 in very power-constrained laptop chips is probably worth less than an "AVX512-ready!" sticker you could slap on a laptop with such a chip.
Where AVX-512 got into trouble was in workloads that used it for around 10% - 20% of the instructions, which was enough to trigger significant downclocking but not enough for its greater throughput to compensate. I experienced this firsthand. When we recompiled with AVX-512 completely disabled, we got higher overall throughput in my software.
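A toy model makes the break-even point easier to see. This is only a sketch under stated assumptions: a fraction of the baseline runtime gets a fixed speedup from AVX-512, and the whole program then runs at a reduced clock. The function name and every number below are hypothetical, chosen for illustration; they are not measurements from my software or from any particular CPU.

```python
# Toy model: does partial AVX-512 use pay off once the clock drops?
# ASSUMPTIONS (all hypothetical, for illustration only):
#   - vector_fraction is the share of *baseline runtime* that AVX-512 speeds up
#   - vector_speedup is the speedup of that share at an unchanged clock
#   - clock_ratio is the throttled clock divided by the normal clock, applied
#     to the whole program


def relative_throughput(vector_fraction: float,
                        vector_speedup: float,
                        clock_ratio: float) -> float:
    """Throughput with AVX-512 relative to the non-AVX-512 build (1.0 = equal)."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0.0 or clock_ratio <= 0.0:
        raise ValueError("vector_speedup and clock_ratio must be positive")
    # Amdahl-style runtime at the original clock, normalized to 1.0 baseline.
    runtime_at_full_clock = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # A lower clock stretches the whole runtime by 1 / clock_ratio.
    return clock_ratio / runtime_at_full_clock


# Light use: 15% of the runtime vectorized, 2x faster there, clock drops to 85%.
light_use = relative_throughput(0.15, 2.0, 0.85)
# Heavy use: 80% of the runtime vectorized, same speedup and same clock drop.
heavy_use = relative_throughput(0.80, 2.0, 0.85)

print(f"light use: {light_use:.2f}x, heavy use: {heavy_use:.2f}x")
```

With these made-up inputs the light-use case comes out below 1.0 (a net loss) and the heavy-use case comes out above it, which is the shape of the problem described above. Real downclocking is not a single fixed ratio, so treat this as intuition only.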
At 10 nm, the power & clock penalties of AVX-512 are definitely less than they were at 14 nm. That was shown with Ice Lake SP Xeons delivering more consistent clock speeds in AVX-512-heavy workloads, without going outside their power envelopes (server CPUs are very good about adhering to their power limits).
Originally posted by MadCatX
It's quite difficult to draw exact conclusions, but RKL's AVX512 seems worse than Zen4, at least in perf-per-watt. (https://www.phoronix.com/review/rocket-lake-avx512)
Originally posted by MadCatX
I'm aware of the technical intricacies here but nevertheless it's a step back. Unless Intel fixes this for Raptor Lake,
Last edited by coder; 26 September 2022, 04:46 PM.