AMD Zen 4 AVX-512 Performance Analysis On The Ryzen 9 7950X


  • coder
    replied
    Originally posted by ms178
    Well, I'd say we should not mix too many questions together, such as "Is AVX-512 more power-efficient on Alder Lake?" with "Is AMD's AVX-512 implementation better than Alder Lake's?"
    As performance is directly linked to power consumption, it seems foolish to address an anomaly in one while ignoring an anomaly in the other.

    Originally posted by ms178
    "... indeed, with the feature enabled, the efficiency of the P-cores is significantly higher in all benchmarks than without."
    News flash: that was always true. AVX-512 always increased performance in vector-heavy workloads by more than it increased power consumption. Hence, improved efficiency. So, that really doesn't address the key questions.
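
    To put rough numbers on that argument, here is a minimal sketch. The figures are made up for illustration and are not taken from any review; `efficiency` is just a hypothetical helper for performance per watt.

```python
def efficiency(performance: float, power_watts: float) -> float:
    """Work done per watt; higher is better."""
    return performance / power_watts

# Hypothetical numbers, not measurements from any review.
baseline = efficiency(performance=100.0, power_watts=100.0)
avx512 = efficiency(performance=150.0, power_watts=120.0)  # +50% perf, +20% power

# Relative change in performance per watt when AVX-512 is enabled.
efficiency_gain = avx512 / baseline - 1.0
print(f"Efficiency change with AVX-512: {efficiency_gain:+.1%}")
```

    As long as the performance ratio is larger than the power ratio, the result is positive, whether or not absolute power went up.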

    Originally posted by ms178
    Igor also measured his power numbers with special hardware which should provide a more accurate picture than standard software tools.
    What's your point? Other reviewers have tested other CPUs with hardware measurements, as well.

    Originally posted by ms178
    His testing was also done on Windows in December 2021, whereas Michael did his testing on Linux in September 2022 and with different tests involved.
    If you're addressing the Zen 4 performance discrepancy, I agree that we have very little data from Igor's testing. However, I think we've seen prior data which supports that Y-cruncher responds well to AVX-512. At least, better than what Igor measured. I'll try to confirm that.

    BTW, the link I added to my prior post about Michael's AVX-512 testing of Alder Lake was done in November 2021. So, every bit as legitimate as Igor's. You should check it out!



  • ms178
    replied
    Originally posted by coder
    There's a fundamental inconsistency, though. Igor showed a performance advantage for AVX-512 of a mere 13.0% (Y-cruncher) and 9.4% (LynX).

    Compare that to the GeoMean from this article, where AMD got a whopping 59.0% performance advantage in AVX-512 mode, with what should be a narrower implementation (1x FMA vs. 2x and higher latency for at least some instructions).

    Curious to hear your explanations of that.
    Well, I'd say we should not mix too many questions together, such as "Is AVX-512 more power-efficient on Alder Lake?" with "Is AMD's AVX-512 implementation better than Alder Lake's?"

    My statement was focused on the former. From Igor's article: "[...] the trigger for today’s article was the literally incredibly low power consumption of the P-cores with AVX-512 and the question whether the efficiency is actually higher. And indeed, with the feature enabled, the efficiency of the P-cores is significantly higher in all benchmarks than without. In fact, the results are so clear that the instruction set can and should always be safely activated – if possible, of course."

    Igor also measured his power numbers with special hardware, which should provide a more accurate picture than standard software tools. His testing was also done on Windows in December 2021, whereas Michael did his testing on Linux in September 2022 and with different tests involved. There are too many variables at play to compare the numbers from Igor's and Michael's tests directly, as the geomean of Michael's testing also includes different benchmarks. Michael could do an AVX-512-focused comparison between Alder Lake and Zen 4 with power numbers. But as Alder Lake is no longer shipping with AVX-512 enabled, the new HEDT platforms from both AMD and Intel should be better suited for such a comparison.
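
    A small sketch of why the benchmark mix matters for a geomean. The first two ratios mirror the 13.0% and 9.4% figures quoted in this thread; the other two are invented for illustration and are not from Michael's results. `geomean` is a hypothetical helper.

```python
import math

def geomean(ratios):
    """Geometric mean of positive speedup ratios (1.0 = no change)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Two modest gains (the 13.0% and 9.4% quoted in the thread).
suite_a = [1.130, 1.094]
# Same two tests plus two invented vector-heavy tests with large gains.
suite_b = [1.130, 1.094, 2.50, 1.90]

print(f"suite A geomean: {geomean(suite_a):.3f}")
print(f"suite B geomean: {geomean(suite_b):.3f}")
```

    The same CPU can show a small or a large "AVX-512 advantage" depending only on which tests end up in the suite, which is why a geomean from one suite says little about individual results from another.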



  • coder
    replied
    Originally posted by ms178
    Nah, as there was no throttling involved, it rather means that Intel finally managed to optimize AVX-512 to be more power-efficient. Yeah, we all have to throw away the old wisdom about AVX-512, as the old equation "AVX-512 usage = higher power draw" is no longer true.
    There's a fundamental inconsistency, though. Igor showed a performance advantage for AVX-512 of a mere 13.0% (Y-cruncher) and 9.4% (LynX).

    Compare that to the GeoMean from this article, where AMD got a whopping 59.0% performance advantage in AVX-512 mode, with what should be a narrower implementation (1x FMA vs. 2x and higher latency for at least some instructions).

    Curious to hear your explanations of that.


    Edit: Michael's CpuMiner testing of AVX-512 on Alder Lake showed much bigger gains, often accompanied by higher power consumption (though not insane). I wish he'd tested a few more apps, so we could get a similar picture to the one we now have for Zen 4.

    Last edited by coder; 29 September 2022, 04:33 PM.



  • ms178
    replied
    Originally posted by coder
    There's not a whole lot to go on, but these observations could be largely explained by Intel simply spending little/no time on frequency curve optimizations when AVX-512 is enabled. Because, if enabling AVX-512 actually decreases average power consumption, then it must be getting clock-throttled more aggressively than the non-AVX-512 case.
    Nah, as there was no throttling involved, it rather means that Intel finally managed to optimize AVX-512 to be more power-efficient. Yeah, we all have to throw away the old wisdom about AVX-512, as the old equation "AVX-512 usage = higher power draw" is no longer true. Buildzoid backed that up, too, with his own data in one of his videos.



  • coder
    replied
    Originally posted by Sin2x
    the die area it uses could be put to better use for general computing. Which -- surprise -- Intel did by stripping this functionality from desktop processors and leaving it only on Xeons.
    Um... I think Linus is probably at least as interested in server CPUs, here.

    Not only that, but Intel actually did put AVX-512 into Alder Lake desktop/mobile chips, they just disabled it because the E-cores didn't have it and they didn't want to deal with the headaches of asymmetric instruction support. That means it's still using the extra die area.

    Originally posted by Sin2x
    Don't ever presume you could be smarter than Linus, you only make yourself look like a clown.
    It's not a matter of intelligence. He's neither omniscient nor unbiased. Furthermore, he's not a chip designer and he doesn't actually know as much about Intel's customers as Intel does. All of this makes me take his opinions on CPU architecture with a bit of salt.

    That said, I've long been critical of AVX-512, or at least the aspect of it which involves widening vectors to 512-bit. Other things, like predication and scatter/gather, are indeed nice and maybe not hugely expensive in die area.

    I'm a little bit critical of scatter/gather, just because I think it lulls programmers into thinking they don't need to worry about data layout. However, even having the CPU fetch & interleave your data doesn't mean you don't have to worry about things like cache thrashing.
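
    To illustrate the layout point, here's a rough sketch that just counts how many 64-byte cache lines a 16-element float32 load would cover. The 32-byte struct size and the `cache_lines_touched` helper are assumptions for the example, not anything specific to a real codebase.

```python
CACHE_LINE = 64  # bytes; typical for x86-64

def cache_lines_touched(base: int, stride: int, count: int, elem_size: int = 4) -> int:
    """Number of distinct cache lines covered by `count` elements at a fixed byte stride."""
    lines = set()
    for i in range(count):
        start = base + i * stride
        lines.add(start // CACHE_LINE)                    # line holding the first byte
        lines.add((start + elem_size - 1) // CACHE_LINE)  # line holding the last byte
    return len(lines)

# 16 float32 values fill one 512-bit vector.
contiguous = cache_lines_touched(base=0, stride=4, count=16)  # structure-of-arrays layout
strided = cache_lines_touched(base=0, stride=32, count=16)    # one field of a 32-byte struct

print(contiguous, strided)
```

    With these assumptions, the contiguous load covers one cache line and the gather covers eight, so the gather instruction hides the indexing but not the extra memory traffic.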



  • Sin2x
    replied
    Originally posted by Dukenukemx
    Not everything Linus Torvalds says is gospel. There are lots of real-world use cases where AVX-512 has huge benefits. Emulators like RPCS3 and Yuzu both benefit greatly from the use of AVX-512. Linus sees what benefits him, and that's compiling kernel code. He can't see the forest for the trees.
    https://hothardware.com/news/rpcs3-d...s-with-avx-512
    The PlayStation 3 emulator can make use of the extra-wide SIMD in AMD's new Zen 4 CPUs to see a significant speedup.

    https://wccftech.com/amd-zen-4-avx-5...-xenia-vita3k/
    No, it's you who can't even understand what he wrote -- that AVX-512 is used in a disproportionately low percentage of tasks and the die area it uses could be put to better use for general computing. Which -- surprise -- Intel did by stripping this functionality from desktop processors and leaving it only on Xeons.

    Don't ever presume you could be smarter than Linus, you only make yourself look like a clown.



  • coder
    replied
    Originally posted by Dukenukemx
    Linus sees what benefits him and that's compiling kernel code.
    Or traditional server apps, like web servers, databases, etc. Things which lean heavily on the kernel probably have disproportionate mind-share with him. Those are probably going to be multithreaded programs that use lots of memory and do extreme amounts of storage & network I/O.



  • coder
    replied
    Thanks for sharing. There's not a whole lot to go on, but these observations could be largely explained by Intel simply spending little/no time on frequency curve optimizations when AVX-512 is enabled. Because, if enabling AVX-512 actually decreases average power consumption, then it must be getting clock-throttled more aggressively than the non-AVX-512 case.
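
    A rough way to see why lower average power would point toward lower clocks: in the usual first-order model, switching power scales as C * V^2 * f. The sketch below uses hypothetical operating points (none of these values come from Igor's measurements) and ignores leakage, so treat it as an illustration of the reasoning rather than a model of Alder Lake.

```python
def dynamic_power(capacitance: float, voltage: float, freq_ghz: float) -> float:
    """First-order CMOS switching power, P ~ C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * freq_ghz

# Hypothetical operating points. The AVX-512 case assumes more switched
# capacitance per cycle (wider datapaths) but a lower clock and voltage.
scalar_power = dynamic_power(capacitance=1.0, voltage=1.30, freq_ghz=5.0)
avx512_power = dynamic_power(capacitance=1.2, voltage=1.15, freq_ghz=4.5)

print(f"non-AVX-512: {scalar_power:.2f}, AVX-512: {avx512_power:.2f}")
```

    Under these assumed values, the AVX-512 case draws less power despite switching more capacitance per cycle, because the voltage term is squared and the clock is lower.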



  • Dukenukemx
    replied
    Originally posted by Sin2x

    You've been a fan of an instruction set? What's wrong with you?

    Obligatory Linus quote: https://www.realworldtech.com/forum/...rpostid=193190
    Not everything Linus Torvalds says is gospel. There are lots of real-world use cases where AVX-512 has huge benefits. Emulators like RPCS3 and Yuzu both benefit greatly from the use of AVX-512. Linus sees what benefits him, and that's compiling kernel code. He can't see the forest for the trees.

    The PlayStation 3 emulator can make use of the extra-wide SIMD in AMD's new Zen 4 CPUs to see a significant speedup.



  • DanglingPointer
    replied
    Would be good to get a test of explicit AVX-512 vs. AVX2 with x265 or HandBrake!

