AMD Zen 4 AVX-512 Performance Analysis On The Ryzen 9 7950X


  • #41
    Originally posted by coder View Post
    HSA was a nice dream, but it never gained the necessary industry momentum. I think some of its advantages still live on in the form of ROCm, which I believe was architected to support it. Perhaps bridgman can say more about that.

    BTW, OpenCL 2.0 has a feature called SVM (Shared Virtual Memory), which I believe is cache-coherent. Also, CXL supports cache-coherency at the interconnect protocol level.
    I wouldn't characterize it as a dream, as it was a key selling point for their hardware for a long time, and I am still waiting to see the vision they promised come to fruition. HSA itself did not gain any industry traction, but at least some key technologies that further that vision are now standardized across the industry (e.g. CXL), and I hope the software side will also get better as future language standards incorporate some key elements. While SVM in OpenCL supports coarse-grained and fine-grained virtual memory, the implementations that matter to the market only support the former, right? I haven't checked that in a long time, though, and I was under the impression that this limitation restricts the practicality of the feature for some workloads quite a bit. I am also not aware of any commonly used software making use of that feature. But maybe you know some examples?

    Comment


    • #42
      coder MadCatX

      Here you go:

      AVX-512 was a hotly discussed topic around the launch of the new Intel Alder Lake CPUs. At first it was said that the P cores supported it in principle, but in

      Comment


      • #43
        Originally posted by bridgman View Post
        It's not half-baked, it's half-sized and perfectly baked
        In fairness we do also have multiple FP execution ports/pipes (6 vs 3 for Golden Cove) so there can still be a lot of work happening in parallel.
        For me it's really funny that Intel failed on AVX512 so many times and AMD just did it right.

        On other things like SGX, Intel failed too...

        And the Intel Arc GPUs, in my view, failed too.

        Really, Intel would be doing better by just licensing the AMD version of AVX512 and licensing the RDNA3 design too...

        Same for Apple... the Apple M1/M2 SoCs could be much better, in my view, if Apple just licensed RDNA3...

        I read some benchmarks of RDNA2 in Samsung's ARM SoCs vs Qualcomm's old Adreno (old ATI GPU tech),

        and the RDNA2 license pays for itself: both have the same performance and the same power consumption, but the RDNA Samsung chip has far fewer transistors and a much higher clock speed. It's 1400 MHz on the Samsung SoC and only about 750 MHz on the Qualcomm SoC... this means the RDNA2 license pays for itself by the transistor count alone.

        Qualcomm could produce SoCs with the same performance with far fewer transistors. Also, RDNA has more features, like raytracing acceleration hardware...

        This means we have smart companies like Samsung who just get an RDNA2 license, and then we have stupid companies like Intel who fail on their own design...

        Also, if Apple licensed the RDNA3 design, their Linux support would instantly be much better because the open-source driver is done already.
        Phantom circuit Sequence Reducer Dyslexia

        Comment


        • #44
          Would be good to get a test of explicit AVX-512 vs AVX2 with x265 or Handbrake!

          Comment


          • #45
            Originally posted by Sin2x View Post

            You've been a fan of an instruction set? What's wrong with you?

            Obligatory Linus quote: https://www.realworldtech.com/forum/...rpostid=193190
            Not everything Linus Torvalds says is gospel. There are lots of real-world use cases where AVX-512 has huge benefits. Emulators like RPCS3 and Yuzu both benefit greatly from the use of AVX-512. Linus sees what benefits him, and that's compiling kernel code. He can't see the forest for the trees.

            The PlayStation 3 emulator can make use of the extra-wide SIMD in AMD's new Zen 4 CPUs to see a significant speedup.

            Comment


            • #46
              Thanks for sharing. There's not a whole lot to go on, but these observations could be largely explained by Intel simply spending little or no time on frequency curve optimizations when AVX-512 is enabled. If enabling AVX-512 actually decreases average power consumption, then it must be getting clock-throttled more aggressively than the non-AVX-512 case.

              Comment


              • #47
                Originally posted by Dukenukemx View Post
                Linus sees what benefits him and that's compiling kernel code.
                Or traditional server apps, like web servers, databases, etc. Things which lean heavily on the kernel probably have disproportionate mind-share with him. Those are probably going to be multithreaded programs that use lots of memory and do extreme amounts of storage & network I/O.

                Comment


                • #48
                  Originally posted by Dukenukemx View Post
                  Not everything Linus Torvalds says is gospel. There are lots of real-world use cases where AVX-512 has huge benefits. Emulators like RPCS3 and Yuzu both benefit greatly from the use of AVX-512. Linus sees what benefits him, and that's compiling kernel code. He can't see the forest for the trees.
                  https://hothardware.com/news/rpcs3-d...s-with-avx-512
                  The PlayStation 3 emulator can make use of the extra-wide SIMD in AMD's new Zen 4 CPUs to see a significant speedup.

                  https://wccftech.com/amd-zen-4-avx-5...-xenia-vita3k/
                  No, it's you who can't even understand what he wrote -- that AVX512 is used in a disproportionately low percentage of tasks and the die area it uses could be put to better use for general computing. Which -- surprise -- Intel did by stripping this functionality from desktop processors and leaving it only on Xeons.

                  Don't ever presume you could be smarter than Linus, you only make yourself look like a clown.

                  Comment


                  • #49
                    Originally posted by Sin2x View Post
                    the die area it uses could be put to better use for general computing. Which -- surprise -- Intel did by stripping this functionality from desktop processors and leaving it only on Xeons.
                    Um... I think Linus is probably at least as interested in server CPUs, here.

                    Not only that, but Intel actually did put AVX-512 into Alder Lake desktop/mobile chips, they just disabled it because the E-cores didn't have it and they didn't want to deal with the headaches of asymmetric instruction support. That means it's still using the extra die area.

                    Originally posted by Sin2x View Post
                    Don't ever presume you could be smarter than Linus, you only make yourself look like a clown.
                    It's not a matter of intelligence. He's neither omniscient nor unbiased. Furthermore, he's not a chip designer and he doesn't actually know as much about Intel's customers as Intel does. All of this makes me take his opinions on CPU architecture with a bit of salt.

                    That said, I've long been critical of AVX-512, or at least the aspect of it which involves widening vectors to 512-bit. Other things, like predication and scatter/gather, are indeed nice and maybe not hugely expensive in die area.
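
                    For anyone unfamiliar with predication, here is a minimal sketch of the merge-masking behaviour, modelled in plain Python rather than real intrinsics. The function name `masked_add` and the 4-lane width are made up for illustration; actual AVX-512 uses dedicated mask registers and up to 16 32-bit lanes per vector.

```python
def masked_add(a, b, src, mask):
    """Model of a merge-masked vector add.

    Lane i receives a[i] + b[i] when bit i of `mask` is set;
    otherwise it keeps the value already in src[i].
    """
    return [
        x + y if (mask >> i) & 1 else s
        for i, (x, y, s) in enumerate(zip(a, b, src))
    ]


# Hypothetical 4-lane example: only lanes 0 and 2 are enabled.
result = masked_add([1, 2, 3, 4], [10, 10, 10, 10], [-1, -1, -1, -1], 0b0101)
print(result)
```

                    The practical appeal is that loop tails and per-element conditions can be handled with a mask instead of a scalar fallback loop or a separate blend step.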

                    I'm a little bit critical of scatter/gather, just because I think it lulls programmers into thinking they don't need to worry about data layout. However, even having the CPU fetch & interleave your data doesn't mean you don't have to worry about things like cache thrashing.
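
                    To put rough numbers on the layout point, here is a small sketch that counts how many 64-byte cache lines one 16-element gather of 4-byte values would touch. It is a simplified model, not a measurement: the helper names are made up, the arrays are assumed to start on a cache-line boundary, and the 64-byte struct size is just an illustrative worst case.

```python
CACHE_LINE = 64  # bytes; the usual line size on current x86 CPUs


def lines_touched(addresses, elem_size=4):
    """Count distinct cache lines covered by elements at the given byte addresses."""
    lines = set()
    for addr in addresses:
        lines.add(addr // CACHE_LINE)                    # line holding the first byte
        lines.add((addr + elem_size - 1) // CACHE_LINE)  # line holding the last byte
    return len(lines)


def aos_addresses(n, struct_size, field_offset=0):
    """Addresses of one field in n consecutive structs (array-of-structs layout)."""
    return [i * struct_size + field_offset for i in range(n)]


def soa_addresses(n, elem_size=4):
    """Addresses of n consecutive elements (struct-of-arrays layout)."""
    return [i * elem_size for i in range(n)]


# 16 four-byte values fill one 512-bit vector.
print("AoS, 64-byte structs:", lines_touched(aos_addresses(16, struct_size=64)))
print("AoS, 16-byte structs:", lines_touched(aos_addresses(16, struct_size=16)))
print("SoA, contiguous:     ", lines_touched(soa_addresses(16)))
```

                    In this model the gather from 64-byte structs pulls in 16 lines to fill a single vector, while the contiguous layout needs one. The gather instruction hides the shuffling, not the memory traffic.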

                    Comment


                    • #50
                      Originally posted by coder View Post
                      There's not a whole lot to go on, but these observations could be largely explained by Intel simply spending little or no time on frequency curve optimizations when AVX-512 is enabled. If enabling AVX-512 actually decreases average power consumption, then it must be getting clock-throttled more aggressively than the non-AVX-512 case.
                      Nah, as there was no throttling involved, it rather means that Intel finally managed to optimize AVX-512 to be more power efficient. Yeah, we all have to throw away the old wisdom about AVX-512, as the old equation "AVX-512 usage = higher power draw" is no longer true. Buildzoid backed that up, too, with his own data in one of his videos.

                      Comment
