AVX-512 Performance With 256-bit vs. 512-bit Data Path For AMD EPYC 9005 CPUs

  • AVX-512 Performance With 256-bit vs. 512-bit Data Path For AMD EPYC 9005 CPUs

    Phoronix: AVX-512 Performance With 256-bit vs. 512-bit Data Path For AMD EPYC 9005 CPUs

    With launch day for the AMD EPYC 9005 series server processors now past and initial AMD EPYC Zen 5 benchmarks delivered for the EPYC 9575F / EPYC 9755 / EPYC 9965 SKUs, it's on to one of my favorite areas of testing: more focused benchmarks looking at specific changes and features of new processors. Today under the benchmarking microscope are the new AVX-512 512-bit data path capabilities of 5th Gen AMD EPYC compared to using a 256-bit data path or disabling AVX-512 entirely.

  • #2
    It should be kept in mind that the speedups observed when enabling 256-bit or 512-bit AVX-512 are only a lower bound on the speedups possible for a given application with AVX-512.

    The reason is that there are many cases when an application is limited by the computations in the CPU core when not using AVX-512, but it becomes limited by the memory bandwidth when enabling AVX-512, either already when enabling 256-bit AVX-512 or only when enabling 512-bit AVX-512.

    This means that for some applications one could have seen even bigger speedup factors from enabling AVX-512, or only when enabling 512-bit AVX-512, had the tests been run on a computer with faster memory, which would have eliminated the memory bottleneck, allowing the cores to compute at full speed.
    Last edited by AdrianBc; 11 October 2024, 11:10 AM.
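
The argument above can be illustrated with a simple roofline-style model, where attainable throughput is the smaller of the compute peak and what the memory system can feed. This is only a sketch: the function names and every number below are hypothetical, chosen to show the effect, and are not measurements of any EPYC part.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def speedup(base_peak, wide_peak, bandwidth_gbs, flops_per_byte):
    """Speedup from raising the compute peak at a fixed memory bandwidth."""
    base = attainable_gflops(base_peak, bandwidth_gbs, flops_per_byte)
    wide = attainable_gflops(wide_peak, bandwidth_gbs, flops_per_byte)
    return wide / base


# Hypothetical values, for illustration only.
no_avx512_peak = 100.0  # GFLOP/s without AVX-512
avx512_peak = 400.0     # GFLOP/s with AVX-512
intensity = 0.5         # FLOPs performed per byte moved from memory

# Slower memory: the AVX-512 run hits the bandwidth ceiling.
slow_mem = speedup(no_avx512_peak, avx512_peak, 400.0, intensity)
# Faster memory: both runs stay compute-bound.
fast_mem = speedup(no_avx512_peak, avx512_peak, 1000.0, intensity)

print(slow_mem, fast_mem)  # 2.0 4.0
```

With these made-up numbers the observed speedup is 2x on the slower memory, while the same code would show 4x once memory no longer limits it, which is why the measured figure is only a lower bound.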


    • #3
      AMD was very smart to go for the double-pumped 256-bit AVX-512 implementation; it got most of the advantage at half the cost. Full 512-bit is a case of diminishing returns.

      Maybe it can be useful when there's more SIMD resource contention in multi-threaded scenarios?


      • #4
        So much area cost for this...
        At least it doesn't tank clocks.


        • #5
          Originally posted by AdrianBc
          It should be kept in mind that the speedups observed when enabling 256-bit or 512-bit AVX-512 are only a lower bound on the speedups possible for a given application with AVX-512.

          The reason is that there are many cases when an application is limited by the computations in the CPU core when not using AVX-512, but it becomes limited by the memory bandwidth when enabling AVX-512, either already when enabling 256-bit AVX-512 or only when enabling 512-bit AVX-512.

          This means that for some applications one could have seen even bigger speedup factors from enabling AVX-512, or only when enabling 512-bit AVX-512, had the tests been run on a computer with faster memory, which would have eliminated the memory bottleneck, allowing the cores to compute at full speed.
          Do I hear the sound of MR-DIMM 17600 in Zen 6's future?


          • #6
            Originally posted by geerge

            Do I hear the sound of MR-DIMM 17600 in Zen 6's future?
            I wrote in another thread: I want an AMD CPU with a minimum of 64GB of HBM and no socketed RAM. That thing would cost an arm and a leg, but also fly.


            • #7
              Originally posted by mlau
              I wrote in another thread: I want an AMD CPU with a minimum of 64GB of HBM and no socketed RAM. That thing would cost an arm and a leg, but also fly.
              Check out their MI300A lineup. The only downside is I think they top out at 24 CPU cores.

              AMD is definitely going in this direction with pure CPUs, as well. It'll just take a couple more generations for them to get there.


              • #8
                Originally posted by AdrianBc
                This means that for some applications one could have seen even bigger speedup factors from enabling AVX-512, or only when enabling 512-bit AVX-512, had the tests been run on a computer with faster memory, which would have eliminated the memory bottleneck, allowing the cores to compute at full speed.
                It should be relatively easy to test, if Michael repeated some of these tests with half, and then 75%, of the cores taken offline. It's a cheap way of simulating what it might be like to have even more memory bandwidth per core. We should expect to see the gap between 256-bit and 512-bit grow as core counts decrease.

                BTW, if the testing is done with SMT enabled, then care should be taken to disable cores by taking both SMT siblings offline.
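
A minimal sketch of how one might pick which logical CPUs to offline so that whole cores (both SMT siblings) go away together. It assumes the sibling information comes from Linux's `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the helper names `parse_cpu_list` and `cpus_to_offline` are made up for this example, and the topology below is a toy 4-core/8-thread layout, not a real EPYC one.

```python
def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,4' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_to_offline(sibling_lists, fraction):
    """Return logical CPU ids to offline so that `fraction` of the physical
    cores disappear, always taking every SMT sibling of a chosen core.

    sibling_lists maps a logical CPU id to the text of its
    thread_siblings_list file.
    """
    # Each distinct sibling group is one physical core.
    cores = sorted({tuple(parse_cpu_list(s)) for s in sibling_lists.values()})
    n_off = int(len(cores) * fraction)
    # Take cores from the end so the core containing CPU 0 stays online.
    victims = cores[len(cores) - n_off:]
    return sorted(cpu for core in victims for cpu in core)


# Toy topology: 4 cores, 2 threads each (hypothetical numbering).
toy = {0: "0,4", 1: "1,5", 2: "2,6", 3: "3,7",
       4: "0,4", 5: "1,5", 6: "2,6", 7: "3,7"}

half = cpus_to_offline(toy, 0.5)
three_quarters = cpus_to_offline(toy, 0.75)
print(half)            # [2, 3, 6, 7]
print(three_quarters)  # [1, 2, 3, 5, 6, 7]
```

Each returned id could then be taken offline by writing `0` to `/sys/devices/system/cpu/cpuN/online` as root. On a real multi-CCD part one would probably also want to spread the offlined cores evenly across CCDs, which this sketch does not attempt.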


                • #9
                  Originally posted by geerge

                  Do I hear the sound of MR-DIMM 17600 in Zen 6's future?
                  I don't think this is what you meant, but I hope we don't literally hear it. Actively cooled RAM is not on my wish list.


                  • #10
                    I have not seen any solid memory-scaling AVX-512 benchmarks that would confirm this theory that Zen 5 AVX-512 is limited by memory bandwidth by any meaningful margin. Someone wrote this speculation after a Zen 4 -> Zen 5 AVX-512 comparison and AFAIK it somehow became a "given truth" without any decent research.
