AMD EPYC 9655 Benchmarks Show The Terrific Generational Gains With 5th Gen EPYC

  • AMD EPYC 9655 Benchmarks Show The Terrific Generational Gains With 5th Gen EPYC

    Phoronix: AMD EPYC 9655 Benchmarks Show The Terrific Generational Gains With 5th Gen EPYC

    With the AMD EPYC 9005 "Turin" series launch earlier this month, there were launch-day benchmark results for the EPYC 9575F, EPYC 9755, and EPYC 9965 processors, covering the frequency-optimized SKU, the new flagship 128-core Turin "classic" core model, and the new flagship 192-core Turin "dense" core SKU, respectively. That's interesting for looking at the new 5th Gen AMD EPYC top-end wares, but comparing them to 4th Gen EPYC also means higher core counts at the top end. Being curious about the core-for-core advantages of 5th Gen EPYC, I managed to get my hands on AMD EPYC 9655 processors to see how that model compares to the prior AMD EPYC 9654 "Genoa" flagship model. Here's a look today at how the AMD EPYC 9655 1P/2P 96-core processor compares to the prior EPYC 9654 flagship.


  • #2
    Hi Michael, as a general remark, would it be possible to show core counts next to CPU codenames in future charts/articles? It's totally unclear to me without cross-checking all of those CPU names against their configs all the time. That takes a lot of effort on the reader's side and often makes me spend less time on the articles/results than I would want, because the charts don't say much to me without that info. That's a shame, since it would be really interesting to know and understand the results.

    • #3
      With it all being automated, yes, if I had the time/resources to accomplish it... Many of the bits are in place for doing so. What's missing is namely the logic to determine when to display said core information (i.e. tracking identifiers and system changes to determine that the intent of the benchmark is a CPU review/comparison), and then the more time-consuming aspect is the styling side, making the graphs look nice with the increased density of information. Unfortunately, I'm not sure when I'll have a lot of extra time on my hands to accomplish it, though.
      Michael Larabel
      https://www.michaellarabel.com/

      • #4
        What an impressive generational performance uplift of 40%! It's not often consumers get to enjoy such goodies. AMD clearly has the best CPU engineers in the world; sadly, they are far behind the market leader in GPUs.

        • #5
          The results for me are not that impressive.

          When you go from a 2.4GHz base clock, a 3.7GHz max boost clock, 384MB of L3 cache, DDR5-4800, and a 360 Watt TDP to a 2.6GHz base clock, a 4.5GHz max boost clock, the same 384MB of L3 cache, a 400 Watt default TDP, and DDR5-6000 / DDR5-6400, you expect the new processor to be significantly faster.

          (4.5GHz - 3.7GHz) / 4.5GHz * 100 = 18%

          (DDR5-6000 - DDR5-4800) / DDR5-6000 * 100 = 20%

          Total predicted performance increase: 38%
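
As a quick check on the back-of-the-envelope estimate above, here is a minimal Python sketch of the same arithmetic. The function names are hypothetical, and simply adding the two percentages is the post's own assumption, not an established performance model. The post divides by the new value; a conventional "percent increase" divides by the old value instead, so both are shown.

```python
def pct_of_new(old, new):
    """Difference as a percentage of the NEW value (the convention used in the post)."""
    return (new - old) / new * 100


def pct_increase(old, new):
    """Difference as a percentage of the OLD value (the usual 'percent increase')."""
    return (new - old) / old * 100


# Max boost clocks in GHz and memory speeds in MT/s, as quoted in the post.
clock_post = pct_of_new(3.7, 4.5)
mem_post = pct_of_new(4800, 6000)
estimate_post = clock_post + mem_post  # the post's additive estimate

clock_conventional = pct_increase(3.7, 4.5)
mem_conventional = pct_increase(4800, 6000)

if __name__ == "__main__":
    print(f"post convention: clock {clock_post:.1f}%, memory {mem_post:.1f}%, "
          f"sum {estimate_post:.1f}%")
    print(f"relative to old: clock {clock_conventional:.1f}%, "
          f"memory {mem_conventional:.1f}%")
```

Relative to the old values, the same inputs give roughly 21.6% for the boost clock and 25% for the memory speed, so the headline figure depends on which base is chosen.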

          This isn't some incredible engineering feat on the part of AMD; it's a great business decision AMD made years ago to use TSMC's manufacturing.

          Lucky for AMD that years ago Intel's CEO was offered the opportunity to use TSMC and Pat Gelsinger nixed the idea.

          Also, all AMD CPUs get creamed by Intel's Xeons in some workloads, so depending on what a business does most, the best choice is a toss-up.

          The question is how long AMD can squeeze out gains by increasing CPU clock speeds and using faster RAM; at some point the laws of physics smack you in the face.

          What I want to see is greater use of FPGAs by Intel, and for AMD to start using them as well:







          They both have them and AMD continues to buy up companies for AI and video acceleration but they don't seem to be doing anything with that IP.

          • #6
            Pretty much, Intel wins mainly with AMX, niche accelerators, and memory-limited workloads thanks to 8800. For everything else, the vast majority of workloads, AMD is just faster and more efficient. AMD's AVX-512 implementation is competent and doesn't throttle nearly as much as Intel's. It's Intel that traditionally tries to squeeze out more gains by pumping power to increase clocks. They're both on an efficiency track now, competing with more cores; I wouldn't be surprised if Intel's main server line going forward is E-cores, with P-cores being the niche for those expensive per-core-licensing workloads.

            • #7
              Have you ever tried to run gaming benchmarks on these new EPYC/Xeon 6 CPUs?
              They're obviously not designed for it, but I'm curious how they would fare in a setup like what LTT did where he subdivided one big box into several separate gaming consoles.

              An EPYC 9655 divided into 16 instances with 6 cores each, coupled with 8 4090s each divided into two, would be the equivalent of 16 gaming rigs each equipped with a 4070Ti. That could be an option for a business wanting to run a cloud gaming service, or even a LAN gaming place.



              • #8
                Originally posted by sophisticles
                (DDR5-6000 - DDR5-4800) / DDR5-6000 * 100 = 20%

                Total predicted performance increase: 38%
                That's not how things work. Doubling your RAM speed won't double performance.

                • #9
                  Originally posted by smitty3268
                  That's not how things work. Doubling your RAM speed won't double performance.
                  Actually, many times, it does, if memory speed is a bottleneck.

                  Back in the Duron days I had one with 133MHz SDR, and I bought a new motherboard that supported 266MHz DDR. Performance doubled: MP3 encoding was twice as fast, as was VCD and SVCD encoding, and boot times were cut in half.

                  In this case, clearly these CPUs are bottlenecked by RAM speed, because my calculated speed increase matches Michael's measured speed increase.

                  If you think about it, it makes sense: you have lots of big fat cores that need to be fed with data, and obviously they are not being fed fast enough.

                  AMD can probably keep increasing performance at a good clip by just increasing the RAM speed that's supported.

                  • #10
                    I really doubt MP3 encoding would benefit that much from a memory bandwidth increase, since it is not a memory-bound task but rather a heavily CPU-bound one. Even if some performance increase could be seen with more memory throughput, performance would not depend entirely on it.

                    Also, a dramatic increase in performance with higher memory bandwidth means that the workload's data does not fit into the CPU caches, so it is a matter of the "size" of the workload. Caches are there precisely to avoid stalling the CPU while it waits for data coming from DRAM. Only when your CPU constantly stalls waiting for data from DRAM do you get such dramatic performance increases.

                    What you are describing is a useless simplification of the real world, because you don't consider the timings, the access patterns, the barriers and, in general, the whole memory hierarchy.
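
The disagreement in the last few posts can be made concrete with a simple Amdahl-style model: only the share of run time spent stalled on memory gets faster when memory gets faster. This is a minimal sketch, not a measurement. The function name is hypothetical, the memory-bound fraction is an assumed input that would have to be profiled per workload, and the model deliberately ignores timings, access patterns, and the cache hierarchy mentioned above.

```python
def overall_speedup(memory_bound_fraction, memory_speedup):
    """Amdahl-style estimate of whole-program speedup.

    memory_bound_fraction: share of the original run time spent waiting on
        memory (0.0 to 1.0). This is an assumed input, not something the
        model can derive.
    memory_speedup: factor by which memory got faster (2.0 means twice as fast).
    """
    if not 0.0 <= memory_bound_fraction <= 1.0:
        raise ValueError("memory_bound_fraction must be between 0 and 1")
    if memory_speedup <= 0:
        raise ValueError("memory_speedup must be positive")
    compute_time = 1.0 - memory_bound_fraction  # unaffected by faster memory
    memory_time = memory_bound_fraction / memory_speedup
    return 1.0 / (compute_time + memory_time)


if __name__ == "__main__":
    # Doubling memory speed for workloads with different memory-bound shares.
    for fraction in (0.0, 0.2, 0.5, 1.0):
        print(f"memory-bound share {fraction:.0%}: "
              f"{overall_speedup(fraction, 2.0):.2f}x")
```

Under this model, doubling memory speed doubles performance only when the workload is entirely memory-bound; a fully CPU-bound workload sees no change, and everything else falls in between.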
