The Compelling AVX-512 Performance Advantage On AMD EPYC 9005 "Turin"

  • phoronix
    Administrator
    • Jan 2007
    • 67370

    Phoronix: The Compelling AVX-512 Performance Advantage On AMD EPYC 9005 "Turin"

    Back in October, following the launch of the EPYC 9005 "Turin" processors, I ran an AVX-512 performance comparison on the EPYC 9755 with the 512-bit data path vs. the 256-bit data path vs. AVX-512 disabled. That was interesting for showing the benefits of Zen 5's full 512-bit data path compared to the "double pumped" 256-bit approach used by Zen 4, which remains available on Zen 5 via a BIOS option. AVX-512 continues to prove very performant and power efficient on AMD Zen 5 processors, unlike the early generations of AVX-512 on Intel processors. Here is a fresh look at AVX-512 performance on a Supermicro server with an EPYC 9655 processor.
  • Espionage724
    Senior Member
    • Sep 2024
    • 381

    #2
    One-off from NVIDIA's Turing

    • schmidtbag
      Senior Member
      • Dec 2010
      • 6618

      #3
      Pretty impressive results. Practically free performance.

      • sophisticles
        Senior Member
        • Dec 2015
        • 2594

        #4
        Very "shocked" to see that enabling AVX-512 results in higher performance.

        Equally "shocking" is that with AVX-512 the CPU clocks lower while without AVX-512 it clocks higher and that there is a corresponding higher power consumption with the higher clock speed.

        "Never" would have predicted these results.

        • numacross
          Senior Member
          • Jun 2017
          • 761

          #5
          Originally posted by sophisticles
          Equally "shocking" is that with AVX-512 the CPU clocks lower while without AVX-512 it clocks higher and that there is a corresponding higher power consumption with the higher clock speed.
          You should look again, because the second part of your statement isn't true. On average, the AVX-512-enabled tests drew more power despite the lower clocks.

          • markg85
            Senior Member
            • Oct 2007
            • 511

            #6
            I'm a little (well, a lot) surprised by the raw performance numbers for PyTorch and TensorFlow, where both are running ResNet-50 with the same batch size. Is TensorFlow really ~5x faster than PyTorch, or am I missing something? And if it really is faster, could that matter for LLM inference on the CPU? In other words, could LLM inference be 5x as fast when TensorFlow is used instead?

            • Sweepi
              Junior Member
              • Nov 2021
              • 4

              #7
              Since it's the current hype wave: are you considering benchmarking DeepSeek V3/R1 70B/671B with llama.cpp on CPUs?
              It should be possible with 2x EPYC 9xx4/9xx5 (2x for more RAM bandwidth) and 768+ GB of RAM.
