
Zen 3 GCC Tuning Continues With More Correct Latencies Rather Than "Random Numbers"


    Phoronix: Zen 3 GCC Tuning Continues With More Correct Latencies Rather Than "Random Numbers"

    On Monday, the AMD EPYC 7003 "Milan" launch day, we finally got to see some serious tuning begin for the Zen 3 "Znver3" CPU target in the GCC compiler after that initial code landed at the end of last year. Yesterday a second Zen 3 tuning patch was published and then today a third tuning patch has made it out...

    https://www.phoronix.com/scan.php?pa...Znver3-Round-3

  • #2
    sdack, do you remember how in October you were talking about not worrying about software availability for hardware that's not even released yet?



    • #3
      It looks like the throughput/latency numbers for IDIV on Zen 2 are accurate and not "random numbers". See Agner's instruction tables on page 103:

      https://www.agner.org/optimize/instruction_tables.pdf

      I am surprised that Zen 3 improves IDIV performance that much; CPU makers usually don't spend much effort on making integer division fast.



      • #4
        Originally posted by bluescarni
        It looks like the throughput/latency numbers for IDIV on Zen 2 are accurate and not "random numbers". See Agner's instruction tables on page 103:

        https://www.agner.org/optimize/instruction_tables.pdf

        I am surprised that Zen 3 improves IDIV performance that much; CPU makers usually don't spend much effort on making integer division fast.

        Zen 2 was significantly slower than Intel's chips at IDIV. I haven't checked the exact latencies Intel publishes, but the size of this change makes me think the two are probably roughly on par now. I'm guessing there are one or two benchmarks out there that this helps enough that AMD wanted to make sure it wasn't an easy way for Intel to stay ahead.



        • #5
          Originally posted by smitty3268

          Zen 2 was significantly slower than Intel's chips at IDIV. I haven't checked the exact latencies Intel publishes, but the size of this change makes me think the two are probably roughly on par now. I'm guessing there are one or two benchmarks out there that this helps enough that AMD wanted to make sure it wasn't an easy way for Intel to stay ahead.

          That's interesting: Agner's tables for Skylake do indicate higher IDIV performance than Zen 2. Do you happen to know whether Zen 3 also improved the PDEP and PEXT instructions?



          • #6
            Originally posted by bluescarni

            That's interesting: Agner's tables for Skylake do indicate higher IDIV performance than Zen 2. Do you happen to know whether Zen 3 also improved the PDEP and PEXT instructions?

            Yes, they're way faster. My understanding is that Zen 2 just provided these instructions through microcode for compatibility, while Zen 3 supports them natively.

            Per AnandTech:
            In fact, there are a significant number of differences throughout the core. AMD has improved:
            • branch prediction bandwidth
            • faster switching from the decode pipes to the micro-op cache
            • faster recoveries from mispredicts
            • enhanced decode skip detection for some NOPs/zeroing idioms
            • larger buffers and execution windows up and down the core
            • dedicated branch pipes
            • better balancing of logic and address generation
            • wider INT/FP dispatch
            • higher load bandwidth
            • higher store bandwidth
            • better flexibility in load/store ops
            • faster FMACs
            • a wide variety of faster operations (including x87?)
            • more TLB table walkers
            • better prediction of store-to-load forward dependencies
            • faster copy of short strings
            • more AVX2 support (VAES, VPCLMULQDQ)
            • substantially faster DIV/IDIV support
            • hardware acceleration of PDEP/PEXT
            https://www.anandtech.com/show/16214...5700x-tested/6
            Zen 2 vs. Zen 3, PDEP/PEXT (Parallel Bits Deposit/Extract):
            • Zen 2: 300-cycle latency, throughput of 1 per 250 cycles
            • Zen 3: 3-cycle latency, throughput of 1 per clock
            Last edited by smitty3268; 20 March 2021, 05:52 PM.



            • #7
              Originally posted by smitty3268

              Yes, they're way faster. My understanding is that Zen 2 just provided these instructions through microcode for compatibility, while Zen 3 supports them natively.

              Per AnandTech:

              https://www.anandtech.com/show/16214...5700x-tested/6
              Zen 2 vs. Zen 3, PDEP/PEXT (Parallel Bits Deposit/Extract):
              • Zen 2: 300-cycle latency, throughput of 1 per 250 cycles
              • Zen 3: 3-cycle latency, throughput of 1 per clock

              Cheers, thanks for digging the numbers out. I didn't know AnandTech had such in-depth analysis.
