AMD Zen 5 Compiler Support Posted For GCC - Confirms New AVX Features & More


    Phoronix: AMD Zen 5 Compiler Support Posted For GCC - Confirms New AVX Features & More

    Making for a very exciting Saturday morning, AMD just posted their initial enablement patch for plumbing Zen 5 processor support "znver5" into the GNU Compiler Collection! With GCC 14 due for its stable release around March or April, as usual for the annual compiler release, it's been frustrating to see no Zen 5 support while Intel has already been working on Clear Water Forest and Panther Lake support, having upstreamed Sierra Forest, Granite Rapids, and other new CPU targets months ago... Well, Granite Rapids was added to GCC in late 2022. But squeezing in, and now likely to be merged in time, is the initial AMD Zen 5 support!..

  • #2
    Possibly worth noting that AMD is actually doing two things here that Intel isn't. The obvious one is PREFETCHI, which is a "not yet" situation for Intel; as Michael noted, it won't show up for them until Granite Rapids. The more interesting one is AVX512-VP2INTERSECT, which was only found in Tiger Lake and which Intel has halfway deprecated at this point; if AMD manages to revive it, that'll put Intel in a very…intriguing position.
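
For readers unfamiliar with what VP2INTERSECT computes, here is a minimal scalar sketch in C of the doubleword form, as I understand it: two masks come out, one marking which elements of the first vector appear anywhere in the second, and one marking the reverse. The function name `vp2intersect_model` and the 16-element limit are assumptions for illustration; this is a model of the semantics, not how any CPU or emulation routine implements it.

```c
#include <stdint.h>

/* Scalar model of a two-mask vector intersect (n <= 16 elements).
 * Bit i of *k1 is set if a[i] equals any element of b.
 * Bit j of *k2 is set if b[j] equals any element of a. */
void vp2intersect_model(const uint32_t *a, const uint32_t *b, int n,
                        uint16_t *k1, uint16_t *k2)
{
    *k1 = 0;
    *k2 = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (a[i] == b[j]) {
                *k1 |= (uint16_t)(1u << i);
                *k2 |= (uint16_t)(1u << j);
            }
        }
    }
}
```

The all-pairs comparison is the point: a hardware instruction has to do n×n compares per call, which is why the quality of the implementation matters so much for whether it beats a hand-written sequence.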


    • #3
      Over Zen 4, this confirms AMD Zen 5 as adding AVXVNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.
      AVXVNNI. What about AVXVDDI and AVXVCCI?


      • #4
        VP2INTERSECT was flawed on Tiger Lake, which meant it was faster to emulate it than to actually use it.

        Although that could just be typical Intel underbaking their first implementation, with it only working better in their second attempt or in AMD's implementation.

        Also does AVX-VNNI even have any use over AVX512-VNNI? From Intel's own documentation it seems that they have the same CPI for their implementations, so AVX512-VNNI would have double the throughput.
        Is AMD just making a microcode implementation that uses the same AVX512-VNNI instructions to offer AVX-VNNI?

        Edit: It appears AVX-VNNI-INT* does add some more intrinsics over AVX512-VNNI, so it has use there, but AVX-VNNI standalone doesn't have any use over AVX512-VNNI.
        Last edited by Namelesswonder; 10 February 2024, 12:58 PM.
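
To make the width argument above concrete, here is a hedged scalar sketch in C of the core VNNI operation (the non-saturating unsigned-byte by signed-byte dot product, VPDPBUSD). The function names and the `lanes` parameter are hypothetical; `lanes = 8` stands in for a 256-bit vector and `lanes = 16` for a 512-bit one. It models the arithmetic only and says nothing about actual throughput on any core.

```c
#include <stdint.h>

/* One 32-bit lane: multiply four unsigned bytes by four signed bytes,
 * sum the products, and add the sum to the accumulator with wraparound
 * (the non-saturating variant). */
int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    int32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    /* Add in unsigned arithmetic so overflow wraps instead of being UB. */
    return (int32_t)((uint32_t)acc + (uint32_t)sum);
}

/* Apply the lane operation across a whole vector.
 * lanes = 8 models a 256-bit register, lanes = 16 a 512-bit one. */
void dpbusd(int32_t *acc, const uint8_t *a, const int8_t *b, int lanes)
{
    for (int l = 0; l < lanes; l++)
        acc[l] = dpbusd_lane(acc[l], a + 4 * l, b + 4 * l);
}
```

The per-lane arithmetic is identical at either width; what changes is how many lanes one instruction covers, which is where the "double the throughput at equal CPI" reasoning comes from.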


        • #5
          I hope they keep lower power consumption compared to Intel's CPUs when it comes to AVX.


          • #6
            Originally posted by Namelesswonder
            Also does AVX-VNNI even have any use over AVX512-VNNI?
            Probably the main reason they added it is to run optimized codepaths targeted at Intel hybrid CPUs, rather than a slower fallback path.


            • #7
              Originally posted by loganj
              I hope they keep lower power consumption compared to Intel's CPUs when it comes to AVX.
              Agreed. I haven't heard any rumors about this, but the half-width implementation used in Zen 4 has been a real winner. I'll bet if they'd just add more FMA-capable execution ports, the performance gap between its AVX-512 and Golden Cove's would narrow enough to make it largely irrelevant.

              I think it's interesting that ARM switched from 2x 256-bit SVE ports to 4x 128-bit SVE2 ports, between the Neoverse V1 and V2 cores. Perhaps it shows that execution width is no longer as important as once thought? Or, maybe it just has more to do with the amount of ARM code that still relies primarily on 128-bit NEON SIMD. Even in that case, AMD's 256-bit implementation is very amenable to AVX2, of which there's still a lot out there (and especially now that Intel removed AVX-512 from their client processors, with AVX10/256 looking set to come next).
              Last edited by coder; 10 February 2024, 02:02 PM.


              • #8
                Originally posted by Namelesswonder
                VP2INTERSECT was flawed on Tiger Lake, which meant it was faster to emulate it than to actually use it.

                Although that could just be typical Intel underbaking their first implementation, with it only working better in their second attempt or in AMD's implementation.

                Also does AVX-VNNI even have any use over AVX512-VNNI? From Intel's own documentation it seems that they have the same CPI for their implementations, so AVX512-VNNI would have double the throughput.
                Is AMD just making a microcode implementation that uses the same AVX512-VNNI instructions to offer AVX-VNNI?

                Edit: It appears AVX-VNNI-INT* does add some more intrinsics over AVX512-VNNI, so it has use there, but AVX-VNNI standalone doesn't have any use over AVX512-VNNI.
                can somebody at intel just simplify things for buyers:
                call it MMX ( mental metal x-tensions )
                and add 2.0 , and then 2.1 and then 2.323


                • #9
                  Originally posted by onlyLinuxLuvUBack
                  can somebody at intel just simplify things for buyers:
                  call it MMX ( mental metal x-tensions )
                  and add 2.0 , and then 2.1 and then 2.323
                  It's funny that the way they're going with AVX10 is just to use a linear versioning scheme, like you mentioned. Well, version + execution width.


                  • #10
                    Zen 4 user here. Does anyone know if there's a difference (instruction-set-wise and in real-world impact) between compiling with -march=x86-64-v4 and -march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.
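
One way to look at the instruction-set half of this question is to ask the compiler which target macros it predefines under each flag. The sketch below assumes GCC or Clang; the five "extra" extensions checked are my assumption about what -march=znver4 turns on beyond the x86-64-v4 baseline (AVX512F/BW/CD/DQ/VL), not an exhaustive list, and the function names are hypothetical.

```c
#include <stdio.h>

/* Returns 1 if all five AVX-512 subsets of the x86-64-v4 level
 * are enabled for this build, else 0. */
int has_v4_baseline(void)
{
#if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) && \
    defined(__AVX512DQ__) && defined(__AVX512VL__)
    return 1;
#else
    return 0;
#endif
}

/* Prints and counts a few extensions that lie outside the v4 baseline.
 * Which of these are set depends entirely on the -march flag used. */
int extras_enabled(void)
{
    int n = 0;
#ifdef __AVX512VNNI__
    puts("AVX512-VNNI"); n++;
#endif
#ifdef __AVX512VBMI__
    puts("AVX512-VBMI"); n++;
#endif
#ifdef __AVX512BF16__
    puts("AVX512-BF16"); n++;
#endif
#ifdef __VAES__
    puts("VAES"); n++;
#endif
#ifdef __GFNI__
    puts("GFNI"); n++;
#endif
    return n;
}
```

Calling both functions from a small `main` and building the file once with -march=x86-64-v4 and once with -march=znver4 should show the difference in enabled extensions. Separately from the instruction set, -march=znver4 also implies -mtune=znver4, while the x86-64-v4 level keeps generic tuning unless -mtune is given, so scheduling and cost-model choices can differ even where the same instructions are available.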
