AMD Zen 5 Compiler Support Posted For GCC - Confirms New AVX Features & More


  • coder
    replied
    Does anyone know what PREFETCHI does? Is that for prefetching instructions, as opposed to data?

    Also, I looked at a description of the MOVDIRI and I'm a little unclear on the use case for it, to the extent that it differs from that of non-temporal stores. If anyone has insight into what use cases it addresses, please share.



  • AdrianBc
    replied
    Originally posted by Namelesswonder
    VP2INTERSECT was flawed on Tiger Lake, which meant it was faster to emulate it than to actually use it.

    Although that could just be Intel's typical under-baked first implementation, with the instruction only working well in their second attempt or in AMD's implementation.

    Also does AVX-VNNI even have any use over AVX512-VNNI? From Intel's own documentation it seems that they have the same CPI for their implementations, so AVX512-VNNI would have double the throughput.
    Is AMD just making a microcode implementation that uses the same AVX512-VNNI instructions to offer AVX-VNNI?

    Edit: It appears AVX-VNNI-INT* does add some more intrinsics over AVX512-VNNI, so it has use there, but AVX-VNNI standalone doesn't have any use over AVX512-VNNI.

    Intel has announced that in Granite Rapids VP2INTERSECT will be reintroduced.

    I assume that AMD has aimed in Zen 5 to provide compatibility with most of the Granite Rapids instruction set, with the exception of the features which would have been expensive to implement, i.e. AMX and AVX512-FP16, and for which it might be better to use a GPU anyway.


    AVX-VNNI has been introduced by Intel for their CPUs that do not support AVX-512.

    There are many software developers who did not bother to provide AVX-512 support in their programs, because so many Intel CPUs lack support, but who might have included AVX-VNNI support.

    It is normal for AMD to add AVX-VNNI support, so that their CPUs are not handicapped when executing such Intel-oriented programs. The addition is also cheap: it mainly requires changes in the instruction decoder, since the corresponding execution units already exist.





  • AdrianBc
    replied
    Originally posted by [deXter]
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.


    As others have said, one advantage of -march=znver4 over -march=x86-64-v4 is that the compiler does optimization tuned for the Zen 4 microarchitecture.

    The option -march=x86-64-v4 enables only the instructions supported by Skylake Server, which launched 7 years ago.

    Zen 4 also supports many other instructions introduced by Cascade Lake, Cooper Lake, Cannon Lake and Ice Lake.

    These instructions can increase speed by a multiple, not just by a few percent, for various algorithms used in arithmetic with big numbers, cryptography and machine learning.

    Nevertheless, the greatest impact of these instructions is for various libraries, like OpenSSL, which usually include functions written in assembly language or with compiler intrinsics, which may be selected at run time.

    The impact on the code that does not use compiler intrinsics is likely to be less.
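    On Linux, a quick way to see whether a CPU advertises one of these newer extensions is to look at the "flags" line of /proc/cpuinfo. The helper below is only a sketch: the function name has_cpu_flag is made up for illustration, and the flag spellings (e.g. avx512_vnni, vaes) are the ones recent kernels are assumed to use, so check them against your own system.

```shell
# Hypothetical helper: succeed if a CPU feature flag is listed in
# /proc/cpuinfo-style text read on standard input.
#   $1 = flag name as spelled by the kernel (assumed, e.g. avx512_vnni)
has_cpu_flag() {
    # Take the first "flags" line, put one token per line,
    # then look for an exact, whole-token match.
    grep -m1 '^flags' | tr ' \t' '\n\n' | grep -qxF -- "$1"
}

# Typical use on a Linux machine:
#   has_cpu_flag avx512_vnni < /proc/cpuinfo && echo "AVX512-VNNI present"
```

    This only tells you what the hardware offers; whether a library such as OpenSSL actually dispatches to a matching code path at run time is up to that library.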

    In any case it does not make sense to ever compile with -march=x86-64-v4, unless you cannot predict on what kind of computers the code will be executed.

    Whenever you know where a program will be executed, e.g. on Zen 4 or on Alder Lake, the appropriate compiler option should be used.

    I am not a fan of "-march=native", for two reasons. First, when you use an older compiler on a newer CPU, it may silently fall back to the worst (but safe) options. Second, I frequently compile a program on one computer for use on another, and it would be a waste of time for me to write an included makefile with make definitions for tool options that could be used only in special circumstances.
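    One way to check for that silent fallback is to ask GCC what -march=native actually resolved to, using the same -Q --help=target dump mentioned elsewhere in this thread. The filter below is a minimal sketch; the name resolved_target is hypothetical, and it assumes the dump prints lines of the form "-march=" followed by the chosen value.

```shell
# Hypothetical helper: pull the resolved -march= and -mtune= values
# out of a `gcc -Q --help=target` dump read on standard input.
# Assumes lines shaped like "  -march=        znver4".
resolved_target() {
    awk '$1 == "-march=" || $1 == "-mtune=" { print $1 $2 }'
}

# Typical use (requires gcc on the build machine):
#   gcc -march=native -Q --help=target | resolved_target
```

    If the reported value is not the CPU you expected, the compiler probably predates the CPU, and passing an explicit -march value with a newer compiler is likely the safer route.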




  • S.Pam
    replied
    Originally posted by [deXter]
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.
    Why not simply use -march=native if you are compiling yourself?



  • ptr1337
    replied
    Originally posted by [deXter]
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.


    Here is a diff of the flags which are enabled with -march=x86-64-v4 and -march=znver4

    Edit:
    This is done via:
    Code:
    gcc -march=x86-64-v4 -Q --help=target > v4.flags
    gcc -march=znver4 -Q --help=target > znver4.flags
    diff -u znver4.flags v4.flags
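    The raw diff can be noisy. If only the options enabled for one target and not the other are of interest, the two dumps (znver4.flags and v4.flags) can be filtered first. This is a sketch under the assumption that each option line looks like "-mavx512f [enabled]"; the function names enabled_flags and only_in_first are made up for illustration.

```shell
# Hypothetical helpers for comparing two `gcc -Q --help=target` dumps.
# Assumes option lines shaped like "  -mavx512f        [enabled]".

enabled_flags() {
    # Print the names of options marked [enabled] in file $1, sorted.
    awk '$2 == "[enabled]" { print $1 }' "$1" | sort
}

only_in_first() {
    # Print options enabled in file $1 but not enabled in file $2.
    _a=$(mktemp)
    _b=$(mktemp)
    enabled_flags "$1" > "$_a"
    enabled_flags "$2" > "$_b"
    comm -23 "$_a" "$_b"   # lines unique to the first sorted list
    rm -f "$_a" "$_b"
}

# Typical use with the files written above:
#   only_in_first znver4.flags v4.flags
```

    Swapping the two arguments shows anything x86-64-v4 enables that znver4 does not.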
    Last edited by ptr1337; 10 February 2024, 04:53 PM.



  • Mark Rose
    replied
    Originally posted by [deXter]
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.
    If you read the gcc docs, you'll see that setting it to x86-64-v4 enables the use of instructions, but doesn't tune to a specific CPU. Tuning will take into account pipeline widths, instruction timings, and so on, to produce more optimal assembly for a specific CPU.



  • Namelesswonder
    replied
    Originally posted by [deXter]
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.
    I'm compiling my C++ app using GCC 4.3. Instead of manually selecting the optimization flags I'm using -march=native, which in theory should add all optimization flags applicable to the hardware I'm


    A litany of different methods to see what GCC does when using different microarchitecture options.

    Realistically there won't be a big performance difference between znver4 and x86-64-v4: not noticeable in everyday use, though it would show up in benchmarks.
    With a specified microarchitecture the compiler will be able to choose more efficient values and limits instead of safe defaults. You also should be using native if you do not care to move or package software for a different machine. For me znver3 doesn't have Shadow Stack enabled, but native does, so you're going to have to check if your compiler enables any extra flags on top of znver4 when using native.



  • coder
    replied
    Originally posted by [deXter]
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.
    I think the only userspace instruction additions are AVX-512.


    However, Zen 4 includes more than the Skylake-level AVX-512 functionality found in v4.


    That said, I'm not sure how much benefit you'll get from those other extensions, aside from in software packages specifically optimized for them.

    Also, using -mtune=znver4 could enable instruction cost tables specific to Zen 4, depending on whether AMD ever got around to contributing them. Note that -mtune is implied by -march (for x86, at least; I think it's not true for ARM).
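    Because -march only implies -mtune when no explicit -mtune is given, the two can be split: keep the generic instruction-set level for portability and tune scheduling for one CPU. A small sketch, with the helper name cflags_for invented for illustration:

```shell
# Hypothetical helper: build a flag string that restricts the
# instruction set to $1 while tuning code generation for $2.
cflags_for() {
    printf '%s\n' "-march=$1 -mtune=$2"
}

# Typical use (foo.c is a placeholder file name):
#   gcc $(cflags_for x86-64-v4 znver4) -c foo.c
```

    The result should still run on any x86-64-v4 machine, but it will not use the extra Zen 4 instructions that -march=znver4 would enable.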
    Last edited by coder; 11 February 2024, 02:50 AM.



  • [deXter]
    replied
    Zen 4 user here. Does anyone know if there's a difference (instruction set wise and real-world impact) in compiling using march=x86-64-v4 vs march=znver4? I've only recently switched my (Arch, btw) packages to x86-64-v4, but now I wonder whether I should be using znver4 instead - I haven't come across any mentions of this on the interwebs.



  • coder
    replied
    Originally posted by onlyLinuxLuvUBack
    can somebody at intel just simplify things for buyers:
    call it MMX ( mental metal x-tensions )
    and add 2.0 , and then 2.1 and then 2.323
    It's funny that the way they're going with AVX10 is just to use a linear versioning scheme, like you mentioned. Well, version + execution width.

