AMD Zen 3 Performance With The Initial "znver3" GCC Compiler Support

  • AMD Zen 3 Performance With The Initial "znver3" GCC Compiler Support

    Phoronix: AMD Zen 3 Performance With The Initial "znver3" GCC Compiler Support

    Last week AMD published their Zen 3 support for the GCC code compiler. That initial support, which has already been merged into GCC 11, flips on the newly supported instructions but does not yet offer a tuned scheduler model or other optimizations compared to the existing Zen 2 path. In any case, here is a look at the performance changes when building the open-source benchmarks under test with "znver3" compared to the prior Zen 2 and Zen 1 targets and to generic x86_64, along with the performance when targeting Intel's Skylake and Haswell processors instead.

    http://www.phoronix.com/vr.php?view=29761

  • #2
    One of the places where AMD was lagging behind Intel was exactly in optimizations for the various compilers.
    Once those arrive and bring good support, AMD moves up to another level entirely, since their processors are really good!
    You can see it in the simple CoreMark test: there was an uplift in performance.


    • #3
      Would be curious to know what -march=native yields.

      Grepping GCC on my Skylake system:

      gcc -march=native -O3 -E -v - </dev/null 2>&1 | grep cc1

      /usr/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/cc1 -E -quiet -v - -march=skylake -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi
      -msgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf
      -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk
      -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite -mno-avx512bf16 -mno-enqcmd -mno-avx512vp2intersect
      --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=8192
      -mtune=skylake -O3

      gcc -march=skylake -O3 -E -v - </dev/null 2>&1 | grep cc1

      /usr/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/cc1 -E -quiet -v - -march=skylake -O3

      Not sure if any of that will make a difference, but I expect the cache parameters might.


      • #4
        Nothing to see here


        • #5
          Hopefully they will make some meaningful changes in the future.


          • #6
            When the support is there, it would be nice to also start seeing the new generic CPU feature levels, from baseline x86-64 through x86-64-v4, in these comparisons.


            • #7
              So are any software developers creating binaries for distribution going to realistically target Zen 3 when compiling, or are they more likely to use x86-64?

              In this instance, you were able to run code compiled with a Skylake target on a Zen 3 CPU, but would that always be the case? Or are there situations where specifying specific architectures like that will result in incompatibility with compiled binaries?


              • #8
                Originally posted by AmericanLocomotive
                So are any software developers creating binaries for distribution going to realistically target Zen 3 when compiling, or are they more likely to use x86-64?
                I cannot speak for all developers since you're asking "are any", but the majority of distributions use x86-64 as the target. Debian, for example, distinguishes between amd64 and i386 for its PC distributions, and the default for the amd64 distribution is x86-64, because that is what GCC is configured to default to. You can check the default for your own distribution with: /usr/bin/gcc --help=target -Q | grep march

                In this instance, you were able to run code compiled with a Skylake target on a Zen 3 CPU, but would that always be the case? Or are there situations where specifying specific architectures like that will result in incompatibility with compiled binaries?
                Yes, there can be differences, and these can cause binaries to fail: if the CPU lacks an instruction the compiler used, the program typically dies with an illegal-instruction error. However, GCC offers additional command-line options specifically to address this problem. The -march= switch selects the instruction set, while the -mtune= switch lets you change the tuning to match another CPU. For example, -march=x86-64 -mtune=znver3 instructs GCC to use only instructions common to all 64-bit x86 CPUs, but to assume a Zen 3 CPU for instruction timings and scheduling, optimising for Zen 3 without using all of its instructions.

                For this reason, you should also watch the development of the new feature levels, which go beyond what the generic x86-64 instruction set offers without being specific to a single CPU type, so that a binary can target a whole range of newer CPUs and instruction sets.

                In any case, if you come across a piece of software where performance is critical, then you are best off configuring and building it yourself, because distributions cannot cover all possible use cases. You might find that disabling or enabling some features of the software at the configure stage gives you a greater gain than a compiler switch. And when possible, compile the software with -march=native directly on the CPU you need it for. GCC will then auto-detect the CPU at compile time and turn on all supported features for it.


                • #9
                  Interesting. So it looks like a lot of software packages are leaving quite a bit of performance on the table by only targeting x86-64. Looks like anywhere from 5-20% for highly optimized code vs. base x86-64

                  It makes me wonder how much of the Apple M1's performance advantage is due to the compiler being highly optimized, and most x86 codes targeting architectures from ~2005.


                  • #10
                    Originally posted by AmericanLocomotive
                    Interesting. So it looks like a lot of software packages are leaving quite a bit of performance on the table by only targeting x86-64. Looks like anywhere from 5-20% for highly optimized code vs. base x86-64

                    It makes me wonder how much of the Apple M1's performance advantage is due to the compiler being highly optimized, and most x86 codes targeting architectures from ~2005.
                    You can always use a source-based distribution =)

                    I think code can also be compiled for multiple targets, so that it detects at run time which code path to use.
