AMD AOCC 2.3 Squeezing Out Extra Performance For EPYC Over GCC 10, Clang 11

  • AMD AOCC 2.3 Squeezing Out Extra Performance For EPYC Over GCC 10, Clang 11

    Phoronix: AMD AOCC 2.3 Squeezing Out Extra Performance For EPYC Over GCC 10, Clang 11

    At the start of the month AMD released AOCC 2.3 as the newest version of the AMD Optimizing C/C++ Compiler. AOCC is one of several LLVM/Clang downstream versions maintained by the company, with this one being about delivering flagship AMD Zen family compiler support. Using an AMD EPYC 7002 "Rome" series processor, I recently wrapped up fresh benchmarks of AOCC 2.3 against the current GCC 10 and Clang 11 compiler releases.

    http://www.phoronix.com/vr.php?view=29780

  • #2
    Nice.

    Hopefully more patches will be upstreamed soon and the regressions addressed. It looks like the issues are mostly in OpenMP.

    • #3
      Honestly, that's a pretty significant performance increase. Way more than I thought. I think we are really starting to get to the point where we are going to need to start compiling binary code for more modern architectures - even if it means leaving all those Athlon 64s, Core 2 Duos and Bulldozers behind.

      Michael, I would have loved it if you had included GCC running with generic `-march=x86-64` as a comparison as well, just as a reference.

      • #4
        Also wondering:

        1. If the resulting binaries run on Intel processors as well

        2. If possible, is there any performance delta on Intel processors?

        • #5
          Originally posted by vladpetric
          Also wondering:

          1. If the resulting binaries run on Intel processors as well

          2. If possible, is there any performance delta on Intel processors?
          It depends on exactly which options you pass to the compiler, but in general, even if you compile targeting znver2, the binary will still run on many Intel CPUs, as long as they have all the needed extensions. There might be a few extensions that AMD has but Intel doesn't, but 1) it is maybe one or two, 2) they are rarely generated automatically by the compiler, and 3) it is easy to check; in practice it is probably zero.

          There is also the concept of tuning, at least in gcc. Not sure about llvm. In gcc, you can say: use instruction set(s) X, but tune for Y (even if Y has more extensions). It is pretty common to say something like `-march=sandybridge -mtune=skylake`, which will use baseline x86-64, AVX and a few other things, but not much of the very modern stuff, while tuning the code not for Sandy Bridge but for a more recent (and popular) CPU, thus giving better performance on average for the average person with an average CPU. In fact `-mtune=generic` does this automatically for you. Generic is changed every year or so, to tune for the CPU models with the predominant market share, while still running on decade-old CPUs.

          So, the delta would be highly dependent on the options.

          A comprehensive investigation of many combinations would take some time. From my experience it is not a very big change in most cases (a few % at most), but in some workloads it is bigger.
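
The "easy to check" step above can be sketched in a few lines. This is a minimal illustration, assuming a Linux machine where `/proc/cpuinfo` lists the CPU's feature flags. The set `ZNVER2_SUBSET` and the helper names are hypothetical: the set is only a partial selection of the extensions GCC documents for `-march=znver2`, written with the kernel's flag names, and a flag being enabled does not mean the compiler actually emits those instructions for a given program.

```python
"""Report which required CPU feature flags a machine does not advertise."""

# ASSUMPTION: an illustrative subset of the extensions GCC documents for
# -march=znver2, using the flag names Linux prints in /proc/cpuinfo.
# It is not the complete list.
ZNVER2_SUBSET = {"avx", "avx2", "fma", "bmi1", "bmi2", "sse4a", "clzero", "mwaitx"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(required, cpuinfo_text):
    """Return the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Calling `missing_flags(ZNVER2_SUBSET, read_cpuinfo())` on the target machine gives the list to look at. An empty list only says the flags in the subset are present; a non-empty one names the extensions worth checking for in the generated code.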

          • #6
            Originally posted by vladpetric
            Also wondering:

            1. If the resulting binaries run on Intel processors as well

            2. If possible, is there any performance delta on Intel processors?
            Sorry, but it will definitely not run on "Intel". Not in general, and also not on older AMD CPUs like the FX series. There could be exceptions, but since you're asking about Intel and not a specific Intel CPU, the answer is no.
            Last edited by sdack; 17 December 2020, 03:45 PM.

            • #7
              Maybe this will help persuade more peeps into using Gentoo.

              • #8
                Originally posted by sdack
                Sorry, but it will definitely not run on "Intel". Not in general, and also not on older AMD CPUs like the FX series. There could be exceptions, but since you're asking about Intel and not a specific Intel CPU, the answer is no.
                Which AMD-specific instructions will cause that? Typically AMD is a bit behind Intel with respect to supporting various instruction sets (and that's not a criticism). There are definitely system instructions (mostly running only in privileged mode) where they are different, but the question is which instructions that a compiler like gcc will generate will cause an incompatibility.

                I could have been more specific - Intel Core 10th/11th gen.

                • #9
                  Originally posted by baryluk

                  It depends on exactly which options you pass to the compiler, but in general, even if you compile targeting znver2, the binary will still run on many Intel CPUs, as long as they have all the needed extensions. There might be a few extensions that AMD has but Intel doesn't, but 1) it is maybe one or two, 2) they are rarely generated automatically by the compiler, and 3) it is easy to check; in practice it is probably zero.

                  There is also the concept of tuning, at least in gcc. Not sure about llvm. In gcc, you can say: use instruction set(s) X, but tune for Y (even if Y has more extensions). It is pretty common to say something like `-march=sandybridge -mtune=skylake`, which will use baseline x86-64, AVX and a few other things, but not much of the very modern stuff, while tuning the code not for Sandy Bridge but for a more recent (and popular) CPU, thus giving better performance on average for the average person with an average CPU. In fact `-mtune=generic` does this automatically for you. Generic is changed every year or so, to tune for the CPU models with the predominant market share, while still running on decade-old CPUs.

                  So, the delta would be highly dependent on the options.

                  A comprehensive investigation of many combinations would take some time. From my experience it is not a very big change in most cases (a few % at most), but in some workloads it is bigger.
                  I agree with what you're saying and your expectation for the outcome; but I would still love some numbers, that's all.

                  And yes, good ole' out-of-order execution

                  • #10
                    Originally posted by vladpetric
                    I could have been more specific - Intel Core 10th/11th gen.
                    The problem is complex and makes giving an accurate answer difficult. Not only does it depend on the CPU and its supported instructions, but on the compiler as well and how many of the instructions it can use, and it also depends on the program itself, which may not need all available instructions. Your best chance at finding it out is to test it.
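
For anyone who wants to "test it" as suggested, a rough harness might look like the sketch below. It is only an illustration under stated assumptions: the flag combinations in `FLAG_SETS`, the output names `./bench-<name>`, and the helper functions are hypothetical choices made for the example, and wall-clock timing of a single small program is a crude measure compared with a proper benchmark suite.

```python
"""Compile one source file with several flag sets and time each binary."""
import shutil
import subprocess
import time

# ASSUMPTION: these flag combinations are picked for illustration only;
# substitute whichever targets you actually care about.
FLAG_SETS = {
    "generic": ["-O2", "-march=x86-64", "-mtune=generic"],
    "znver2": ["-O2", "-march=znver2"],
}


def build_command(compiler, source, output, flags):
    """Return the argv list that compiles `source` into `output`."""
    return [compiler, *flags, source, "-o", output]


def time_binary(path, repeats=3):
    """Run a binary `repeats` times; return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the binary exits abnormally.
        subprocess.run([path], check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare(compiler, source, flag_sets=FLAG_SETS):
    """Compile `source` once per flag set and return {name: best time}."""
    if shutil.which(compiler) is None:
        raise FileNotFoundError(f"{compiler} not found on PATH")
    results = {}
    for name, flags in flag_sets.items():
        output = f"./bench-{name}"
        subprocess.run(build_command(compiler, source, output, flags), check=True)
        results[name] = time_binary(output)
    return results
```

A call such as `compare("gcc", "bench.c")`, where `bench.c` is a placeholder for your own workload, returns the best time per flag set. If a binary uses an instruction the CPU lacks, the run normally dies with an illegal-instruction signal, which shows up here as a `subprocess.CalledProcessError`, so the same script answers both the "does it run" and the "how fast" questions for that one machine.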
