GCC & LLVM Clang Compiler Benchmarks On AMD's EPYC 7601

  • #1

    Phoronix: GCC & LLVM Clang Compiler Benchmarks On AMD's EPYC 7601

    For squeezing maximum performance out of Linux systems running source-based workloads, most of you know there are often compiler-stack tweaks to be had for greater performance. As well, with the never-ending advancements to the leading open-source compilers, there can be measurable performance gains between releases, though sometimes not without regressions too. With AMD's EPYC line-up and its underlying Zen microarchitecture (referred to as "znver1" by the compiler toolchains) still very fresh, here is a variety of benchmarks under recent releases of the GCC and LLVM Clang compilers.

    http://www.phoronix.com/vr.php?view=25265

  • #2
    Somebody really needs to set up global regression tracking for GCC; it seems they often don't notice some regressions on common open-source cases until after release. It'd be nice to have something like the couple hundred most popular packages of a distro built daily, with an automated test for each.
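    A minimal sketch of the comparison step such a daily job might run, assuming each nightly compiler build produces a table of benchmark run times in seconds (lower is better). The function name `find_regressions`, the 5% threshold and the timings below are hypothetical illustrations, not part of any existing GCC infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    name: str
    baseline: float  # run time in seconds with the previous build
    current: float   # run time in seconds with the new build

    @property
    def slowdown(self) -> float:
        # Relative slowdown, e.g. 0.10 means 10% slower than the baseline.
        return self.current / self.baseline - 1.0


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     threshold: float = 0.05) -> list[Regression]:
    """Return benchmarks whose run time grew by more than `threshold`.

    Benchmarks missing from either run are skipped; results are sorted by name.
    """
    regressions = []
    for name in sorted(baseline.keys() & current.keys()):
        r = Regression(name, baseline[name], current[name])
        if r.slowdown > threshold:
            regressions.append(r)
    return regressions


if __name__ == "__main__":
    # Made-up timings standing in for yesterday's and today's compiler builds.
    yesterday = {"bench_a": 10.0, "bench_b": 20.0, "bench_c": 30.0}
    today = {"bench_a": 12.0, "bench_b": 20.4, "bench_c": 29.5}
    for r in find_regressions(yesterday, today):
        print(f"{r.name}: {r.slowdown:.0%} slower")
```

    The threshold matters because run-to-run noise on a busy machine easily reaches a few percent; a real setup would likely repeat each run and compare medians before flagging anything.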



    • #3
      What's up with BLAKE? That's a 2x difference and doesn't vary within a compiler family. Either GCC is completely missing a trick or there's an ASM switch that didn't get set right for the GCC case.



      • #4
        A question: Did x264 make use of all 32 cores?

        Also,

        Originally posted by phoronix
        In other instances, optimizing for Jaguar (btver2) provides most of the benefit with the Zen optimizations (znver1) providing just a slight increase.
        Do you mean decrease? Compared to opteron-sse3 it's still a huge increase to me...



        • #5
          Originally posted by microcode
          Somebody really needs to set up global regression tracking for GCC; it seems they often don't notice some regressions on common open-source cases until after release. It'd be nice to have something like the couple hundred most popular packages of a distro built daily, with an automated test for each.
          Well, they usually should have such automated tests. I also think it's really weird not to notice these regressions when you change code... And these are just simple benchmarks, lol.



          • #6
            Originally posted by oooverclocker

            Well, they usually should have such automated tests. I also think it's really weird not to notice these regressions when you change code... And these are just simple benchmarks, lol.

            There are a lot of simple benchmarks, though. But, yeah, they could do better at performance regression testing. If performance isn't a concern, then what's the point of working on the compiler anymore? Bug fixes? New language support? I guess.



            • #7
              Originally posted by tildearrow
              A question: Did x264 make use of all 32 cores?

              Also,



              Do you mean decrease? Compared to opteron-sse3 it's still a huge increase to me...
              From what I read in other reviews, x264 and especially x265 start showing poor scaling beyond 16 threads, but I'm not 100% sure, since the scaling depends a lot on which options you use for the conversion.



              • #8
                Originally posted by willmore
                What's up with BLAKE? That's a 2x difference and doesn't vary within a compiler family. Either GCC is completely missing a trick or there's an ASM switch that didn't get set right for the GCC case.
                I can't tell you what's up with BLAKE, but I know from experience that compilers can produce code that just happens to be fast without having been specifically designed that way. Compilers simply don't yet model every aspect of a CPU, so they cannot optimize your code perfectly. It's then partly a matter of chance whether the resulting code turns out very fast or merely suboptimal.

                To give an example: one core within a CPU module can sometimes use a second unit that it shares with another core of the same module. For instance, when two cores share two integer units, one core can dispatch more integer operations per clock while the other core is idle. This makes it difficult to come up with an instruction-scheduling algorithm for a compiler, because instruction timings become less deterministic and predictable.

                The trend toward more complexity is also getting worse. For example, AMD has introduced neural-network-based branch prediction, so compiler developers either have to implement the same mechanism in their compilers to predict its behaviour or, until that's been implemented, pray and hope it works in their favour.



                • #9
                  People complain about H.264 and H.265 encoder scaling problems... why not just transcode 32 videos at once? Sure, editing one video at a time is a pretty common case, but if you are bulk transcoding, you should transcode multiple videos at once rather than try to split something that tends to be single-threaded across 32 cores.
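                  A rough sketch of that batch approach, assuming the x264 command-line encoder is installed and the inputs are raw .y4m files it can read directly. The helper names `x264_command` and `transcode_all` are hypothetical. Each file gets its own encoder process limited to one thread, and up to `jobs` of them run at the same time, so the parallelism comes from the number of files rather than from threading inside one encode.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def x264_command(src: Path) -> list[str]:
    # One single-threaded encode per input; writes a raw H.264 stream next
    # to the source file (e.g. clip.y4m -> clip.264).
    return ["x264", "--threads", "1",
            "--output", str(src.with_suffix(".264")), str(src)]


def transcode_all(sources, run=subprocess.run, jobs: int = 32):
    """Encode every file in `sources`, running at most `jobs` encoders at once.

    `run` defaults to subprocess.run; it is a parameter only so the scheduling
    logic can be exercised without the real encoder.
    """
    # Threads suffice here: each one just waits on an external process,
    # so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map returns results in input order; check=True makes a failed
        # encode raise CalledProcessError when its result is collected.
        return list(pool.map(lambda src: run(x264_command(src), check=True),
                             sources))
```

                  For example, `transcode_all(sorted(Path("videos").glob("*.y4m")))` would keep up to 32 encoders busy on a 32-core EPYC 7601. The trade-off is latency: each individual video finishes no sooner than a single-threaded encode would, but total throughput for the batch is what improves.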



                  • #10
                    When I watch an LLVM vs. GCC match, I'm totally rooting for GCC for no rational reason. I guess it's just because GCC RULES! YEAH! I wanna blow a vuvuzela for GCC!

