GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On POWER9 With The Raptor Talos II

  • GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On POWER9 With The Raptor Talos II

    Phoronix: GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On POWER9 With The Raptor Talos II

    Earlier this week I delivered the results of our largest-ever GCC vs. LLVM Clang Linux x86_64 compiler comparison with a dozen systems from various generations of Intel and AMD CPUs and using 62 benchmarks tested on GCC 8/9 and Clang 7/8 releases. In this article the compiler performance is being looked at for the IBM POWER9 architecture with the benchmarks done on a Raptor Computing Systems Talos II workstation running Ubuntu Linux.

    http://www.phoronix.com/vr.php?view=27522

  • #2
    That cachebench result is quite interesting. I'm betting that Clang is using faster builtins for mem* on Power.
    If you recompile cachebench without builtins that would probably even out the result.


    • #3
      Originally posted by milkylainen:
      That cachebench result is quite interesting. I'm betting that Clang is using faster builtins for mem* on Power.
      If you recompile cachebench without builtins that would probably even out the result.
      I don't understand, does the use of faster builtins by Clang somehow cheat the benchmark? Or should this be a feature that GCC could implement too?


      • #4
        Originally posted by Michael_S:
        I don't understand, does the use of faster builtins by Clang somehow cheat the benchmark? Or should this be a feature that GCC could implement too?
        No. Absolutely no cheating. The mem* functions are notoriously difficult to implement without making them greedy prefetchers. Since cachebench is mostly C library mem* calls and various simple closed-loop array walks, the compiler should matter less, not more.

        Either way, the cachebench tests are rather trivial. Tests that stick out like a sore thumb between compilers do warrant closer investigation.
        With source this simple, it should be fairly easy to work out why one compiler is faster, much easier than with more complex code.
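
One way to probe the builtin hypothesis is to time the same copy built twice, once normally and once with -fno-builtin, a flag that both GCC and Clang accept. The following is only a minimal sketch and is not taken from cachebench: the names copy_libc, copy_loop and time_copy are hypothetical, the buffer fill value is arbitrary, and clock() gives only coarse CPU-time resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical function-pointer type so both copy variants share one timer. */
typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Copy through the C library entry point. With builtins enabled the compiler
   is free to expand or replace this call; with -fno-builtin it is not. */
static void copy_libc(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Copy with a plain array walk, similar in spirit to a simple closed loop. */
static void copy_loop(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Time `reps` copies of `n` bytes. Returns CPU seconds, or -1.0 if the
   arguments are unusable or allocation fails. */
static double time_copy(copy_fn copy, size_t n, int reps)
{
    assert(copy != NULL);
    if (n == 0 || reps <= 0)
        return -1.0;

    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, n);

    /* Crude guard: reading one copied byte into a volatile keeps the
       copies from being discarded as dead code. */
    volatile unsigned char sink = 0;

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        copy(dst, src, n);
        sink ^= dst[n - 1];
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(copy_libc, 1 << 20, 1000) and time_copy(copy_loop, 1 << 20, 1000) from a small main, then building with each compiler at -O3 with and without -fno-builtin, gives four numbers per compiler to compare. If the gap between GCC and Clang narrows with builtins disabled, that would be consistent with the builtin explanation, though it would not prove it.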


        • #5
          Michael, a typo:
          Originally posted by phoronix:
          "-O3 -mtune-native mcpu=native"
          Maybe "-mcpu=native", with a leading dash?
