LLVM Clang 16 vs. GCC 13 Compiler Performance On AMD 4th Gen EPYC "Genoa"

  • hubicka
    replied
Most of the geomean difference is due to the two oneDNN benchmarks. I perfed it on Zen 4 and it is a JIT-based benchmark. Most time is spent in the JIT-produced code and the OpenMP runtime:

    5.15% benchdnn libomp.so [.] 0000000000058a
    3.38% benchdnn libomp.so [.] 0x00000000000c
    2.69% benchdnn libm.so.6 [.] __ieee754_logl
    1.16% benchdnn benchdnn [.] rnn::fill_memo
    0.90% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.78% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.41% benchdnn libdnnl.so.3.1 [.] dnnl::impl::cp
    0.38% benchdnn libomp.so [.] 0x00000000000c
    0.27% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.24% benchdnn libomp.so [.] 0x00000000000c
    0.24% benchdnn libdnnl.so.3.1 [.] dnnl_memory_de
    0.23% benchdnn libm.so.6 [.] __logl
    0.23% benchdnn libdnnl.so.3.1 [.] dnnl::impl::cp
    0.19% benchdnn libomp.so [.] 0x00000000000c
    0.16% benchdnn benchdnn [.] rnn::fill_weig
    0.16% benchdnn libomp.so [.] 0x000000000008
    0.14% benchdnn libomp.so [.] 0x000000000008
    0.12% benchdnn libomp.so [.] 0x00000000000c
    0.12% benchdnn libomp.so [.] 0x00000000000c
    0.12% benchdnn libomp.so [.] 0x00000000000c
    0.12% benchdnn libomp.so [.] 0x00000000000c
    0.12% benchdnn libomp.so [.] 0x00000000000c
    0.11% benchdnn libomp.so [.] 0x000000000005
    0.11% benchdnn libomp.so [.] 0x00000000000c
    0.10% benchdnn libomp.so [.] 0x000000000005
    0.10% benchdnn libomp.so [.] 0x00000000000c
    0.09% benchdnn libomp.so [.] 0x00000000000c
    0.09% benchdnn libomp.so [.] 0x00000000000c
    0.07% benchdnn libomp.so [.] 0x00000000000c
    0.07% benchdnn libomp.so [.] 0x00000000000c
    0.07% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.06% benchdnn libomp.so [.] 0x00000000000c
    0.05% benchdnn libomp.so [.] 0x00000000000c
    0.05% benchdnn libdnnl.so.3.1 [.] dnnl::impl::cp
    0.05% benchdnn benchdnn [.] dnn_mem_t::set
    0.05% benchdnn libdnnl.so.3.1 [.] dnnl::impl::cp
    0.05% benchdnn libomp.so [.] 0x00000000000c
    0.04% benchdnn libomp.so [.] 0x00000000000c
    0.03% benchdnn libomp.so [.] 0x00000000000c
    0.03% benchdnn libm.so.6 [.] __logf_fma
    0.03% benchdnn libomp.so [.] 0x000000000005
    0.03% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.03% benchdnn benchdnn [.] round_to_neare
    0.02% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.02% benchdnn libomp.so [.] 0x00000000000c
    0.02% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.02% benchdnn libomp.so [.] 0x00000000000c
    0.02% benchdnn libc.so.6 [.] __sched_yield
    0.02% benchdnn libc.so.6 [.] __memset_avx2_
    0.02% benchdnn libdnnl.so.3.1 [.] std::_Function
    0.02% benchdnn libomp.so [.] 0x00000000000c
    0.02% benchdnn libomp.so [.] 0x00000000000c
    0.02% benchdnn [unknown] [k] 0xffffffffb244
    0.02% benchdnn [unknown] [k] 0xffffffffb257
    0.01% benchdnn libdnnl.so.3.1 [.] std::_Function



    So it is not really testing the quality of the compiler's code generation.
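    For anyone who wants to reproduce this kind of breakdown, the per-symbol output above can be aggregated per shared object. A minimal sketch (the line format is assumed to be `perf report --stdio`-style; the sample rows are copied from the listing above, symbol names truncated exactly as shown):

    ```python
    # Minimal sketch: sum perf-report overhead per shared object (DSO), to show
    # how much time lands in libomp.so vs. benchdnn's own code. Assumed line
    # format: "<pct>% <command> <dso> [.] <symbol>".
    from collections import defaultdict

    def overhead_by_dso(perf_lines):
        """Aggregate the overhead percentage column by shared object."""
        totals = defaultdict(float)
        for line in perf_lines:
            parts = line.split()
            if len(parts) >= 4 and parts[0].endswith("%"):
                totals[parts[2]] += float(parts[0].rstrip("%"))
        return dict(totals)

    sample = [
        "5.15% benchdnn libomp.so [.] 0000000000058a",
        "2.69% benchdnn libm.so.6 [.] __ieee754_logl",
        "1.16% benchdnn benchdnn [.] rnn::fill_memo",
        "0.38% benchdnn libomp.so [.] 0x00000000000c",
    ]
    print(overhead_by_dso(sample))
    ```

    On the full listing this makes the point immediately: libomp.so plus the unresolved JIT addresses dominate, while very little overhead is attributed to benchdnn's own compiled code.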



  • skeevy420
    replied
    Originally posted by rene View Post
    So cool this Linux distribution now has a clang by default as sys-cc and clang lot the linux kernel on other supported architectures, ... https://www.youtube.com/watch?v=nLyUhEMwGws !
    Record BREAKING #RISC #CISC #Linux #t2sde #release #t2sde #Ad: laptops & more @Amazon: https://services.exactcode.de/amzn.cg... You can support my work at: https://patreon.com/renerebe https://github.com/sponsors/rxrbln/ http://onlyfans.com/renerebe https://exactcode.com https://t2sde.org https://rene.rebe.de

    Hmm, one of those is not like the rest.
    Last edited by skeevy420; 31 May 2023, 12:30 PM.



  • oleid
    replied
    Originally posted by carewolf View Post

    But clang isn't faster anymore. Hasn't been for years.
    It is in our codebase, which is mostly C++. AFAIR it is also the case for the Phoronix benchmarks. It is, however, slower for plain C code.



  • carewolf
    replied
    Originally posted by oleid View Post
    So in the end it's a "benchmark your code to see what works best". At work I use clang for development and gcc for deployment. Mostly due to clang's faster compile time of c++ code. Diagnostics are mostly on par nowadays.
    But clang isn't faster anymore. Hasn't been for years. Diagnostics are a mixed bag: Clang has a tendency to accept way more illegal C++ than gcc (I think because it needs to simulate both gcc and msvc), often without a warning; but when warnings are issued, Clang's are often better.
    Last edited by carewolf; 31 May 2023, 03:37 AM.



  • Lycanthropist
    replied
    I wonder how clang compares to a current MSVC compiler.



  • Healer_LFG
    replied
    Originally posted by filbo View Post
    I'd also be quite curious to see the entire set of tests also run on both compilers with '-march=blended', or whatever is used these days to generate broadly operable output code
    -march=x86-64
    -march=x86-64-v2
    -march=x86-64-v3
    -march=x86-64-v4

    All of these are generic targets for x86_64 processors. Each one requires a different minimum set of instruction-set extensions, and they are backwards compatible: code compiled with -march=x86-64 will run just fine on a CPU that supports -march=x86-64-v4, albeit without the benefits that the newer instruction sets bring.
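    As a rough illustration of how those levels nest, here is a sketch that maps a set of CPU feature flags to the highest x86-64 level they satisfy. The flag sets are abbreviated from the psABI level definitions, and the spellings are illustrative (real /proc/cpuinfo names differ for a few, e.g. SSE3 shows up as pni):

    ```python
    # Illustrative sketch, not an official tool: each level adds required
    # features on top of the previous one, so we walk the levels in order
    # and stop at the first one whose requirements are not met.
    LEVELS = [
        ("x86-64",    {"cmov", "cx8", "fpu", "fxsr", "mmx", "sse", "sse2"}),
        ("x86-64-v2", {"cx16", "popcnt", "sse3", "sse4_1", "sse4_2", "ssse3"}),
        ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt", "movbe"}),
        ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
    ]

    def highest_level(cpu_flags):
        """Return the highest level whose flags are all present, or None."""
        best = None
        for name, required in LEVELS:
            if required <= cpu_flags:   # subset test: all required flags present
                best = name
            else:
                break
        return best

    v3_cpu = set().union(*(req for _, req in LEVELS[:3]))
    print(highest_level(v3_cpu))   # -> x86-64-v3
    ```

    The break on the first unmet level is what makes the scheme cumulative: a CPU with AVX2 but without, say, SSE4.2 would still only qualify as plain x86-64.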



  • DanglingPointer
    replied
    These tests need to be run against the versions below to see whether it is worthwhile to invest in DevOps and CI/CD changes:
    • gcc-11.4
    • gcc-12.3
    • clang-14
    • clang-15
    All of these are still clearly in much wider use than gcc-13 and clang-16. Knowing whether the gains are real would help drive adoption of the newer compilers sooner, rather than organically after much time has passed.



  • filbo
    replied
    The 'Number Of First Place Finishes' chart uses two nicely contrasting colors. Please use those same colors for all the individual bar charts so it is more readily apparent which result is which. (I do see that in this article the two contenders always appear in the same order, but you don't always do it that way, and color hinting is an improvement in any case.)

    Since this was nominally a test of '-march=znver4', it would be nice to see that explicitly tested against '-march=native'. Alternatively, it would be sufficient to say in the text that you ran a few representative tests with each compiler and confirmed that the two are equivalent, as intended.

    I'd also be quite curious to see the entire set of tests also run on both compilers with '-march=blended', or whatever is used these days to generate broadly operable output code.

    Finally, in the final geometric mean chart, it would be very interesting to see a 'fastest' contender. This would be a fake run where each individual benchmark got the fastest score of any of the individual real contenders. 'Fastest' would be the 100% contender, reducing each of the other competitors to a somewhat smaller number showing what fraction of the overall fastest possible score it can deliver. This seems particularly interesting in cases like this, where the final geomean seems to show that they're quite close together, yet there are many individual tests where one or the other rockets ahead. Seeing real contenders with, say, 96% scores would tell you it isn't that important to pick and choose for each binary; real scores in the 80% range might indicate that you really *should* think about mix and match.
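    A quick sketch of that 'fastest contender' computation (the scores here are made up for illustration; it assumes higher-is-better results aligned across benchmarks):

    ```python
    # Hypothetical sketch of the synthetic "fastest" contender: take the best
    # per-benchmark score across all real contenders, then express each real
    # contender's geometric mean as a fraction of the fastest's geomean.
    from math import prod

    def relative_geomeans(results):
        """results: {contender: [score, ...]}, higher = better."""
        n = len(next(iter(results.values())))
        fastest = [max(scores[i] for scores in results.values()) for i in range(n)]

        def geomean(xs):
            return prod(xs) ** (1.0 / len(xs))

        best = geomean(fastest)
        return {name: geomean(scores) / best for name, scores in results.items()}

    scores = {"gcc": [100, 80, 120], "clang": [90, 100, 110]}
    print(relative_geomeans(scores))
    ```

    With made-up numbers like these, both contenders land in the low-to-mid 90% range of the synthetic fastest, which per the argument above would suggest per-binary cherry-picking isn't worth the trouble.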



  • rene
    replied
    So cool this Linux distribution now has a clang by default as sys-cc and clang lot the linux kernel on other supported architectures, ... https://www.youtube.com/watch?v=nLyUhEMwGws !

