GCC vs. LLVM/Clang On The AMD Richland APU

  • GCC vs. LLVM/Clang On The AMD Richland APU

    Phoronix: GCC vs. LLVM/Clang On The AMD Richland APU

    Along with benchmarking the AMD A10-6800K "Richland" APU on Linux and its Radeon HD 8670D graphics, I provided some GCC compiler tuning benchmarks for this AMD APU with Piledriver cores. The latest Linux testing from the A10-6800K is a comparison of GCC 4.8.1 to LLVM/Clang 3.3 on this latest-generation AMD low-power system.


  • #2
    I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins and just about any time it was behind (minus where OpenMP plays a huge factor) it wasn't by much.



    • #3
      The gap is only going to widen with LLVM/Clang 3.4 pushing ahead in large areas of performance and scalability.



      • #4
        And here I thought the language had gotten better?

        “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.”

        ugh… you couldn’t have found a better way to say that those tests are completely useless?



        • #5
          Originally posted by MWisBest View Post
          I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins and just about any time it was behind (minus where OpenMP plays a huge factor) it wasn't by much.
          Please have a look at

          * Timed MAFFT alignment,
          * BLAKE2,
          * Botan MAC,
          * Himeno and
          * C-Ray (please call this one “not by much” again…)

          LLVM seems to be very fast at matrix multiplication, though.

          So my summary of the results would be:

          * LLVM is great at Successive Jacobi Relaxation.
          * GCC is great at C-Ray.
          * LLVM has no OpenMP support, so don’t even try to use it for scientific code, except if you want to go all the way and use explicit MPI (which makes the SciMark test somewhat less useful).



          • #6
            Well, there were some impressive results from Clang/LLVM here; that said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing: there's no 'standard' between compilers on which optimizations should be added at the -O2 level.

            If Clang/LLVM or GCC add more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when they are set to generate the fastest code they can, which is at -O3.

            As such the Botan benchmarks are pointless in this context.

            This is why, if you are measuring the performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder whether Michael is deliberately using these flawed settings in order to sway the results to his liking.



            • #7
              Originally posted by ArneBab View Post
              “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.”

              ugh… you couldn’t have found a better way to say that those tests are completely useless?
              "Those tests are useless"

              now switch perspective to someone who needs OpenMP

              "That compiler is useless"

              Funny, isn't it.



              • #8
                Originally posted by XorEaxEax View Post
                Well, there were some impressive results from Clang/LLVM here; that said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing: there's no 'standard' between compilers on which optimizations should be added at the -O2 level.

                If Clang/LLVM or GCC add more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when they are set to generate the fastest code they can, which is at -O3.

                As such the Botan benchmarks are pointless in this context.

                This is why, if you are measuring the performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder whether Michael is deliberately using these flawed settings in order to sway the results to his liking.
                -O3 does not necessarily generate the fastest code. It enables the most optimization but is intended for smaller segments of code and inner loops. If used for entire applications it may cause slowdown due to a larger memory footprint and more cache misses.



                • #9
                  Originally posted by carewolf View Post
                  -O3 does not necessarily generate the fastest code. It enables the most optimization but is intended for smaller segments of code and inner loops. If used for entire applications it may cause slowdown due to a larger memory footprint and more cache misses.
                  Yes, sometimes -O2 actually beats -O3, but that is because the optimizer sometimes fails in its job of accurately weighing things like increased cache use against the improved performance of a larger code segment (through inlining, unrolling, etc.). Also, -O3 is not specifically intended for 'smaller segments of code': the compiler heuristics typically do a good job of deciding which code benefits from unrolling and inlining, and which codepaths are hot and cold. Just because an optimization is enabled doesn't mean it will end up used on every segment of code. So yes, you can use -O3 on entire applications just fine, and most CPU-intensive ones default to -O3 in their configurations.

                  Of course if you want to give the compiler the best help, you can always use profile guided optimization where you let the compiler gather runtime data which it can then use to better optimize the code.

                  But even though -O2 sometimes beats -O3 due to failed compiler heuristics, if you only test ONE optimization level then of course it must be -O3; again, there is no 'standard' between compilers on which optimizations are enabled per 'level'. The ONLY standard is that -O3 is supposed to generate the _fastest_ code.

                  So unless you know beforehand that -O2 in a particular test generates the fastest code for BOTH compilers on a particular benchmark, using -O2 means nothing in a benchmark where you want to see which compiler generates the _fastest_ code, as that is what -O3 is supposed to do and also does in the vast majority of cases.



                  • #10
                    Originally posted by curaga View Post
                    "Those tests are useless"

                    now switch perspective to someone who needs OpenMP

                    "That compiler is useless"
                    Actually that’s what I’m talking about: The tests are useless, because their result is useless. If you need OpenMP, you don’t need to look at the results. The compiler is not for you. And if you don’t need OpenMP you don’t need the results either: They have no meaning for you.
