GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On AArch64

  • GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On AArch64

    Phoronix: GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On AArch64

    With Clang 8.0 due out by month's end and GCC 9 due for release not long after that, this week we've been running a number of GCC and Clang compiler benchmarks on Phoronix. At the start of the month came the large Linux x86_64 GCC vs. Clang compiler benchmarks on twelve different Intel/AMD systems, while last week we also looked at POWER9 compiler performance on the Raptor Talos II. In this article we are checking out these open-source compilers' performance on 64-bit ARM (AArch64) using an Ampere eMAG 32-core server.

    http://www.phoronix.com/vr.php?view=27538

  • #2
    Typo:

    Originally posted by phoronix View Post
    All four of these compilers were built on thos Ampere eMAG server and built in their release/optimized (non-debug) modes.
    dav1d:

    4K
    Clang 7: 18.67fps
    Clang 8: 18.77fps
    GCC 8: 19.58fps
    GCC 9: 19.91fps

    1080p
    Clang 8: 51.55fps
    Clang 7: 51.61fps
    GCC 8: 52.42fps
    GCC 9: 52.81fps

    Still not ready for real-time 4K AV1...
    Last edited by tildearrow; 02-12-2019, 04:51 PM.



    • #3
      Basically a fancy way of saying the performance improvements are modest and there are no regressions.
      But this must be put into context, because compilers aren't only about performance. They're also about better code analysis and, in turn, better error messages, which lead to faster development times. Maybe it's worth pointing this out in future reviews, Michael



      • #4
        These benchmarks are pretty much meaningless without -ffast-math or -Ofast. The compilers won't do much with the NEON instructions supported by these processors without being told they're allowed to relax the floating point order of operations.



        • #5
          That CacheBench write test is very intriguing. As you state, Michael, it is similar to the POWER9 result. For GCC it would be worthwhile finding out what the difference is; apparently it is code related. Is GCC emitting memory barriers for the writes?



          • #6
            Originally posted by campbell View Post
            These benchmarks are pretty much meaningless without -ffast-math or -Ofast. The compilers won't do much with the NEON instructions supported by these processors without being told they're allowed to relax the floating point order of operations.
            campbell, that was true for ARMv7, but ARMv8 NEON is IEEE754-compliant (in AArch64 mode), so those options aren't necessary. [ARM Docs]



            • #7
              Originally posted by thesandbender View Post

              campbell, that was true for ARMv7, but ARMv8 NEON is IEEE754-compliant (in AArch64 mode), so those options aren't necessary. [ARM Docs]
              They still make it much easier for the compiler to move instructions around, which makes it easier to fit the code into SIMD instructions. But yes, it can sometimes still get parallelized without them. If you want good automatic vectorization of floating-point code, though, -ffast-math is recommended, and that goes for any architecture.

              For instance, on x86 with GCC what you want in particular is -fno-math-errno, so each FP library call isn't required to set errno, and -fno-signaling-nans, so each FP instruction isn't required to trap individually. I seem to recall one more of the flags that -ffast-math is composed of being necessary in the cases I worked on, but I can't tell from the documentation which one it was. The issue remains: the C standard has some dumb requirements for many floating-point operations and library functions that make parallelizing them problematic. Most compilers ignore them, except GCC, and on Linux Clang follows the behavior GCC established.



              • #8
                Originally posted by carewolf View Post
                But issue remains, the C standard has some dumb requirements for many floating point operations and library functions that makes parallizing them problematic, and most compilers ignore that
                Can you point to any C compiler that defaults to being non-IEEE754-compliant? Almost all have an option to disable compliance, but one that does so by default (or always) is worthless for finance (my day job), engineering, science, etc. In a nutshell, the "dumb" requirement is that FP operations should reproducibly produce the same results on an IEEE754 machine; that's critical for any number of applications.

                Regardless, you're comparing SIMD instructions on x86 to NEON on ARM. Even x86 SIMD instruction sets don't all have the same requirements. x86 MMX is not IEEE754 compliant, SSE is mostly compliant (there are a few exceptions where the behavior is not officially defined for an instruction by Intel... e.g. rsqrtss) and AVX is completely compliant (like NEON on ARMv8 AArch64). So yes, getting the most out of MMX requires turning off IEEE754 compliance but that is not at all true for AVX or NEON. GCC and clang versions that are aware that the target is compliant will treat them as such. That's one of the selling points for AVX in enterprise.
                Last edited by thesandbender; 02-13-2019, 04:55 AM. Reason: typo



                • #9
                  Originally posted by thesandbender View Post

                  Can you point to any C compiler that defaults to being non-IEEE754-compliant? Almost all have an option to disable compliance, but one that does so by default (or always) is worthless for finance (my day job), engineering, science, etc. In a nutshell, the "dumb" requirement is that FP operations should reproducibly produce the same results on an IEEE754 machine; that's critical for any number of applications.
                  Yes: MSVC, Intel CC, Apple Clang for iOS, and all compilers targeting x87 on x86. Breaking IEEE754 is more common than being strict. But note that the things I brought up were deliberately not about violating IEEE754, but about how FP operations interact with the standard library (errno) and the OS (trapping); disabling errno or trapping does not make a compiler IEEE754-non-compliant.
                  Last edited by carewolf; 02-13-2019, 08:42 AM.



                  • #10
                    Originally posted by thesandbender View Post
                    worthless for finance (my day job), engineering, science, etc. In a nutshell, the "dumb" requirement is that FP operations should reproducibly produce the same results on an IEEE754 machine; that's critical for any number of applications.
                    If your code produces significantly different answers with vs. without -ffast-math, then BOTH answers are horseshit. One of them may comply with a standard, but that doesn’t make it more mathematically correct. Refactor the code so that it doesn’t care. If that’s not possible, then no one should be making financial, scientific, or safety critical conclusions based on the output of that code.

                    My bread and butter is high performance scientific computing and embedded signal processing. It would be incredibly negligent if I left throughput or battery life on the table in order to achieve IEEE 754 "correctness" when doing so wouldn’t actually change the output by a statistically significant amount. It would be even MORE negligent to release code that changes its answer significantly when run with vs. without those flags - that's a symptom of code that needs to be rewritten, because the math is ill-conditioned (i.e. already producing a horseshit answer even without those compiler flags), and in that case I rewrite it.

                    Regarding financial applications: as far as I’ve heard, isn't the guidance still to avoid floats altogether, because they’re not a base-10 number system? Maybe that’s no longer possible in the age of higher-level languages that just have a "number" datatype and don’t differentiate between integers and floating point, but that’s beside the point, because we’re talking about Michael benchmarking compiled code in real languages that care about these flags.

                    (I’ve even seen legacy code in safety-critical applications, from well before these compiler options were widespread, that adds small floating-point epsilons to nearly every term in order to avoid denormals - a "feature" required for strict IEEE 754 compliance that, on x86, can slow that particular code down by a factor of 20 without the epsilons and without the compiler being allowed to relax IEEE compliance. Nothing of value is gained by the slowdown; the answer doesn’t change by a statistically significant amount, because the math is well-formed and insensitive to it.)
                    Last edited by campbell; 02-13-2019, 08:52 AM.

