
GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On AArch64


  • darkblu
    replied
    Originally posted by thesandbender View Post
    .. x86 MMX is not IEEE754 compliant ..
    It would be a miracle if it was -- it's integer-only.

    Originally posted by thesandbender View Post
    It depends on your definition of significant. fast-math allows gcc to reorder math operations, so (a * c) + (b * c) -> (a + b) * c. That is mathematically correct. However, because of the rounding inherent to base 2, it can produce different results depending on whether fast-math is turned on or off.
    Nothing to do with base 2. Floats are non-associative in any base you could think of, so the distributive law has trouble working too. What -ffast-math (or its equivalent in non-gcc-like compilers) does is pretend floats are associative and always finite. Masking error conditions is just icing on the cake.
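
    A quick C illustration of the non-associativity (my own toy example, not from any post here):

        #include <stdio.h>

        int main(void) {
            float a = 1e20f, b = -1e20f, c = 1.0f;
            /* (a + b) + c: the huge terms cancel first, then c survives */
            printf("%g\n", (a + b) + c);   /* prints 1 */
            /* a + (b + c): c is absorbed into b before the cancellation */
            printf("%g\n", a + (b + c));   /* prints 0 */
            return 0;
        }

    -ffast-math licenses the compiler to treat those two expressions as interchangeable.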

    On the subject of benchmarking compilers -- campbell is right, Michael needs to start taking into account what the original code authors wrote.



  • campbell
    replied
    More followup: I put together a canned benchmark of some of my HPC code, where the bulk of the workload is a solution of a tridiagonal system of equations in complex arithmetic. It's over twice as fast with -ffast-math vs. without on the Cortex-A53.
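
    For readers who don't know the kernel: a minimal sketch of that kind of solve (the Thomas algorithm in complex arithmetic; this is purely illustrative, not campbell's actual code):

        #include <complex.h>

        /* Solve a tridiagonal system in place (Thomas algorithm).
           sub = sub-diagonal (sub[0] unused), diag = main diagonal,
           sup = super-diagonal (sup[n-1] unused), rhs = right-hand side.
           diag is clobbered; rhs is overwritten with the solution. */
        void tridiag_solve(int n, const double complex *sub, double complex *diag,
                           const double complex *sup, double complex *rhs)
        {
            for (int i = 1; i < n; i++) {
                double complex m = sub[i] / diag[i - 1]; /* forward elimination */
                diag[i] -= m * sup[i - 1];
                rhs[i]  -= m * rhs[i - 1];
            }
            rhs[n - 1] /= diag[n - 1];
            for (int i = n - 2; i >= 0; i--)             /* back-substitution */
                rhs[i] = (rhs[i] - sup[i] * rhs[i + 1]) / diag[i];
        }

    Complex division is a plausible reason -ffast-math helps so much here: it implies -fcx-limited-range, which swaps the range-checked complex divide (a libgcc call) for the naive textbook formula.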



  • carewolf
    replied
    Originally posted by thesandbender View Post
    3. Compilers fall under IEEE754 as a development environment. They do not have to implement the standard 100% to be compliant, as long as they clearly specify what the exceptions are; code created with those exceptions in mind is IEEE754 compliant (much like SSE: your code/program is IEEE754 compliant so long as you don't use the instructions that don't clearly state IEEE754 compliance). So yes, they are compliant provided you do not use the operations/features they list as non-compliant (as the Intel and MSVC docs do). That could be viewed as splitting hairs, but for the vast majority of code the default options (yes, including /fp:strict) generate compliant code, and your initial statement was that they "break" IEEE754, which is just not true.
    That is a bit nonsensical. To be compliant they do have to follow the spec; that is what compliance means. Listing exceptions to the standard makes you well-documented, not standard-compliant. But I see what you mean and do agree with you, if I stop splitting hairs.

    Note, however, that while they have only a few features they are non-compliant with, it is not something you can control as a programmer. In particular, they reserve the right to perform operations in a different order or fused; this means the precision might not be what you expect, but it won't be mathematically worse than what the standard demands, just not exactly what the standard says it should be. Nor do they actually list all the non-strict optimizations they do.
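
    A tiny C demonstration of the fused case (my own example; fma() is the C99 math.h function, and you'd compile with -ffp-contract=off so the compiler doesn't fuse the first expression on its own):

        #include <stdio.h>
        #include <math.h>

        int main(void) {
            double a = 1.0 + 0x1p-27, b = 1.0 - 0x1p-27;  /* both exact doubles */
            /* Plain multiply-then-add rounds the product a*b (= 1 - 2^-54) to 1.0 first: */
            printf("%g\n", a * b - 1.0);      /* prints 0 */
            /* A fused multiply-add rounds only once, after the subtraction: */
            printf("%g\n", fma(a, b, -1.0));  /* prints -5.55112e-17, i.e. -2^-54 */
            return 0;
        }

    Both answers are within spec for their respective operations; they just aren't the same bits.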

    Now that we are on the same page, back to my original statement. The issue with GCC is that it defaults to the strictest mode, unlike most other compilers, which makes optimization hard, especially auto-vectorization. For some code -ffast-math is a good option, but for general code or as a default it is not optimal because it is TOO lax: it does not simply switch gcc into a safe, less strict mode, it is more similar to the fast mode of the other compilers than to their default mode. This is why I use a subset of the flags that -ffast-math is made of, instead of the entire flag.
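
    One illustrative possibility (an assumption on my part; the exact subset is not listed in this post):

    CFLAGS="-O3 -fno-math-errno -fno-trapping-math"

    Those two relax the errno and trapping interactions mentioned earlier in the thread without enabling the value-changing parts of -ffast-math such as -fassociative-math or -ffinite-math-only.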

    There have been multiple discussions on the gcc mailing list about this. I once suggested making -ffast-math the default, then just parts of it once I learned the details of why it is not comparable with other compilers' lax defaults, and I have seen other people make similar suggestions through the years. The issue always ends up in a combination of bike-shedding and the difficulty of deciding what you can relax and what you cannot. Perhaps it would be best to start with an intermediate setting that just tries to be as lax as the other compilers on their default setting, but not their fast-fp setting.



  • campbell
    replied
    Originally posted by thesandbender View Post
    Campbell, thanks for doing that. What was the full set of gcc flags?
    CFLAGS="-Ofast -march=armv8-a -mtune=cortex-a53"
    vs
    CFLAGS="-O3 -march=armv8-a -mtune=cortex-a53"

    As a side note, I tried it with both glibc and musl and there was no noticeable difference. Often musl is significantly faster than glibc on aarch64 (without sacrificing correctness), but in this case I guess there just aren't enough library calls for it to matter. Willing to bet that for some of the other benchmarks in Michael's test suite it would matter more.



  • thesandbender
    replied
    Campbell, thanks for doing that. What was the full set of gcc flags?



  • campbell
    replied
    Enough hypotheticals. I just did a test of c-ray on a Cortex-A53 with various compiler flags, using gcc 8.2.1, and it's significantly (17%) faster with -ffast-math than without. Even more perplexingly, the Makefile that comes with c-ray has -ffast-math enabled, but Michael is running WITHOUT it in this set of benchmarks. Why?



  • thesandbender
    replied
    Originally posted by carewolf View Post

    x87 is perfectly IEEE754 compliant if used correctly; it is just that most compilers let it use higher precision than it is supposed to for 64-bit floating point.

    About the compilers, did you even read the links you posted??

    For instance Intel CC: "This version of the compiler uses a close approximation to the IEEE Standard for Floating-Point Arithmetic, version IEEE 754-2008, unless otherwise stated." A close approximation means non-strict behavior.

    MSVC defaults to /fp:precise, which if you read closely is NOT the same as /fp:strict; it is just very close to it...
    1. The IEEE754 standard is not restricted to 64-bit; it also defines 32-bit and 128-bit formats. (Note: I'm not saying the x87 80-bit format is 128-bit compliant.)

    2. Those are interchange formats; IEEE754 does not restrict the internal arithmetic. On the contrary, the standard recommends using extended precision for arithmetic. Section 3.7, Extended and extendable precisions: "Extended and extendable precision formats are recommended for extending the precisions used for arithmetic beyond the basic formats." So compilers generating 80-bit x87 code are following the spec (a small probe of the formats involved appears after point 3).

    3. Compilers fall under IEEE754 as a development environment. They do not have to implement the standard 100% to be compliant, as long as they clearly specify what the exceptions are; code created with those exceptions in mind is IEEE754 compliant (much like SSE: your code/program is IEEE754 compliant so long as you don't use the instructions that don't clearly state IEEE754 compliance). So yes, they are compliant provided you do not use the operations/features they list as non-compliant (as the Intel and MSVC docs do). That could be viewed as splitting hairs, but for the vast majority of code the default options (yes, including /fp:strict) generate compliant code, and your initial statement was that they "break" IEEE754, which is just not true.
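
    The probe promised under point 2 (illustrative; the values in the comments are what x86 Linux typically reports):

        #include <stdio.h>
        #include <float.h>

        int main(void) {
            /* 53 significand bits: the IEEE754 binary64 interchange format */
            printf("double:      %d mantissa bits\n", DBL_MANT_DIG);
            /* 64 on x86, where long double is the x87 80-bit extended format;
               other platforms may report 53 or 113 here */
            printf("long double: %d mantissa bits\n", LDBL_MANT_DIG);
            return 0;
        }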

    <Edit>
    I should add that it's perfectly possible to create IEEE754-compliant code/programs even with compiler flags that turn IEEE754 compliance off. You just have to be 100% aware of what the compiler is doing and code around it. That's generally not a good idea, though, because complex code becomes a lot less portable and you have to re-validate compliance any time anything else about the environment changes (compiler, hardware). Sometimes the cost/benefit is justified, though, as in HFT, weather calculations, etc.
    </Edit>
    Last edited by thesandbender; 13 February 2019, 08:54 PM.



  • carewolf
    replied
    Originally posted by thesandbender View Post

    Apologies but you have no idea what you're talking about. "all x86 and x87 compilers" default to not being IEEE754 compliant? That is demonstrably wrong. Are you trying to say that x87 isn't IEEE754 compliant?

    MSVC : Defaults to /fp:precise (though there was some discussion about making /fp:fast the default a few years ago)
    Intel CC : Nope
    Apple clang on iOS : No (default build settings have $GCC_FAST_MATH set to NO)
    x87 is perfectly IEEE754 compliant if used correctly; it is just that most compilers let it use higher precision than it is supposed to for 64-bit floating point.

    About the compilers, did you even read the links you posted??

    For instance Intel CC: "This version of the compiler uses a close approximation to the IEEE Standard for Floating-Point Arithmetic, version IEEE 754-2008, unless otherwise stated." A close approximation means non-strict behavior.

    MSVC defaults to /fp:precise, which if you read closely is NOT the same as /fp:strict; it is just very close to it...



  • thesandbender
    replied
    Originally posted by carewolf View Post
    Yes, MSVC, Intel CC, Apple clang for iOS, all compilers on x86 with x87. Breaking IEEE754 is more common than being strict. But note that the things I brought up were deliberately not related to violating IEEE754, but to how FP operations interact with the standard library (errno) and the OS (trapping); disabling errno or trapping does not make a compiler IEEE754-non-compliant.
    Apologies but you have no idea what you're talking about. "all x86 and x87 compilers" default to not being IEEE754 compliant? That is demonstrably wrong. Are you trying to say that x87 isn't IEEE754 compliant?

    MSVC : Defaults to /fp:precise (though there was some discussion about making /fp:fast the default a few years ago)
    Intel CC : Nope
    Apple clang on iOS : No (default build settings have $GCC_FAST_MATH set to NO)



  • thesandbender
    replied
    Originally posted by campbell View Post
    If your code produces significantly different answers with vs. without -ffast-math, then BOTH answers are horseshit.
    ...
    One of them may comply with a standard, but that doesn’t make it more mathematically correct.
    ...
    If that’s not possible, then no one should be making financial, scientific, or safety critical conclusions based on the output of that code.
    ...
    Regarding financial applications, as far as I’ve heard, isn't the guidance still to avoid use of floats at all because they’re not a base 10 number system?
    It depends on your definition of significant. fast-math allows gcc to reorder math operations, so (a * c) + (b * c) -> (a + b) * c. That is mathematically correct. However, because of the rounding inherent to base 2, it can produce different results depending on whether fast-math is turned on or off. You're absolutely correct that this can just be addressed by refactoring the code (in most cases), but for large code bases touched by dozens of people of varying skill/competence this is often not achievable.
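
    A concrete C example of that reordering changing bits (my own numbers, chosen to force a rounding difference in binary64):

        #include <stdio.h>

        int main(void) {
            double a = 1e16, b = 1.0, c = 10.0;
            /* Both products are exact, but the wide sum 1e17 + 10 rounds up... */
            printf("%.17g\n", a * c + b * c);  /* 1.0000000000000002e+17 */
            /* ...while a + b rounds b away before the multiply. */
            printf("%.17g\n", (a + b) * c);    /* 1e+17 */
            return 0;
        }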


    I didn't say it made it correct, I said it made it reproducible (see the example above). There's a difference.


    A lot of the time you're using libraries that you didn't write and can't refactor. That's why standards are important. The Motor Industry Software Reliability Association (MISRA) guidelines require a floating-point standard to be specified (with IEEE754 as the default), as do DO-178B/C and DO-331 (aerospace coding standards). I believe IEC 62304 (medical device software) does as well, but I'm not positive. As pointed out, compiling to the standard doesn't assure that your math is correct, just that it behaves in an expected way.


    In an ideal world, yes. But in my experience it's not that common.
    Last edited by thesandbender; 13 February 2019, 05:02 PM. Reason: Condensed quotes.

