GCC 8/9 vs. LLVM Clang 7/8 Compiler Performance On AArch64
-
Originally posted by thesandbender View Post
worthless for finance (my day job), engineering, science, etc. In a nutshell, the "dumb" requirement is that FP operations should repeatedly produce the same results on an IEEE754 machine; that's critical for any number of applications.
My bread and butter is high-performance scientific computing and embedded signal processing. It would be incredibly negligent if I left throughput or battery life on the table in order to achieve IEEE 754 "correctness" when doing so wouldn't actually change the output by a statistically significant amount. It would be even MORE negligent to release code whose answer changes significantly when run with vs. without those flags - that's a symptom of code that needs to be rewritten, because the math is ill-conditioned (i.e. already producing a horseshit answer without those compiler flags), and when I find such code, I rewrite it.
Regarding financial applications, as far as I've heard, isn't the guidance still to avoid floats entirely, because they can't represent base-10 fractions exactly? Maybe that's no longer possible in the age of higher-level languages that just have a "number" datatype and don't differentiate between integers and floating point, but that's beside the point, because we're talking about Michael benchmarking compiled code in real languages that care about these flags.
(I've even seen legacy code in safety-critical applications, from well before these compiler options were widespread, that adds small floating point epsilons to nearly every term in order to avoid denormals - a "feature" required for strict IEEE 754 compliance that, on x86, can slow that particular code down by a factor of 20 when the epsilons are removed and the compiler is not allowed to relax IEEE compliance. Nothing of value is gained by slowing the code down; the answer doesn't change by a statistically significant amount, because the math is well-formed and insensitive to it.)

Last edited by campbell; 13 February 2019, 08:52 AM.
-
Originally posted by thesandbender View Post
Can you point to any C compiler that defaults to being non-IEEE754 compliant? Almost all have an option to disable compliance, but one that does so by default (or always) is worthless for finance (my day job), engineering, science, etc. In a nutshell, the "dumb" requirement is that FP operations should repeatedly produce the same results on an IEEE754 machine; that's critical for any number of applications.

Last edited by carewolf; 13 February 2019, 08:42 AM.
-
Originally posted by carewolf View Post
But the issue remains: the C standard has some dumb requirements for many floating point operations and library functions that make parallelizing them problematic, and most compilers ignore that
Regardless, you're comparing SIMD instructions on x86 to NEON on ARM. Even x86 SIMD instruction sets don't all have the same requirements. x86 MMX is not IEEE754 compliant, SSE is mostly compliant (there are a few exceptions where the behavior of an instruction is not officially defined by Intel, e.g. rsqrtss), and AVX is completely compliant (like NEON on ARMv8 AArch64). So yes, getting the most out of MMX requires turning off IEEE754 compliance, but that is not at all true for AVX or NEON. GCC and clang versions that are aware that the target is compliant will treat them as such. That's one of the selling points for AVX in enterprise.
-
For instance, on x86 with GCC what you want in particular is -fno-math-errno, so that each FP operation isn't required to set errno, and -fno-signaling-nans, so that each FP instruction isn't expected to signal independently. I seem to recall one more of the flags that -ffast-math is composed of was necessary in the cases I worked on, but I can't tell from the documentation which one it was. But the issue remains: the C standard has some dumb requirements for many floating point operations and library functions that make parallelizing them problematic, and most compilers ignore that, except GCC - and on Linux, clang follows what GCC set as the standard behavior.
-
Originally posted by campbell View Post
These benchmarks are pretty much meaningless without -ffast-math or -Ofast. The compilers won't do much with the NEON instructions supported by these processors without being told they're allowed to relax the floating point order of operations.
-
That cachebench write test is very intriguing. As you state, Michael, it is similar to the POWER bench. For GCC, it would be worthwhile finding out what the difference is. Apparently it is code related. Is GCC setting some memory barriers for the writes?
-
These benchmarks are pretty much meaningless without -ffast-math or -Ofast. The compilers won't do much with the NEON instructions supported by these processors without being told they're allowed to relax the floating point order of operations.
-
Basically a fancy way of saying the performance improvements are modest and there are no regressions.
But this must be put into context, because compilers aren't only about performance. They're also about better code analysis and, in turn, better error messages, which lead to faster development times. Maybe it's worth pointing this out in future reviews, Michael.
-
Typo:
Originally posted by phoronix View Post
All four of these compilers were built on thos Ampere eMAG server and built in their release/optimized (non-debug) modes.
4K
Clang 7: 18.67fps
Clang 8: 18.77fps
GCC 8: 19.58fps
GCC 9: 19.91fps
1080p
Clang 8: 51.55fps
Clang 7: 51.61fps
GCC 8: 52.42fps
GCC 9: 52.81fps
Still not ready for real-time 4K AV1...

Last edited by tildearrow; 12 February 2019, 04:51 PM.