`-Ofast` turns on `-ffast-math`, which is genuinely unsafe for some software... I wouldn't even entertain the idea of using that on all software... It's risky
GCC 11 Compiler Performance Benchmarks With Various Optimization Levels, LTO
Originally posted by F.Ultra:
Why not? The whole purpose of LTO is to let the optimizer in the compiler have access to all of the source code instead of having to work with one source file at a time. So if LTO produces slower code, that means the optimizer did not in fact optimize the code -> there is a bug in the optimizer that makes it produce less optimized code.
What you need, for the compiler to make optimal speed/space tradeoffs, is profiling information that tells it the relative frequency of different code paths. PGO could be used to supply that information, although that presumes both that it would reach the LTO stage and that the runtime behavior of the code is easily characterizable by a small set of sample workloads.
Originally posted by coder:
Even having access to the entire program code is insufficient to determine program behavior, due to data-dependence and practical limits on computability.
So either some optimisations at file scope are producing fast but faulty code and LTO undoes that (I doubt that, though), or the larger scope of LTO somehow masks some opportunities, and this is why it leads to worse performance. It could have other causes, too, and it is all just speculation. However, the technical reasoning for why LTO should in general produce better, not worse, code remains sound. After all, this is the core idea behind LTO: to allow the optimisation to find more opportunities.
PGO then is very beneficial to LTO, and one should try to combine the two whenever possible, because it does indeed make for a great combination. But the question is: why does LTO, when used on its own with GCC, appear to produce worse code? If this can get fixed, then it could likely lead to better performance when LTO gets combined with PGO. I see no reason why not.
Last edited by sdack; 18 June 2021, 01:57 PM.
Originally posted by sdack:
You are missing the point. It is about the size of the scope. A larger scope means more opportunities for optimisations.
Originally posted by sdack:
when there are not more opportunities. And if there are more, then it should show in more optimisations. It is not possible to suddenly have fewer opportunities when the scope gets larger, unless the now-missing opportunities were in fact none, which means you have a faulty optimisation process.
Whether you accept my conjecture or pursue insight elsewhere, the data is what it is. Before deciding it's a bug, you'd do well to understand the root cause.
Originally posted by sdack:
However, the technical reasoning of why LTO should in general not produce worse, but better code remains sound.
Originally posted by coder:
If you take a step back and think about it ...
If you want to tell yourself that all is fine, that this is the way it should work, then go ahead.
Originally posted by coder:
If you take a step back and think about it, why do we have any higher standards for compiler performance with LTO than -O3? And there are plenty of cases where -O3 is slower!
And we do have a higher standard for LTO than for -O3, since with LTO we give the optimizer far more material to work with, instead of adding extra optimizer steps that are known to be buggy (-O3). Having access to more material should help the optimizer make a better judgement, not a worse one.
Last edited by F.Ultra; 19 June 2021, 03:00 PM.
Originally posted by sdack:
If you want to tell yourself that all is fine, this is the way it should work, then go ahead.
Originally posted by F.Ultra:
And cases where -O3 is slower are IMHO also due to bugs in the optimizer.
We know there are computationally hard problems in code optimization. There are also lots of heuristics involved, and it's probably difficult to optimize them all, relative to each other.
Originally posted by F.Ultra:
PGO is no panacea either, since not all applications will experience the same workload from run to run.
Originally posted by F.Ultra:
And we do have a higher standard for LTO than for -O3, since with LTO we give the optimizer far more material to work with, instead of adding extra optimizer steps that are known to be buggy (-O3). Having access to more material should help the optimizer make a better judgement, not a worse one.
There's another thing I'm curious about, and that's whether LTO has access to the source used to compile the original object files. If not, then it's really not the same as just giving the compiler more scope. It could be that all the original optimization decisions are baked in, and all LTO can do is some additional inlining. That would mean it could do little more than remove some function call overhead, at best -- and just add code bloat, at worst.
Originally posted by coder:
... but it's a wholly different diagnosis than saying it's a bug.
Originally posted by coder:
I think it's too sloppy to simply label it as a bug. A bug is something other than a limitation: it's a mismatch between intention and implementation. And I don't mean just an intention like "it should be faster", but specific strategies that are not working as intended. A bug is also fixable.
We know there are computationally hard problems in code optimization. There are also lots of heuristics involved, and it's probably difficult to optimize them all, relative to each other.
Originally posted by coder:
Most software is designed pre-LTO, and therefore functions which would provide the greatest benefit by inlining are already defined as inline functions (or are at least somehow visible at file-scope). This limits the upside of LTO to doing inlining mostly where it can't help much (and inlining can always hurt by bloating code size).
Originally posted by coder:
There's another thing I'm curious about, and that's whether LTO has access to the source used to compile the original object files. If not, then it's really not the same as just giving the compiler more scope. It could be that all the original optimization decisions are baked in, and all LTO can do is some additional inlining. That would mean it could do little more than remove some function call overhead, at best -- and just add code bloat, at worst.