Is there a reason for not having a "-O2 -march=native -flto" test?
LLVM Clang 12 Benchmarks At Varying Optimization Levels, LTO
Originally posted by DanglingPointer: Would be good to combine the Clang-12 and GCC-11 results. Also some sort of final summary at the end: the overall mean and a tally of first-place finishes.
https://www.phoronix.com/scan.php?pa...-clang12&num=1
https://www.phoronix.com/scan.php?pa...epyc7763&num=1
Both are rather recent, from within the last two months.
Michael already pointed out that GCC 11 shows a slightly slower overall result when -flto is added to -O3 -march=native. Nonetheless, GCC and Clang are in a very close race - though Clang often seems to be ahead.
Last edited by CochainComplex; 26 June 2021, 04:17 AM.
Originally posted by coder: Not all benchmarks put equal pressure on the instruction cache. In cases that are more limited by it, perhaps you could get a net benefit with that combination.
However, in cases where the hotspots are dominated by a small number of loops, aggressive inlining, unrolling, and vectorization is going to be the winning strategy.
Basically, I'm wondering roughly how much processor cache marks the dividing line between picking speed and size optimizations. And that sentence just made me wonder why -march=native doesn't turn -O2 into -Os on low-cache processors. Maybe nobody has run the tests needed to build a cache-size-to-O-level heuristic? Dumb idea?
Originally posted by coder: I think the distinction is more code-specific than CPU-specific.
Originally posted by skeevy420: Possibly. It's just hard not to notice things like Zen 2, where the lowest-end part has 4 MB of L3 cache and the highest-end 256 MB.
Originally posted by skeevy420: Entire programs can fit in one, while only part of a program can fit in the other.
Originally posted by skeevy420: Part of a program versus multiple programs. My assumption would be that the one with the smaller L3 cache, holding only part of a program, might prefer -Os/-Oz binaries when considering multitasking and interactive environments.
In other words, I think L3 sizes are mainly about data -- not code.
coder
Yeah, plus the L1 and L2, IIRC, are the same across the entire Zen 2 lineup, with only slight differences between Zen iterations. On the lower-end Zen 2 parts there's the potential for more binaries/loops/instructions to stay resident in L3 rather than being swapped in and out of cache. That's the only benefit I can see, and I have no idea whether it matters for raw CPU power or benchmarks.