It would be interesting to compare these results with -march=x86-64-v2 and -march=x86-64-v3 to see if using the default scheduler with SSE4.2 or AVX2/FMA makes a noticeable enough difference to justify providing CPU-specific binaries.
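One quick way to see what each of those -march levels lets the compiler assume is to check the predefined feature macros. This is a minimal sketch, assuming GCC 11 or newer (or a recent Clang) for the x86-64-v2/v3 names; the function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reports the highest feature level the compiler was
   allowed to assume for this translation unit, judged only from the
   predefined macros that GCC and Clang set for the chosen -march. */
const char *compiled_isa_level(void)
{
#if defined(__AVX2__) && defined(__FMA__)
    return "AVX2+FMA (x86-64-v3 class)";
#elif defined(__SSE4_2__)
    return "SSE4.2 (x86-64-v2 class)";
#elif defined(__x86_64__)
    return "baseline x86-64";
#else
    return "not an x86-64 build";
#endif
}

/* Prints the level; meant to be called from main() in a small test program. */
void print_compiled_isa_level(void)
{
    printf("compiled for: %s\n", compiled_isa_level());
}
```

Building the same file once with -march=x86-64-v2 and once with -march=x86-64-v3 should change the reported level. It only reflects the compile flags, not what the CPU running the binary supports.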
AMD Zen 3 Performance With The Initial "znver3" GCC Compiler Support
-
Originally posted by S.Pam
You can always use a source-based distribution =)
Originally posted by S.Pam
I think code can also be compiled with multiple targets so that they detect at run-time what code-path to use.
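The run-time detection mentioned in that quote can be sketched roughly like this. It assumes GCC or Clang on x86 for the `target` attribute and the `__builtin_cpu_supports` builtin, and falls back to the generic path everywhere else; `sum_generic`, `sum_avx2` and `pick_sum` are hypothetical names.

```c
#include <stddef.h>

/* Portable fallback: compiled with whatever the default -march is. */
static double sum_generic(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX2 instructions in this function. */
__attribute__((target("avx2")))
static double sum_avx2(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
#endif

typedef double (*sum_fn)(const double *, size_t);

/* Picks an implementation once, based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}
```

GCC can also generate the variants and the dispatcher itself via the `target_clones` function attribute, which saves writing the function twice.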
-
Originally posted by pal666
then you'll spend all your speed gains and more on rebuilds
I swear, it's waaaay quicker to build a Gentoo system than it is to strip down Windows... And I guarantee Gentoo built by the handbook guidelines is far better optimized than Windows...
-
I wonder what a more detailed analysis of the SciMark 2 improvements will show. The other tests and benchmarks show what I was expecting, about 1%-2% improvement compared to using znver2. SciMark 2 for some reason gains way more than that. Interesting.
I also remember that when Phoronix compared znver2 vs znver1 on the Ryzen 3000 series, SciMark 2 also gained significantly, while the other benchmarks gained maybe 2%-3%, as expected.
There is something in SciMark 2 that is very sensitive to some critical timing in a tight loop or to instruction ordering. Maybe decompose SciMark 2 into its individual benchmarks? Also, what does it mean that SciMark is compiled with specific options? Is the entire Java stack recompiled with different options? Does the benchmark run long enough to offset any JIT time differences?
Last edited by baryluk; 10 December 2020, 03:19 AM.
-
Originally posted by baryluk
I wonder what a more detailed analysis of the SciMark 2 improvements will show. The other tests and benchmarks show what I was expecting, about 1%-2% improvement compared to using znver2. SciMark 2 for some reason gains way more than that. Interesting.
I also remember that when Phoronix compared znver2 vs znver1 on the Ryzen 3000 series, SciMark 2 also gained significantly, while the other benchmarks gained maybe 2%-3%, as expected.
There is something in SciMark 2 that is very sensitive to some critical timing in a tight loop or to instruction ordering. Maybe decompose SciMark 2 into its individual benchmarks? Also, what does it mean that SciMark is compiled with specific options? Is the entire Java stack recompiled with different options? Does the benchmark run long enough to offset any JIT time differences?
I am more curious about the GraphicsMagick result, where the Haswell configuration rules. It could be an accident, but perhaps something from that configuration is good for znver3?
-
Originally posted by carewolf
Yeah, seems sensitive to something alright.
Everything else in the benchmarks looks like ordinary deviations, which will be why we saw znver2 outperforming znver3 a few times. Three runs for each test isn't much and doesn't allow for very precise comparisons. It's barely enough to even calculate a standard deviation for a test. But don't tell Michael I've said that ...
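To put the three-runs point in numbers, here is a small sketch that compares two sets of three runs. The run times used with it below are invented for illustration and are not from the article, and the function names are hypothetical. It works with squared quantities so that it does not need libm.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double mean(const double *x, size_t n)
{
    assert(n > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s / (double)n;
}

/* Sample variance with the n-1 denominator; needs at least two samples. */
double sample_variance(const double *x, size_t n)
{
    assert(n >= 2);
    double m = mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - m;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}

/* Squared Welch-style t statistic: the squared difference of the means
   divided by the sum of the squared standard errors. The result is
   infinite or NaN if both samples have zero variance. */
double welch_t_squared(const double *a, size_t na,
                       const double *b, size_t nb)
{
    double diff = mean(a, na) - mean(b, nb);
    double se2 = sample_variance(a, na) / (double)na
               + sample_variance(b, nb) / (double)nb;
    return diff * diff / se2;
}
```

With made-up runs of {100, 102, 98} and {101, 103, 99} the statistic comes out at about 0.375, so a 1% difference in the means is well inside the noise. With only three runs per side (about four degrees of freedom) the value would need to be somewhere around 7.7 to clear the usual 95% threshold.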
-
Originally posted by sdack
Two new instructions come to mind: a vector AES instruction, and a new carry-less quad word multiplication. When these get used then I would expect to see some difference.
Everything else in the benchmarks looks like ordinary deviations, which will be why we saw znver2 outperforming znver3 a few times. Three runs for each test isn't much and doesn't allow for very precise comparisons. It's barely enough to even calculate a standard deviation for a test. But don't tell Michael I've said that ...
AES and carry-less quad word multiplication are not used by SciMark.
It is mostly FP64 ("double"), vectorization and cache usage that are a factor here. There aren't even many divisions (or integer divisions) done in SciMark 2, so the improved latency of integer division is not a factor either.
I think the factor, however, is the "L2" cache. GCC's l2-cache-size param actually refers here to the LLC (last-level cache), i.e. L3 in the case of Zen. Setting znver3, I think, sets a higher value than znver2 does, and some algorithms in SciMark can then use better blocking (decomposing a two-dimensional array traversal into chunks/blocks, where the optimal block size is highly dependent on the L2 and L3 cache sizes).
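For anyone not familiar with blocking, here is a hand-written sketch of the idea using a matrix transpose. Whether GCC applies anything like this on its own based on its cache-size params is only my guess above; the function names and the `block` parameter are hypothetical, and a good block size depends on the cache sizes of the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Naive transpose of a row-major n x n matrix: dst[j][i] = src[i][j].
   The writes stride through dst by n elements, which is hard on the caches
   once the matrix no longer fits. */
void transpose_naive(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: same result, but the traversal is split into
   block x block tiles so that each tile of src and dst stays in cache
   while it is being worked on. */
void transpose_blocked(double *dst, const double *src, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        size_t i_end = (ii + block < n) ? ii + block : n;
        for (size_t jj = 0; jj < n; jj += block) {
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same output; only the order of memory accesses differs, which is why the best tile size follows the cache sizes.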
I don't have access to znver3 hardware to confirm, but my guess is that the SOR (Successive Over-Relaxation) sub-benchmark from SciMark 2 is the "culprit" here. It has a doubly nested loop, which is exactly the type that would benefit here from better blocking.
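The kind of loop I mean looks roughly like this. It is a sketch of the textbook in-place SOR update on a flat row-major grid, not the SciMark 2 source, and the function name and signature are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Runs `iterations` in-place SOR sweeps over the interior of a
   rows x cols grid stored row-major in g; boundary cells are left alone. */
void sor_sweeps(double *g, size_t rows, size_t cols,
                double omega, int iterations)
{
    assert(rows >= 3 && cols >= 3);
    const double omega_over_four = omega * 0.25;
    const double one_minus_omega = 1.0 - omega;

    for (int p = 0; p < iterations; p++) {
        for (size_t i = 1; i + 1 < rows; i++) {
            for (size_t j = 1; j + 1 < cols; j++) {
                /* Average of the four neighbours, blended with the old value. */
                g[i * cols + j] =
                    omega_over_four * (g[(i - 1) * cols + j] +
                                       g[(i + 1) * cols + j] +
                                       g[i * cols + j - 1] +
                                       g[i * cols + j + 1]) +
                    one_minus_omega * g[i * cols + j];
            }
        }
    }
}
```

Each update reads the rows above and below, so for large grids three rows have to stay in cache for the inner loop to run well. Note that the update also reads neighbours already changed in the same sweep, so any blocking has to respect that ordering to give the same numbers.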