AMD Compiler Optimization Benchmarks With GCC 4.10 (GCC 5.0)
-
Originally posted by sdack
-mtune= only selects the CPU type for instruction scheduling. -march= selects the CPU type for the instructions to use.
Originally posted by sdack
This certainly makes a difference, but it does not show with every application. This is what the article is trying to present, by the way. For example, -march=k8 will select the standard x86 instruction set up to and including SSE2, -march=amdfam10 will further include SSE3 and SSE4A instructions, and -march=bdver1 will also include SSE4.1, SSE4.2 and AVX instructions.
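One way to check which instruction sets a given -march value actually enables is to look at the macros GCC predefines. The following is only a minimal sketch and not something posted in the thread: the helper name isa_summary is hypothetical, while __SSE2__, __SSE3__, __SSE4A__, __SSE4_1__, __SSE4_2__ and __AVX__ are macros GCC defines on x86 when the corresponding extension is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a space-separated list of the x86 extensions
   this file was compiled with into buf, and return buf. */
const char *isa_summary(char *buf, size_t cap)
{
    /* Each entry exists only if the compiler predefined the matching macro. */
    static const char *const enabled[] = {
#ifdef __SSE2__
        "SSE2",
#endif
#ifdef __SSE3__
        "SSE3",
#endif
#ifdef __SSE4A__
        "SSE4A",
#endif
#ifdef __SSE4_1__
        "SSE4.1",
#endif
#ifdef __SSE4_2__
        "SSE4.2",
#endif
#ifdef __AVX__
        "AVX",
#endif
        NULL /* terminator, always present */
    };
    size_t used = 0;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; enabled[i] != NULL; i++) {
        int n = snprintf(buf + used, cap - used, "%s%s",
                         used ? " " : "", enabled[i]);
        if (n < 0 || (size_t)n >= cap - used)
            break; /* buffer too small: output is truncated here */
        used += (size_t)n;
    }
    return buf;
}
```

Calling isa_summary with a 64-byte buffer from a file built with -march=k8, -march=amdfam10 and -march=bdver1 in turn should show the list growing, while changing only -mtune should leave it unchanged, since tuning does not enable instructions. The same information is available without any code from gcc -march=bdver1 -dM -E - < /dev/null, which dumps the predefined macros.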
Originally posted by sdack
[...]
There are a lot more parameters hidden behind these switches. These are just some of the parameters GCC uses to make its decisions. The parameter "generic" will simply pick good, average values for all of these parameters. The differences will not show unless you know exactly what to look for and choose an application that you know will benefit significantly from it. Only with very precise benchmarking tools and setups can one also detect the difference this makes for other applications. The results will usually vary so much that one needs to make many runs before a clear difference becomes visible, because the differences will be tiny and the variations will add a lot of noise to the measurements. Hence the focus on ImageMagick and C-Ray.
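The point about needing many runs can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not a method from the thread: the function names are hypothetical, the inputs are wall-clock times in seconds from repeated runs of the same benchmark, and the "gap larger than twice the standard error" criterion is a rough rule of thumb rather than a rigorous statistical test.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n run times (seconds). */
double mean(const double *t, size_t n)
{
    double sum = 0.0;

    assert(t != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Unbiased sample variance; needs at least two runs. */
double sample_variance(const double *t, size_t n)
{
    double m, ss = 0.0;

    assert(t != NULL && n > 1);
    m = mean(t, n);
    for (size_t i = 0; i < n; i++)
        ss += (t[i] - m) * (t[i] - m);
    return ss / (double)(n - 1);
}

/* Rule of thumb (an assumption of this sketch): report 1 when the gap
   between the two means exceeds twice the standard error of that gap.
   The comparison is done on squared values, so no sqrt() is needed. */
int clearly_different(const double *a, size_t na,
                      const double *b, size_t nb)
{
    double diff = mean(a, na) - mean(b, nb);
    double se2 = sample_variance(a, na) / (double)na
               + sample_variance(b, nb) / (double)nb;

    return diff * diff > 4.0 * se2;
}
```

With only a handful of runs per configuration the standard error stays large, so a gap of a percent or two between two sets of flags will typically not pass this check; more runs shrink the error term and let small differences emerge from the noise.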
Obviously, if you are doing numeric simulations, you will compile using -march=native, but for most distribution packages this won't make a difference. When I got my E-350, I compiled a lot of packages using my own CFLAGS in order to get the most out of this CPU; now, however, I'm simply using the distribution-provided packages.
My point is only to include -mtune=generic in these benchmarks to get a glimpse of whether generic tuning does a good job for this CPU (could be interesting for the compiler people).
-
Originally posted by oleid
I once thought that, too. Then I benchmarked my Bulldozer: -mtune=generic vs. -march=native. And guess what? There was no difference! That's why I'd like to see -mtune=generic in these benchmarks, too. After all, this is roughly what you'll get from the distributions.
-
Originally posted by carewolf
I believe on x64 that generic == k8
CPPFLAGS="-D_FORTIFY_SOURCE=2"
CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4"
CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro"
-
Originally posted by oleid
Yes, it's called generic optimization. The compiler will generate multiple versions of the very same code and decide at run time which version to use.
Some applications do have code to detect the CPU at run-time and can switch to using different functions or plugins, which then make use of a particular instruction set. However, such features need to be put into the code by the programmer and do not come automatically by using gcc.
I suggest you read the documentation.
Originally posted by oleid
My point is only to include -mtune=generic in these benchmarks to get a glimpse of whether generic tuning does a good job for this CPU (could be interesting for the compiler people).
-
Originally posted by carewolf
I believe on x64 that generic == k8
Tuning with generic does not match any of the processors (neither AMD nor Intel CPUs, from what I have seen). But it is somewhat closer to k8 than it is to amdfam10 and bdver1.
You can find the full details about it in $TOPDIR/gcc/config/i386/i386.c of gcc.
Last edited by sdack; 17 August 2014, 08:02 AM.
-
Originally posted by sdack
It does not matter what it is called, it is not being done here.
Some applications do have code to detect the CPU at run-time and can switch to using different functions or plugins, which then make use of a particular instruction set. However, such features need to be put into the code by the programmer and do not come automatically by using gcc.
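A rough illustration of what "put into the code by the programmer" can look like with GCC on x86 follows. It is a sketch under assumptions, not code from any application mentioned in the thread: the function names are hypothetical and both variants are plain C here, whereas a real program would build the second one with AVX enabled. The only compiler features relied on are the GCC built-ins __builtin_cpu_init and __builtin_cpu_supports, which are available on x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline variant, safe on every CPU the binary targets. */
long sum_baseline(const int *v, size_t n)
{
    long s = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a variant that would live in a file built with -mavx.
   It is ordinary C here so the sketch stays portable. */
long sum_avx_path(const int *v, size_t n)
{
    long s = 0;

    assert(v != NULL || n == 0);
    for (size_t i = n; i > 0; i--)
        s += v[i - 1];
    return s;
}

typedef long (*sum_fn)(const int *, size_t);

/* Pick a variant once, based on what the running CPU reports. */
sum_fn select_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_avx_path;
#endif
    return sum_baseline; /* non-x86 targets or CPUs without AVX */
}
```

The selection is explicit: nothing here happens just because a particular -march or -mtune value was passed, which is the distinction being drawn in the post above.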
Originally posted by sdack
Yeah, and I would like to see some strippers, but this would also not be quite on the topic of the article.
Michael reads the comments to his articles, that's why this is the proper place for suggestions.
-
Originally posted by oleid
My bad. The Intel compiler has this CPU dispatcher, not the GNU compiler. That's the reason why AMD often has a disadvantage in certain benchmarks.
It seems as if you are a bit cranky... I suggest you should visit those strippers
Michael reads the comments to his articles, that's why this is the proper place for suggestions.
Of course you can request it, but it is pointless. 4.10 is at an early stage. It still has missing code, bugs and regressions in it. So whatever you could get from it is meaningless. Even the article itself may give a false picture, because the gains shown could be the result of bugs or incomplete code and may shrink once 4.10 is stable. But to remain optimistic... chances are these gains are real.
Benchmarking with generic then costs additional time. But let us assume it showed generic as being faster than the other options. All it would tell you is that there is a regression at this time. This would be no news. It is already known that the compiler is still under development. If it showed generic to be slower, then it would also only confirm what is to be expected, and the news would be in the gains coming from the other options. As it happens, this is exactly what the article focuses on, and it delivers without wasting time.
So you can keep suggesting ideas. I would rather stick to hope, and hope to get continuous news updates like these, which keep it brief and informative without being bloated and taking too much time to produce. The less time gets spent on it, the more time becomes available for other news. Don't you love quickies, too?
-
Originally posted by oleid
Yes, it's called generic optimization. The compiler will generate multiple versions of the very same code and decide at run time which version to use.
The article wants to present the influence of different architecture optimizations on the performance. But this has nothing to do with runtime CPU dispatching.
And that's exactly what I benchmarked maybe a year ago: -mtune=generic vs. -march=native on my E-350 for C-Ray and GraphicsMagick. With the current compiler of that time (I guess it was gcc 4.7.x) there was no difference. I'm redoing the benchmark to check if that's still true for gcc 4.9. Of course, these results only apply to this very CPU and the compiler current at the time, as with every scientific result.
Obviously, if you are doing numeric simulations, you will compile using -march=native, but for most distribution packages this won't make a difference. When I got my E-350, I compiled a lot of packages using my own CFLAGS in order to get the most out of this CPU; now, however, I'm simply using the distribution-provided packages.
My point is only to include -mtune=generic in these benchmarks to get a glimpse of whether generic tuning does a good job for this CPU (could be interesting for the compiler people).
-
Originally posted by rudregues
oleid, I have an E-350 too. I tried many benchmarks, even comparing Gentoo and Ubuntu, and came to the following conclusion: there's little to no difference. And sometimes the x86_64 Ubuntu binaries were a little faster than the btver1-optimized binaries.
Last edited by sdack; 17 August 2014, 12:52 PM.