11-Way Intel Ivy Bridge Compiler Comparison
Originally posted by AnonymousCoward: Example: bsnes, a SNES emulator that emulates various special chips — you have to run a set of games that actually use all of them to get everything optimized. Maybe "a load covering all performance-relevant code paths" is more appropriate (and longer), and I guess running them is easier for non-interactive programs.
At least that's what I did when I benchmarked MAME: I wrote an ugly bash script which launched you into MAME, where you had to manually run each game at least a bit into the attract/gameplay sequence (good thing arcade games always have one) while profiling. Then, when you quit, it started benchmarking the ROMs from a user-defined list (which should correspond to the ROMs you ran in profiling mode, else it's kinda pointless). The PGO builds always surpassed the non-PGO builds in performance, sometimes severely (~20%+).
Generally, though, programs don't really have such varied code paths, and when they do, it can often be dealt with easily: rendering a test scene in Blender that includes fluid physics, hair, textures, different styles of lamps, etc. would reach the code paths of all those separate features, as would running a script in GIMP that cycles through different filters.
That said, PGO really only matters for CPU-intensive stuff, and the time you spend recompiling a program with PGO only matters if the extra performance will make a difference: Firefox running more smoothly with PGO, an emulator running at 100% instead of 70%, a compressor/encoder/renderer/compiler you use very regularly being made ~15-20% faster, etc. These are likely good 'end-user' reasons for recompiling something with PGO; it's more convenient (if perhaps a little less performant) when it's done by your upstream binary packager.
Then of course there are those who, like me, just find compiler optimization technology fascinating and benchmark purely out of interest. Beats collecting stamps (though probably just barely) ;D
Originally posted by XorEaxEax: However, I agree that this lies outside the scope of a test suite like Phoronix's, as it would be quite unrealistic to expect Michael to write automatic profiling scripts for these tests.
It's a myth that you have to stress every path for PGO to be effective* - you just need to stress the most statistically significant path, and you'll most likely always get a speed boost.
* - for the vast majority of software
Last edited by kiputnik; 31 May 2012, 06:53 AM.
Intel Compiler
Originally posted by curaga: You've been under a rock, perhaps? Binaries produced by ICC will suck on AMD, VIA and anything else non-Intel x86. See Agner's site for insightful explanations, including benchmarks where he changes his VIA to identify itself as Intel.
icc -msse2 [...]
It works fine. Depending on the code I compile, icc CAN produce binaries which are faster on AMD than the same code compiled with GCC. But I also have some code which is slower with icc than with gcc, even on Intel hardware.
@Michael:
Concerning the availability of the Intel compiler: Michael, I guess you may not use the non-commercial version for testing, right? Maybe you could contact Intel and ask for permission. I consider the Intel compiler one of the best compilers, so its inclusion would be sensible.
Originally posted by kiputnik: The point is - he doesn't have to. Just run the benchmark twice - once for the -fprofile-generate binary, recompile with -fprofile-use, and run it again. He already has the necessary 'representative load' required.
It's a myth that you have to stress every path for PGO to be effective* - you just need to stress the most statistically significant path, and you'll most likely always get a speed boost.
* - for the vast majority of software
I have found that sometimes running PGO on exactly the same benchmark creates a considerable speed-up, but the resulting executable is slower at other jobs. This is not surprising, as you are tuning for one specific benchmark (or a small set of benchmarks).
To test PGO fairly, you really need a range of "standard" instances which avoid over-tuning. This really has to come from the original program authors, I would say; some of them include a 'make profiled' target, and some don't.
Originally posted by curaga: Hm, having to often wait for GIMP filters, I wonder if it supports that (an automatic PGO build, testing all filters)? Any GIMP people around?
Originally posted by uid313: GIMP will improve filter performance using the GEGL library, which is OpenCL-powered and provides hardware acceleration.
But there's a CPU downside too: OpenCL code written for GPUs is constrained by the GPU programming model, so when run on a CPU it is likely to be slower than code written natively for the CPU (pthreads, et al.).
Originally posted by elanthis: Except that GCC has always been "stupid rubbish shit" -- and has _intentionally_ been that way due to RMS's paranoia -- except for the barely-relevant part where it produces faster binaries than irrelevant compilers almost nobody uses (Open64) or a compiler that's practically an infant in comparison (Clang/LLVM). Clang matches the performance it took GCC 25 years to achieve, not to mention that it has an equivalent level of language conformance and features (again, going from zero to that complete in a tiny fraction of the time it took GCC), plus the so-freaking-awesome toolset support it enables that GCC goes out of its way to make impossible to write.
In the few cases where binary performance in a few specialized micro-benchmarks actually matters, it's worth noting that GCC is still not even top dog, so it has the unpleasant distinction of being neither the faster compiler nor the more featureful, flexible, maintainable, extensible compiler. The only crown it can hold is "most popular compiler for UNIX systems." Yay.
Without Clang, the world of Open Source compilers would be stuck forever with glorified Notepad apps (Vim, Emacs) and a practically tools-free development environment. With Clang, the FOSS scene actually has a chance to start playing catch-up with Visual Studio / VAX. There's a chance to have actually useful code completion (real-time, no need to regenerate ctags and wait 5 minutes for it to complete), to have powerful code refactoring (nobody but VS/VAX has this yet, which is why it's so important for FOSS to catch up), and most importantly to have a compiler that provides a valid test ground for new language extensions and features to propose to the relevant committees (GCC is a nightmare to extend, maintain, learn, or improve; only a small handful of people can deal with its horrific internals). This is of course why just about every company on the planet with an interest in C/C++ has gotten involved with Clang: it is a massive improvement on all fronts that _actually matter_, and the non-issue of compiled-binary performance can be improved as time goes on (and again, it has improved at a much, MUCH faster rate than GCC has).
But thanks anyway for your input as a non-developer fanboy. The world would be such a worse place without your clueless rants and abuse of fonts.
1.) It's not like Core i7 and Bulldozer have existed for 25 years and GCC is only now catching up. GCC has grown and supported every hardware generation that has ever existed (or at least been conceived by a human being) relatively close to its release, and it has always offered very competitive performance with very few exceptions; even comparing the old GCC 2.95 against the ICC of the same age shows GCC second only to ICC, and not by much.
2.) Clang is indeed catching up fast, but it is not in the same league as GCC; you are comparing apples to tires here. To begin with, Clang/LLVM supports only a very minimal subset of platforms compared to GCC, and Clang is barely used today, so it has no backward compatibility to maintain with anything and can clean up its code as much as it wants without hassle.
3.) GCC is the most used compiler in the universe; not even Visual Studio or ICC comes close, and it's not tied to Unix-like systems either, LOL. But for this very reason it has to maintain a crapload of backward-compatibility code for many platform/OS combinations, which you may think is stupid but which is vital to many massive companies and institutions around the world (not all people use compilers for the desktop, you know!).
4.) GCC has always been creative and efficient at integrating new technologies into the compiler that achieve real-world performance/efficiency on all the platforms GCC supports (when possible, of course): C/C++11, atomics, SSA, PGO, branch prediction, CPU features, LTO, IPA, profiling, OpenMP, etc. Even today these serve as examples for other compilers, including Clang (so it's not like Clang reinvented LTO; they used GCC as the base example and polished their own code later).
5.) It's true that GCC is extremely complex inside, but that's not because GCC devs are Visual Studio tards and Clang people are Einstein-like geniuses; it's for the reasons I mentioned before. For example, it is not the same to do an IPA pass when you only have to support x86 as when you have to deal with the quirks and specifics of 15+ platform/OS combinations while carrying 25 years of backward compatibility on your back.
6.) GCC is really efficient as a compiler, as already stated, but I admit that from a developer's perspective it lacks a lot of the eye candy in its output that Clang offers, which is really helpful; it's not like I can't develop without it, though.
7.) Tools-free dev environment? Emacs? WTF!! LOL. To begin with, there are like a zillion IDE/RAD environments (KDevelop, Qt Creator, NetBeans, Anjuta, Monkey Studio, etc.), not to mention GDB, Valgrind, etc. BTW, genius, Visual Studio is two separate pieces of software, LOL: one is a COMPILER and the other is a RAD/IDE environment (depending on the language, of course). You can develop in VS using CMake and then compile using GCC, or via plugins you can bypass the VC compiler entirely and use ICC, for example. So yes, the Clang compiler's output is closer to the VC compiler's, and the Linux IDEs/RADs (which don't depend on GCC) miss some features compared to the VS IDE/RAD, but Clang is not an IDE/RAD.
8.) Clang's parse tree is also more suitable for supporting automated code refactoring, but it's not like GCC can't do it, and code refactoring is the IDE/RAD's job anyway and is mostly compiler/language-independent.
9.) GCC was started many years ago with the technology available at the time, and given all the massive growth it has seen since then and the fact that it has somehow become almost an industry standard in many sectors, it's reasonable to expect it to be massive enough that invasive changes are really hard. This will eventually happen to LLVM/Clang too, and to any other software. Look at Apache, for example: sure, nginx/lighttpd are awesome and Apache is quite bloated if you ask me, but Apache is an industry standard, so they can't just change stuff without massive care and years of warning so people can decide to upgrade; besides, they need to maintain previous versions for many years too (a lot of big enterprise software still requires Apache 1.3, for example).
Clang is a very nice project, and in a few years it will be important and big enough to compete with GCC, but for now GCC is the most powerful/efficient OSS compiler, and the truth is that the only compiler superior to it is ICC, and not just in some benchies but in true real-world performance (again, industry-class software, not some GNOME applet). And nobody is stopping you from using Clang for development and compiling your final builds with GCC to get the best of both worlds, so no need to go all IANAL about it.
Originally posted by kiputnik: The point is - he doesn't have to. Just run the benchmark twice - once for the -fprofile-generate binary, recompile with -fprofile-use, and run it again. He already has the necessary 'representative load' required.
I personally think of any optimizations that lie outside the standard -On levels as rather 'exotic' and not necessarily part of a superficial benchmark like that of Phoronix OpenBenchmarking. I'm just happy if we no longer have insane -mtune=k8 tunings on Intel CPU tests, nor pointless -O0/-O1 optimization settings.
Originally posted by kiputnik: It's a myth that you have to stress every path for PGO to be effective* - you just need to stress the most statistically significant path, and you'll most likely always get a speed boost.
* - for the vast majority of software