A Linux Compiler Deathmatch

  • #31
    Originally posted by mtippett View Post

    We agree to disagree. Michael and I have had discussions with a commercial compiler vendor about the defaults and compiler structure on their product. They accepted that the default is a critical entry bar for people; tuning for peak performance comes second.

    Most people do not go and tune all 5 compilers for maximum performance. They choose the one that is in the ballpark of what they want to see and then tune from there.
    I'm sorry, but that's misguided. This is not about tuning. This is not how people use those compilers, or at least gcc. Distro packages aren't built with gcc in its default configuration. I could go on forever.

    The point here is that supplying optimization-related compiler flags is normal practice for gcc, and it's very much required to get sensibly-performing code. We could argue whether to use '-O2' or '-O3', but choosing the "standard configuration" is still a choice, and it's by no means an objective one, more so since it isn't even common practice for either packagers or package maintainers.

    What if gcc people suddenly thought "hey let's add -fomg-awesome-optimization by default", which had benefits but meant code took 20x as long to compile? What if they made gcc create executables with huge amounts of debugging information? Would you have tested those configurations?

    Now we can argue about defaults, but it's really common knowledge what you should and shouldn't use for building packages, and what you can expect from not passing any additional options. Quoting the manpage: "-O0 Reduce compilation time and make debugging produce the expected results. This is the default." Furthermore, a compiler isn't something that targets the average Joe as its audience.
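
    If anyone wants to verify what a bare invocation actually enables, gcc can report the state of each optimization pass; a quick check could look like this (exact output format varies by gcc version):

        # Which optimizations are on with no flags at all (i.e. -O0)?
        gcc -Q --help=optimizers | grep enabled

        # Compare against what -O2 switches on.
        gcc -O2 -Q --help=optimizers | grep enabled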

    I propose including measurements of compilation time as well. That could dispel concerns that some compiler option might be unfair (e.g. overly aggressive, brute-force optimizations).
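
    Measuring that is cheap. A minimal sketch using the shell's time builtin, assuming the single-file C-Ray source (the file name and the -lm/-lpthread link flags are my assumptions):

        # Wall-clock compile time at the two contested optimization levels.
        time gcc -O2 -o c-ray-mt c-ray-mt.c -lm -lpthread
        time gcc -O3 -ffast-math -o c-ray-mt c-ray-mt.c -lm -lpthread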

    But apart from that, it's probably best to look around and ask packagers and application maintainers what flags they recommend for global use. You'll see the answer isn't far from a simple "-O2" in the case of gcc. I'm sure something similar can be decided for the other compilers.
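
    For reference, the major distros expose their default build flags directly, so the question can be answered without even asking anyone:

        # Debian/Ubuntu: the CFLAGS that dpkg-based packaging injects into builds.
        dpkg-buildflags --get CFLAGS

        # Fedora/openSUSE and other RPM distros: the canonical %optflags macro.
        rpm --eval '%{optflags}'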



    • #32
      That being said, I apologize if I misunderstood what you actually did there. My understanding is that somehow you made sure no optimizations are passed in, but that would probably take a bit of twiddling with the build infrastructure in some cases, so it seems unlikely :/



      • #33
        Originally posted by Eduard Munteanu View Post
        That being said, I apologize if I misunderstood what you actually did there. My understanding is that somehow you made sure no optimizations are passed in, but that would probably take a bit of twiddling with the build infrastructure in some cases, so it seems unlikely :/
        I'm currently investigating the configuration for some of the benchmarks. For the C-Ray benchmark, the default CFLAGS are -O3 -ffast-math.

        The default goes across the entire environment - the benchmark default compile options and the compilers default response.
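
        A quick way to verify which defaults win out is to let make print the commands instead of running them; a dry run exposes the full compiler invocation the benchmark would use (the target name here is an assumption):

            # Print build commands without executing them; any CFLAGS baked
            # into the benchmark's Makefile appear verbatim in the output.
            make -n c-ray-mt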



        • #34
          Originally posted by mtippett View Post
          I'm currently investigating the configuration for some of the benchmarks. For the C-Ray benchmark, the default CFLAGS are -O3 -ffast-math.

          The default goes across the entire environment - the benchmark default compile options and the compilers default response.
          Oh dear, I kinda misread what you did, I'm sorry: "All compilers were tested in their "out of the box" configuration without specifying any extra flags.". Which is fine, given many packages already supply sane values there.



          • #35
            Originally posted by Eduard Munteanu View Post
            What if gcc people suddenly thought "hey let's add -fomg-awesome-optimization by default", which had benefits but meant code took 20x as long to compile? What if they made gcc create executables with huge amounts of debugging information? Would you have tested those configurations?
            What if I fork gcc and do nothing other than change the default configuration to the best optimization level? I would be way ahead of stock gcc in any benchmark on Phoronix!!



            • #36
              Originally posted by mtippett View Post

              I agree that there is a risk that you may run into local minima (or maxima); there are probably lots of them. However, most people are looking for an ROI scenario and rely on the upstream experts (distribution maintainers, compiler developers, etc.) to do the majority of the tuning.

              In your particular case, I applaud the result set that you are building. But as you can see in that result set, there are some workloads that show no meaningful difference between optimization levels (gcc-64; bullet physics; 1000 convex) or a degradation at the higher optimization levels (open64; bullet; 1000 convex).

              Ultimately it comes down to the workload that you want to use and examining that carefully.
              Well... my vision is not that everyone should do ALL the tests themselves in the future. If anything, this is my greatest hope (and my greatest current complaint) about PTS: meta-analysis. Within my field of research (molecular biology), massive amounts of data are available from various experiments. The interesting thing about all that data, which was produced for a particular purpose, is that it can be re-used by independent analyses.

              I like to liken this to the Indian tale of a number of blind men describing an elephant: depending on which side of the elephant they stand, they give quite different descriptions of it.
              My extension of this analogy is that yet another blind man sits some distance away, listening to the various descriptions. Not only does he get all the different views of the elephant (the sum of the descriptions), but the direction of the sound from each describer also gives him data he can use to triangulate how the different descriptions fit together.

              In order to do efficient meta-analysis, though, and to make optimal use of a massive database of benchmarking results, there need to be clear annotations of all the variables for each particular run. In molecular biology there are various standards (MIAME for microarrays, GeneOntology for gene functions, etc.), which are very useful when meta-analyses are made.

              With a sufficiently large dataset, I imagine that one could do virtual optimizations in a similar way, identifying the most appropriate conditions for your particular use case.



              • #37
                Originally posted by Eduard Munteanu View Post
                Oh dear, I kinda misread what you did, I'm sorry: "All compilers were tested in their "out of the box" configuration without specifying any extra flags.". Which is fine, given many packages already supply sane values there.
                Well, that really depends on whether they do that for all compilers, e.g. whether they specifically set some flags for some compilers while totally ignoring others.
                In that case you are partially comparing the work of developers, though not the compiler developers but rather the people writing the Makefiles.

                There is simply not enough information in the article to come to any meaningful conclusion. For example, any compiler options that were set, whether manually or by a Makefile, have to be named.

                Thus the article is hardly of any use.



                • #38
                  Originally posted by mat69 View Post
                  ...
                  Thus the article is hardly of any use.
                  I wouldn't go as far as to say "hardly of any use", but I think the flags (or a link to all the details: flags, benchmark configurations, hardware details) would enhance this report.

                  At the very least, a record of the variables held steady vs. changed between runs (in this report, and in future runs as well) should be a requirement.
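
                  Capturing that record could even be automated per run; a minimal sketch (the file name and the fields recorded are just illustrative choices):

                      # Save the variables of a run alongside its results.
                      {
                          echo "date:     $(date -u)"
                          echo "compiler: $(gcc --version | head -n1)"
                          echo "cflags:   ${CFLAGS:-<none set>}"
                          echo "machine:  $(uname -m)"
                      } > run-metadata.txt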



                  • #39
                    Well, the only real solution I can see is to use the highest setting (which is SUPPOSEDLY the one that gives the best optimization at the expense of compile time) across all compiler tests. This would mean -O3. Also, given that some compilers enable -ffast-math by default at -O3 and some don't, explicitly setting this flag would make sense, since it can have a big influence on the results. In real life -O2 sometimes beats -O3, but that is (hopefully) an anomaly, and given that all compilers *should* strive for -O3 producing the fastest code, I think -O3 with -ffast-math and -march=native would be a good fit for the compilers that support it.
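
                    Concretely, that uniform flag set might look like the sketch below; opencc is Open64's C driver, and since I'm not sure it accepts -march=native I've left it off there (the -lm/-lpthread link flags are assumptions for C-Ray):

                        # One source file, three compilers, nominally equivalent flags.
                        gcc    -O3 -ffast-math -march=native -o c-ray-gcc    c-ray-mt.c -lm -lpthread
                        clang  -O3 -ffast-math -march=native -o c-ray-clang  c-ray-mt.c -lm -lpthread
                        opencc -O3 -ffast-math               -o c-ray-open64 c-ray-mt.c -lm -lpthread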

                    As for the tests in question, I'm very impressed with Open64's results in some of them, but just like the tests here, my own show it outperforming other compilers by a large margin in some tests and then losing out quite badly (mostly in larger projects, like Mame) to the same compilers in other tests.

                    Anyway, I did some quick tests on c-ray-mt (multithreaded) myself, using ./c-ray-mt -t 32 -s 8000x4000 -i scene -o foo.ppm

                    Here are the results for the compilers I tested:

                    Open64 (-O3 -ffast-math)
                    5469 milliseconds

                    Open64 (-O2 -ffast-math)
                    6605 milliseconds

                    Open64 (-Ofast)
                    failed to complete test

                    Open64 (-O3 -ffast-math -fb-create/fb-opt) Profiled build
                    5467 milliseconds

                    GCC 4.5.2 (-march=native -O3 -ffast-math)
                    5894 milliseconds

                    GCC 4.5.2 (-march=native -O2 -ffast-math)
                    5870 milliseconds

                    GCC 4.5.2 (-march=native -O3 -ffast-math -fprofile-generate/-fprofile-use) profiled build
                    5861 milliseconds

                    GCC 4.6 (20110122 snapshot) (-march=native -O3 -ffast-math)
                    5880 milliseconds

                    GCC 4.6 (20110122 snapshot) (-march=native -O2 -ffast-math)
                    5817 milliseconds

                    GCC 4.6 (-march=native -O3 -ffast-math -fprofile-generate/-fprofile-use) profiled build
                    5826 milliseconds

                    Clang (latest svn) (-march=native -O3 -ffast-math)
                    7846 milliseconds

                    Clang (latest svn) (-march=native -O2 -ffast-math)
                    7769 milliseconds

                    Clang 2.8-4 (-march=native -O3 -ffast-math)
                    6597 milliseconds

                    Clang 2.8-4 (-march=native -O2 -ffast-math)
                    6581 milliseconds

                    Here Open64 shines, beating the rest quite handily. Also notable: it was the only compiler for which -O3 created a faster binary than -O2 (and by a wide margin). PGO (profile-guided optimization) had very little impact in this test. I didn't bother with LTO since the program is a single file, so I don't see link-time optimization making a difference here. Oddly, Open64 with -Ofast compiled fine but didn't finish running the test; I also tried Open64 with -O3 -ffast-math -ipa (ipa = interprocedural analysis), but it came out slightly slower than plain -O3 -ffast-math.
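
                    For anyone wanting to reproduce the PGO numbers, the gcc workflow is two compiles with a training run in between; this sketch reuses the benchmark command as the training run (sending the image to /dev/null is my own shortcut, and the link flags are assumptions):

                        # Pass 1: instrumented build; running it writes .gcda profile data.
                        gcc -O3 -ffast-math -march=native -fprofile-generate -o c-ray-mt c-ray-mt.c -lm -lpthread
                        ./c-ray-mt -t 32 -s 8000x4000 -i scene -o /dev/null

                        # Pass 2: rebuild using the recorded profile.
                        gcc -O3 -ffast-math -march=native -fprofile-use -o c-ray-mt c-ray-mt.c -lm -lpthread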

                    Anyway, Open64 certainly is an interesting compiler to watch; in some tests I've done it REALLY outperforms the other compilers. Those AMD guys could be on to something interesting here.

                    Oh, and all tests were done on a Core i5 clocked at 3.2 GHz.



                    • #40
                      Originally posted by mtippett View Post
                      I'm currently investigating the configuration for some of the benchmarks. For the C-Ray benchmark, the default CFLAGS are -O3 -ffast-math.

                      The default goes across the entire environment - the benchmark default compile options and the compilers default response.
                      Reading this whole thread, the fact that default compile options such as -O3 are used was not clear to me at all. Perhaps next time Michael can mention that directly in the article, where he talks about what options are being used. The others are correct that if you use the default compiler options (-O0), the results are useless unless you're just timing how long the compilation takes. But if you're using application-supplied makefiles and compiling in release mode, then I'm sure they include some generic optimization flags, which is fine and all that's needed. Although it might be worth investigating which flags are used with which compilers; in other words, are some compilers being hurt in the test because GCC is using -O3 and Open64 doesn't support that flag?
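
                      Whether a compiler accepts a given flag is easy to script with an autoconf-style probe; a sketch (conftest.c is just a throwaway name):

                          # A flag is "supported" if compiling a trivial file with it succeeds.
                          echo 'int main(void) { return 0; }' > conftest.c
                          for cc in gcc opencc clang; do
                              if $cc -O3 -c conftest.c -o /dev/null 2>/dev/null; then
                                  echo "$cc accepts -O3"
                              else
                                  echo "$cc rejects -O3 (or is not installed)"
                              fi
                          done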

