What flags were used for Sun Studio, btw? Those mp3 and ogg encoding results look almost as if SSE(2/3) wasn't enabled, or as if Sun's compiler didn't understand the SSE intrinsics used in the code.
(.. or there's just a bug in Sun Studio here)
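One rough way to check the "SSE wasn't enabled" theory is to ask the compiler what it thinks it is targeting. This is only a sketch: it assumes the compiler follows GCC's convention of defining __SSE__, __SSE2__ and __SSE3__ when those instruction sets are enabled, and I haven't checked whether Sun Studio defines them. The function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: report the highest SSE level the compiler says it
   is targeting, based on GCC-style predefined macros. A compiler that does
   not define these macros will report "none reported" even if it emits SSE. */
static const char *simd_level(void)
{
#if defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none reported";
#endif
}

/* Print the result so it can be compared between builds with different flags. */
static void print_simd_level(void)
{
    printf("compile-time SIMD target: %s\n", simd_level());
}
```

Building this once per compiler and flag set and comparing the printed line would at least show whether the flags changed what the compiler believes it may emit.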
I was thinking the same thing. It looks like LAME needs nasm to assemble its asm sources (libmp3lame/i386/*.nas). I don't know why gcc is fast, though. Maybe the nasm objects won't link with Sun Studio .o files? Maybe because of a -xarch=native/-xarch=native64 mixup?
Some projects have their asm optimizations in GNU-extension asm() statements, which they'd have to disable to build with Sun's compiler. Anything else that needs GNU C isn't going to work with Sun's compiler either.
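To make the asm() point concrete, here is the kind of guard I mean. This is a made-up example, not code from LAME: swap32 is a hypothetical helper, and the __SUNPRO_C check assumes that is the macro Sun's C compiler defines. When the GNU inline asm path is compiled out, you silently get the plain C version.

```c
#include <stdint.h>

/* Hypothetical helper (not from LAME): byte-swap a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__) && !defined(__SUNPRO_C) && \
    (defined(__i386__) || defined(__x86_64__))
    /* GNU-extension inline asm: only used when the compiler claims
       GNU C compatibility and the target is x86. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Portable C fallback used by every other compiler or target. */
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
#endif
}
```

Both paths give the same answer, so a benchmark would only show the difference as speed, which is roughly the pattern in those results.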
So if you do it right, you can get asm optimizations with Sun's Studio compiler on LAME. (Although that result says compiler: gcc 3.4.3, so maybe it was just a trial run trying to get the parameters right.) I didn't find anything really useful googling for "mp3 lame sun studio". Hmm, I did find something with "nasm lame sun studio": LAME 3.98.2 has a commit: "Disable MMX when using Sun Studio." Maybe that's because Studio optimizes the C to better MMX/SSE itself (probably only with -xvector=simd, unless that's on by default these days), or because something is broken. It's just a few lines added to the Makefile.
Unfortunately, the global.p-t-s.com results don't show compiler flags used or anything.
As others are saying: what compiler flags were used!! I have no idea what the results mean without seeing them. I don't even know which of the tests used multiple cores. That kind of matters, because if you have an 8 core machine, you usually plan on keeping at least some of the cores busy most of the time. So you can't just compile every program to bust out multiple threads, because what if you want to run multiple things at the same time?
I'm coming at this from an HPC cluster background, where we tended to have embarrassingly parallel workloads, so we'd run the same single-threaded program on a hundred different input files, with Grid Engine or just make -j 8 style parallelism. I guess a desktop would be different: someone might conceivably buy a dual quad-core just so threaded apps could run fast, and not tend to have any number crunching jobs using up CPUs most of the time.
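For anyone who hasn't worked that way, the "many single-threaded jobs" pattern looks roughly like this. It's a minimal sketch in POSIX C, not what Grid Engine or make actually do internally: process_file stands in for the real single-threaded program, and run_jobs and reap_one are names I made up.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real single-threaded job:
   it only checks that the input file can be opened. */
static int process_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 1;
    fclose(f);
    return 0;
}

/* Wait for one child; returns 1 if it exited with status 0, else 0. */
static int reap_one(void)
{
    int status;
    if (wait(&status) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Run one child process per input file, at most max_jobs at a time.
   Returns the number of jobs that succeeded, or -1 if fork() failed. */
static int run_jobs(const char *const *paths, int n, int max_jobs)
{
    int running = 0, ok = 0, fork_failed = 0;

    if (max_jobs < 1)
        max_jobs = 1;
    for (int i = 0; i < n; i++) {
        if (running == max_jobs) {      /* all slots busy: wait for one */
            ok += reap_one();
            running--;
        }
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0)                   /* child: do one job and exit */
            _exit(process_file(paths[i]));
        running++;
    }
    while (running-- > 0)               /* collect the remaining children */
        ok += reap_one();
    return fork_failed ? -1 : ok;
}
```

With max_jobs set to the core count, every core stays busy without any one program needing threads, which is why per-program multithreading matters less on that kind of workload.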
BTW, flags you should use with Sun CC (unless these are outdated now): cc -fast -xarch=native64 -xvector=simd -xipo
-xarch=native64 : make a binary that doesn't waste time being compatible with anything but your machine (in 64-bit mode).
-xipo : cross-file optimization by putting source analysis into .o files, so the optimizer can run at link time.
Read the docs. You can use -xjobs=8 to let cc fork off worker jobs when it has a lot of work to do, e.g. at link time with -xipo, if I recall correctly.
With the correct compiler flags for Sun Studio 12 on the tested AMD64 hardware (i.e. -fast -xarch=amd64a -xipo=2), we saw the LAME test performance improve to beat the Ubuntu 8.10 scores (i.e. 38s-40s) mentioned in the article!
We believe that OS-2008.11 and the newer OS 2008.11-b107 (which properly matches the Ubuntu 8.10 specs), built with Sun Studio 12 or Blastwave.org's GCC 4.3.3 port (see: http://blastwave.network.com/testing...386-CSW.pkg.gz), can match or beat all of the Ubuntu 8.10/9.10 benchmarks hands down.
I think this article is fairer than the earlier articles. But there are still some complaints.
For instance, why focus on compile time? If the resulting code is twice as slow but compiles 10 secs faster, is it good? UPDATE: see below.
Obviously it is difficult to do good benchmarks with compilers. Maybe the Sun and GCC people should have given their input. But this is a better article, I think. Thanks, Phoronix, for listening and being willing to try again!
UPDATE: As Ex-Cyber pointed out, there is no focus on compile time. I take it back. In fact, pointers on compile time can be important. I like this test better than the earlier ones.