One thing about the comparison tables: it would have made more sense to arrange the CPU results in order of increasing price from left to right, for example, so that the results are more practical and comfortable to compare. X3, then i7, then i5, then i7 does not make much sense.
Intel Core i5 750, Core i7 870 Linux Benchmarks
-
AMD vs Intel on Linux vs Windows
Originally posted by AdrenalineJunky View Post
When I saw they were comparing an X3, I rolled my eyes...
When I saw the results, I nearly fell out of my chair in amazement.
WHAT? How does that even happen? In Windows, the i5 was obliterating the Phenom 965...
So while this advantage is real *today*, it is partly due to the problems with Intel's Turbo Boost and to compiling binaries with the default gcc.
There are compilers aware of such features: Portland Group, PathScale, and the free Open64. As an example, compare the i7-920 results at:
Phoronix
with my results using Open64 on an i7-920 without overclocking (running Ubuntu):
$ head -8 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
cpu MHz : 2672.704
cache size : 8192 KB
$ ./stream-open64
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
...
Function Rate (MB/s) Avg time Min time Max time
Copy: 22410.3334 0.0143 0.0143 0.0143
Scale: 22282.7187 0.0144 0.0144 0.0144
Add: 22511.9469 0.0230 0.0213 0.0234
Triad: 20943.1595 0.0233 0.0229 0.0234
For comparison, a Phenom II X4 810 (2.6 GHz) with DDR3-1333, using the same binary:
Function Rate (MB/s) Avg time Min time Max time
Copy: 12455.2457 0.0258 0.0257 0.0258
Scale: 12369.4995 0.0259 0.0259 0.0260
Add: 12539.4940 0.0384 0.0383 0.0387
Triad: 12442.0063 0.0387 0.0386 0.0387
Using gcc-4.4.1 on i7-920:
$ ./stream-gcc-4.4.1
Function Rate (MB/s) Avg time Min time Max time
Copy: 14374.3618 0.0223 0.0223 0.0224
Scale: 14416.3573 0.0222 0.0222 0.0223
Add: 15624.5172 0.0308 0.0307 0.0308
Triad: 15801.4749 0.0304 0.0304 0.0
Same on X4 810:
Function Rate (MB/s) Avg time Min time Max time
Copy: 8490.9773 0.0378 0.0377 0.0380
Scale: 8485.1263 0.0379 0.0377 0.0383
Add: 9569.1637 0.0503 0.0502 0.0508
Triad: 9573.1679 0.0505 0.0501 0.0528
For fun, I'll run just 3 copies on the X4 810 to simulate an X3:
Function Rate (MB/s) Avg time Min time Max time
Copy: 8326.3684 0.0386 0.0384 0.0389
Scale: 8329.6239 0.0386 0.0384 0.0389
Add: 9358.3690 0.0514 0.0513 0.0517
Triad: 9346.5083 0.0514 0.0514 0.0518
So as you can see, current compilers (the default under Windows, or optionally under Linux) can yield large differences in performance, in this case over 2x for the i7-920.
-
Originally posted by BillBroadley View Post
My best guess is that the compilers used under Windows are more aware of the special features of the newest chips, such as the double-wide SSE, the increased number of possible micro-ops, and various other optimizations. That, combined with turning off the auto-overclocking (discussed in some detail in the P55 article), shows AMD with a substantial advantage.
-
Originally posted by Kano View Post
The i5 750 can be faster than the i7 920 because the i5 can use 5 levels of Turbo Boost and the i7 only 3. Each step is 133 MHz, and some steps are only available when just 1 or 2 cores are active. A benchmark like POV-Ray, which only runs on 1 core, should show it.
-
Originally posted by BillBroadley View Post
Update: Looking at your results, it seemed to me that Open64 generates code as efficient as ICC 11.1, so I grabbed the current build and ran a comparison here.
Phenom II 955 BE 3.2 GHz, NB 2 GHz, memory: 2x DDR3-1333, unganged, CL7
ICC 11.1
Function Rate (MB/s) Avg time Min time Max time
Copy: 13223.4215 0.0026 0.0024 0.0150
Scale: 13261.3109 0.0025 0.0024 0.0090
Add: 13726.5011 0.0036 0.0035 0.0048
Triad: 13788.5482 0.0036 0.0035 0.0050

Copy: 8859.2560 0.0036 0.0036 0.0037
Scale: 8712.0426 0.0037 0.0037 0.0037
Add: 9541.0925 0.0050 0.0050 0.0051
Triad: 9749.9439 0.0056 0.0049 0.0111

Copy: 8820.2489 0.0041 0.0036 0.0055
Scale: 8544.5460 0.0041 0.0037 0.0054
Add: 9597.9497 0.0053 0.0050 0.0055
Triad: 9632.8513 0.0059 0.0050 0.0100
ICC 11.1
Function Rate (MB/s) Avg time Min time Max time
Copy: 15116.3113 0.0023 0.0021 0.0089
Scale: 15794.0372 0.0021 0.0020 0.0029
Add: 15185.2913 0.0032 0.0032 0.0035
Triad: 15320.4925 0.0032 0.0031 0.0050

Function Rate (MB/s) Avg time Min time Max time
Copy: 9624.1021 0.0054 0.0033 0.0216
Scale: 9329.0977 0.0035 0.0034 0.0035
Add: 10278.5823 0.0047 0.0047 0.0047
Triad: 10478.1197 0.0046 0.0046 0.0047

Function Rate (MB/s) Avg time Min time Max time
Copy: 9445.3011 0.0034 0.0034 0.0034
Scale: 9297.4320 0.0035 0.0034 0.0035
Add: 10225.8529 0.0047 0.0047 0.0048
Triad: 10405.5505 0.0046 0.0046 0.0047
Last edited by justapost; 09 September 2009, 06:06 AM.
-
Originally posted by justapost View Post
Those are triple-channel results, right?
Originally posted by justapost View Post
Max possible with dual-channel DDR3-1333 would be 21,200 MB/s. I'd be interested in dual-channel results; I expect something around 17 GB/s.
Originally posted by justapost View Post
Update: Looking at your results, it seemed to me that Open64 generates code as efficient as ICC 11.1, so I grabbed the current build and ran a comparison here.
gcc -O4 -fopenmp stream.c -o s-gcc-4.3.3 -static
export PATH=/opt/pkg/gcc-4.4.1/bin:$PATH
gcc -O4 -fopenmp stream.c -o s-gcc-4.4.1 -static
export PATH=/opt/pkg/x86_open64-4.2.2.1/bin:$PATH
opencc -O4 -fopenmp stream.c -o s-open64-4.2.2.1 -static

$ ./s-gcc-4.3.3 | grep Copy:
Copy: 8500.2266 0.0377 0.0376 0.0377
$ ./s-gcc-4.4.1 | grep Copy:
Copy: 8492.3205 0.0377 0.0377 0.0378
$ ./s-open64-4.2.2.1 | grep Copy:
Copy: 12487.2286 0.0258 0.0256 0.0258
-
What is important to keep in mind, though, is that Intel Turbo Boost Technology was disabled on the processors during testing, since this functionality had not worked under Linux for increasing the clock frequency but instead appeared to cause some sporadic performance problems. Update: After starting to see a flow of Windows-based reviews today, it looks like there are some more serious Linux + Lynnfield problems at hand, which we are currently investigating.
Is this only a problem for the new Lynnfield (i5 750, i7 860 & 870) series, or do the Core i7-9xx CPUs also share the same issues? I'm asking since the Turbo Boost feature underwent some changes between these CPUs.
And what are the other Linux + Lynnfield problems you mention? Is it just the lm_sensors package that can't read the temperatures, or are there other problems at hand here?
Like many others, I planned on buying a Lynnfield Core i7-860 soon, but if even an AMD triple-core performs better in many tests for less than half the money, I'd want answers to the above questions before making that decision.
Thanks for the benchmarks, but now I'm a bit disappointed by these indications of bad Linux performance.
-
Originally posted by BillBroadley View Post
Yes, triple channel.
Alas, my machine is in production, so I can't easily try dual channel. I've seen dual- vs. triple-channel STREAM numbers posted in a hardware review, but I can't remember where.
Originally posted by BillBroadley View Post
Hrm, try this:
opencc -O4 -fopenmp stream.c -o s-open64-4.2.2.1 -static

Copy: 14486.5330 0.0025 0.0022 0.0031
Scale: 14246.6541 0.0026 0.0022 0.0046
Add: 14022.8872 0.0036 0.0034 0.0042
Triad: 14011.1763 0.0045 0.0034 0.0107