Intel Core i5 750, Core i7 870 Linux Benchmarks

  • BillBroadley
    replied
    Originally posted by justapost
    Those are triple channel results right?
    Yes, triple channel.

    Originally posted by justapost
    Max possible with dual channel ddr3 1333 would be 21200MB/s. I'd be interested in dual channel results, I expect something around 17GB/s.
    Alas my machine is in production, so I can't easily try dual channel. I've seen dual vs tri channel stream numbers posted in a hardware review, alas I can't remember where.

    Originally posted by justapost
    Update: Looking at your results it seemed to me open64 compilers generate as efficient code as icc 11.1, so I grabbed the actual build and ran a comparison here.

    PII 955BE 3.2GHz NB 2GHz MEM 2xDDR1333 Unganged CL7

    ICC 11.1
    Code:
    Function      Rate (MB/s)   Avg time     Min time     Max time                                                         
    Copy:       13223.4215       0.0026       0.0024       0.0150                                                          
    Scale:      13261.3109       0.0025       0.0024       0.0090                                                          
    Add:        13726.5011       0.0036       0.0035       0.0048                                                          
    Triad:      13788.5482       0.0036       0.0035       0.0050
    Open64 4.2.1
    Code:
    Copy:        8859.2560       0.0036       0.0036       0.0037           
    Scale:       8712.0426       0.0037       0.0037       0.0037           
    Add:         9541.0925       0.0050       0.0050       0.0051           
    Triad:       9749.9439       0.0056       0.0049       0.0111
    I used openCC -fopenmp -O2 -o stream-o64 stream.c to build the open64 version. Seems I'm doing something wrong here.
    Hrm, try this:
    Code:
    # System gcc (4.3.3 on this machine)
    gcc -O4 -fopenmp stream.c -o s-gcc-4.3.3 -static
    # Newer gcc installed under /opt/pkg
    export PATH=/opt/pkg/gcc-4.4.1/bin:$PATH
    gcc -O4 -fopenmp stream.c -o s-gcc-4.4.1 -static
    # Open64 C driver (opencc, lowercase)
    export PATH=/opt/pkg/x86_open64-4.2.2.1/bin:$PATH
    opencc -O4 -fopenmp stream.c -o s-open64-4.2.2.1 -static
    Hopefully it will produce numbers like:
    Code:
    $ ./s-gcc-4.3.3  | grep Copy:
    Copy:        8500.2266       0.0377       0.0376       0.0377
    $ ./s-gcc-4.4.1  | grep Copy:
    Copy:        8492.3205       0.0377       0.0377       0.0378
    $ ./s-open64-4.2.2.1  | grep Copy:
    Copy:       12487.2286       0.0258       0.0256       0.0258


  • justapost
    replied
    Originally posted by BillBroadley

    With my results using open64 on an i7-920 without overclocking (running Ubuntu):

    $ head -8 /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 26
    model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    stepping : 4
    cpu MHz : 2672.704
    cache size : 8192 KB

    $ ./stream-open64
    -------------------------------------------------------------
    STREAM version $Revision: 5.9 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 20000000, Offset = 0
    Total memory required = 457.8 MB.
    ...
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       22410.3334       0.0143       0.0143       0.0143
    Scale:      22282.7187       0.0144       0.0144       0.0144
    Add:        22511.9469       0.0230       0.0213       0.0234
    Triad:      20943.1595       0.0233       0.0229       0.0234
    Those are triple channel results, right? The max possible with dual channel DDR3-1333 would be 21200MB/s. I'd be interested in dual channel results; I expect something around 17GB/s. On an AMD setup you must run the NB at around 2.6GHz and the CPU a bit faster (2.8GHz) to hit the max, which was around 15.7GB/s here with DDR3-1333 CL7 (stream built with icc 11.1 and OpenMP support).
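
    A rough sketch of where that dual channel ceiling comes from, assuming the 21200MB/s figure uses the rounded PC3-10600 module rating (10600MB/s per channel, i.e. roughly 1333 MT/s times 8 bytes). The function names and the constant are illustrative only and not taken from any tool mentioned in this thread.

```python
# Theoretical peak memory bandwidth: per-channel rating times channel count.
# Assumption: DDR3-1333 modules are rated PC3-10600, i.e. 10600 MB/s per
# channel (the rounded form of about 1333 MT/s * 8 bytes per transfer).

PC3_10600_MB_S = 10600  # assumed per-channel rating in MB/s


def peak_bandwidth_mb_s(channels, per_channel_mb_s=PC3_10600_MB_S):
    """Upper bound in MB/s for the given number of memory channels."""
    return channels * per_channel_mb_s


def efficiency(measured_mb_s, channels):
    """Fraction of the theoretical peak reached by a STREAM result."""
    return measured_mb_s / peak_bandwidth_mb_s(channels)


if __name__ == "__main__":
    print("dual channel peak:  ", peak_bandwidth_mb_s(2), "MB/s")
    print("triple channel peak:", peak_bandwidth_mb_s(3), "MB/s")
    # Phenom II, NB 2.6GHz, ICC 11.1 Scale result from this post (dual channel).
    print("Phenom II efficiency:", round(efficiency(15794.0372, 2), 3))
```

    Under those assumptions the best Phenom II result in this post lands at roughly three quarters of the dual channel peak.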

    Update: Looking at your results it seemed to me open64 compilers generate as efficient code as icc 11.1, so I grabbed the actual build and ran a comparison here.

    PII 955BE 3.2GHz NB 2GHz MEM 2xDDR1333 Unganged CL7

    ICC 11.1
    Code:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       13223.4215       0.0026       0.0024       0.0150
    Scale:      13261.3109       0.0025       0.0024       0.0090
    Add:        13726.5011       0.0036       0.0035       0.0048
    Triad:      13788.5482       0.0036       0.0035       0.0050
    Open64 4.2.1
    Code:
    Copy:        8859.2560       0.0036       0.0036       0.0037
    Scale:       8712.0426       0.0037       0.0037       0.0037
    Add:         9541.0925       0.0050       0.0050       0.0051
    Triad:       9749.9439       0.0056       0.0049       0.0111
    GCC-4.3.1
    Code:
    Copy:        8820.2489       0.0041       0.0036       0.0055
    Scale:       8544.5460       0.0041       0.0037       0.0054
    Add:         9597.9497       0.0053       0.0050       0.0055
    Triad:       9632.8513       0.0059       0.0050       0.0100
    PII 955BE 3.2GHz NB 2.6GHz MEM 2xDDR1333 Unganged CL7


    ICC 11.1
    Code:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       15116.3113       0.0023       0.0021       0.0089
    Scale:      15794.0372       0.0021       0.0020       0.0029
    Add:        15185.2913       0.0032       0.0032       0.0035
    Triad:      15320.4925       0.0032       0.0031       0.0050
    Open64 4.2.1
    Code:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        9624.1021       0.0054       0.0033       0.0216
    Scale:       9329.0977       0.0035       0.0034       0.0035
    Add:        10278.5823       0.0047       0.0047       0.0047
    Triad:      10478.1197       0.0046       0.0046       0.0047
    GCC-4.3.1
    Code:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        9445.3011       0.0034       0.0034       0.0034
    Scale:       9297.4320       0.0035       0.0034       0.0035
    Add:        10225.8529       0.0047       0.0047       0.0048
    Triad:      10405.5505       0.0046       0.0046       0.0047
    I used openCC -fopenmp -O2 -o stream-o64 stream.c to build the open64 version. Seems I'm doing something wrong here.
    Last edited by justapost; 09-09-2009, 06:06 AM.


  • AdrenalineJunky
    replied
    Originally posted by Kano
    The i5 750 can be faster than the i7 920 because the i5 can use 5 levels of turbo boost and the i7 only 3. Each step is 133 MHz, and some are only available with 1 or 2 cores active. A benchmark like povray, which only runs on 1 core, should show it.
    except that in this test turbo boost was disabled.


  • Kano
    replied
    The i5 750 can be faster than the i7 920 because the i5 can use 5 levels of turbo boost and the i7 only 3. Each step is 133 MHz, and some are only available with 1 or 2 cores active. A benchmark like povray, which only runs on 1 core, should show it.


  • deanjo
    replied
    Originally posted by BillBroadley
    My best guess is that the compilers used under Windows are more aware of the special features of the newest chips: things like the double wide SSE, the increased number of micro-ops possible, and various other optimizations. That, combined with turning off the auto overclocking (mentioned in some detail in the p55 article), shows AMD with a substantial advantage.

    So this advantage is real *today*, given the problems with Intel's turbo boost and binaries compiled with the default gcc.

    There are compilers aware of such features: Portland Group, Pathscale, and the free open64. As an example compare the i7-920 results at:
    http://www.phoronix.com/scan.php?pag...nnfield&num=13

    With my results using open64 on an i7-920 without overclocking (running Ubuntu):

    $ head -8 /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 26
    model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    stepping : 4
    cpu MHz : 2672.704
    cache size : 8192 KB

    $ ./stream-open64
    -------------------------------------------------------------
    STREAM version $Revision: 5.9 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 20000000, Offset = 0
    Total memory required = 457.8 MB.
    ...
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       22410.3334       0.0143       0.0143       0.0143
    Scale:      22282.7187       0.0144       0.0144       0.0144
    Add:        22511.9469       0.0230       0.0213       0.0234
    Triad:      20943.1595       0.0233       0.0229       0.0234

    For comparison a Phenom II X4 810 (2.6 GHz) + DDR3-1333 using the same binary:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       12455.2457       0.0258       0.0257       0.0258
    Scale:      12369.4995       0.0259       0.0259       0.0260
    Add:        12539.4940       0.0384       0.0383       0.0387
    Triad:      12442.0063       0.0387       0.0386       0.0387

    Using gcc-4.4.1 on i7-920:
    $ ./stream-gcc-4.4.1
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       14374.3618       0.0223       0.0223       0.0224
    Scale:      14416.3573       0.0222       0.0222       0.0223
    Add:        15624.5172       0.0308       0.0307       0.0308
    Triad:      15801.4749       0.0304       0.0304       0.0

    Same on X4 810:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        8490.9773       0.0378       0.0377       0.0380
    Scale:       8485.1263       0.0379       0.0377       0.0383
    Add:         9569.1637       0.0503       0.0502       0.0508
    Triad:       9573.1679       0.0505       0.0501       0.0528

    For fun I'll run just 3 copies on the X4-810 to simulate using an X3:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        8326.3684       0.0386       0.0384       0.0389
    Scale:       8329.6239       0.0386       0.0384       0.0389
    Add:         9358.3690       0.0514       0.0513       0.0517
    Triad:       9346.5083       0.0514       0.0514       0.0518

    So as you can see using current compilers (default under windows or optionally under linux) can yield large differences in performance, in this case over 2x for the i7-920.
    It should be noted that Pathscale may no longer be available in the near future. It all depends on what happens now that Cray has bought it.


  • BillBroadley
    replied
    AMD vs Intel on Linux vs Windows

    Originally posted by AdrenalineJunky
    when i saw they were comparing an X3 i rolled my eyes...

    when i saw the results i nearly fell out of my chair in amazement.

    WHAT? how does that even happen? in windows the i5 was obliterating the phenom 965....
    My best guess is that the compilers used under Windows are more aware of the special features of the newest chips: things like the double wide SSE, the increased number of micro-ops possible, and various other optimizations. That, combined with turning off the auto overclocking (mentioned in some detail in the p55 article), shows AMD with a substantial advantage.

    So this advantage is real *today*, given the problems with Intel's turbo boost and binaries compiled with the default gcc.

    There are compilers aware of such features: Portland Group, Pathscale, and the free open64. As an example compare the i7-920 results at:
    http://www.phoronix.com/scan.php?pag...nnfield&num=13

    With my results using open64 on an i7-920 without overclocking (running Ubuntu):

    $ head -8 /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 26
    model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    stepping : 4
    cpu MHz : 2672.704
    cache size : 8192 KB

    $ ./stream-open64
    -------------------------------------------------------------
    STREAM version $Revision: 5.9 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 20000000, Offset = 0
    Total memory required = 457.8 MB.
    ...
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       22410.3334       0.0143       0.0143       0.0143
    Scale:      22282.7187       0.0144       0.0144       0.0144
    Add:        22511.9469       0.0230       0.0213       0.0234
    Triad:      20943.1595       0.0233       0.0229       0.0234
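
    The header and rates above can be cross-checked by hand. The sketch below assumes the usual STREAM accounting: three arrays of 8-byte doubles, Copy and Scale touching two arrays per iteration, Add and Triad touching three, rates counted in 10^6 bytes per second from the minimum time, and the memory total counted in 2^20 bytes. The helper names are made up for illustration.

```python
BYTES_PER_WORD = 8        # "8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 20_000_000   # "Array size = 20000000"

# Number of arrays read or written per loop iteration by each kernel
# (assumed standard STREAM accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mib(n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Memory for the three STREAM arrays, in units of 2**20 bytes."""
    return 3 * word * n / 2**20


def rate_mb_s(kernel, min_time_s, n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Bandwidth in 10**6 bytes per second from the best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * word * n
    return 1e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    print(f"Total memory required = {total_memory_mib():.1f} MB")
    # Min times are printed to four decimals only, so the recomputed
    # rates approximate the reported ones rather than matching exactly.
    for kernel, min_time in [("Copy", 0.0143), ("Scale", 0.0144),
                             ("Add", 0.0213), ("Triad", 0.0229)]:
        print(f"{kernel}: {rate_mb_s(kernel, min_time):.0f} MB/s")
```

    The recomputed rates should agree with the posted ones to within the rounding of the printed min times.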

    For comparison a Phenom II X4 810 (2.6 GHz) + DDR3-1333 using the same binary:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       12455.2457       0.0258       0.0257       0.0258
    Scale:      12369.4995       0.0259       0.0259       0.0260
    Add:        12539.4940       0.0384       0.0383       0.0387
    Triad:      12442.0063       0.0387       0.0386       0.0387

    Using gcc-4.4.1 on i7-920:
    $ ./stream-gcc-4.4.1
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       14374.3618       0.0223       0.0223       0.0224
    Scale:      14416.3573       0.0222       0.0222       0.0223
    Add:        15624.5172       0.0308       0.0307       0.0308
    Triad:      15801.4749       0.0304       0.0304       0.0

    Same on X4 810:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        8490.9773       0.0378       0.0377       0.0380
    Scale:       8485.1263       0.0379       0.0377       0.0383
    Add:         9569.1637       0.0503       0.0502       0.0508
    Triad:       9573.1679       0.0505       0.0501       0.0528

    For fun I'll run just 3 copies on the X4-810 to simulate using an X3:
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        8326.3684       0.0386       0.0384       0.0389
    Scale:       8329.6239       0.0386       0.0384       0.0389
    Add:         9358.3690       0.0514       0.0513       0.0517
    Triad:       9346.5083       0.0514       0.0514       0.0518

    So as you can see using current compilers (default under windows or optionally under linux) can yield large differences in performance, in this case over 2x for the i7-920.
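
    To put numbers on the same-machine difference, here is a small sketch that parses the i7-920 result rows posted above and divides the open64 rates by the gcc-4.4.1 rates. parse_stream and speedup are hypothetical helper names, and the input strings are copied from this post (including the truncated last field). These same-machine ratios come out at roughly 1.3x to 1.6x; the "over 2x" figure presumably compares against the linked Phoronix results, which are not reproduced in this thread.

```python
def parse_stream(text):
    """Map kernel name -> rate in MB/s from STREAM result rows."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        # Result rows look like "Copy: 22410.3334 0.0143 0.0143 0.0143".
        if len(fields) >= 2 and fields[0].endswith(":"):
            try:
                rates[fields[0].rstrip(":")] = float(fields[1])
            except ValueError:
                continue  # label followed by something that is not a number
    return rates


def speedup(fast, slow):
    """Per-kernel ratio of two parsed result sets."""
    return {k: fast[k] / slow[k] for k in fast if k in slow}


OPEN64_I7_920 = """\
Copy: 22410.3334 0.0143 0.0143 0.0143
Scale: 22282.7187 0.0144 0.0144 0.0144
Add: 22511.9469 0.0230 0.0213 0.0234
Triad: 20943.1595 0.0233 0.0229 0.0234
"""

GCC441_I7_920 = """\
Copy: 14374.3618 0.0223 0.0223 0.0224
Scale: 14416.3573 0.0222 0.0222 0.0223
Add: 15624.5172 0.0308 0.0307 0.0308
Triad: 15801.4749 0.0304 0.0304 0.0
"""

if __name__ == "__main__":
    ratios = speedup(parse_stream(OPEN64_I7_920), parse_stream(GCC441_I7_920))
    for kernel, ratio in ratios.items():
        print(f"{kernel}: open64 is {ratio:.2f}x gcc-4.4.1")
```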


  • AdrenalineJunky
    replied
    when i saw they were comparing an X3 i rolled my eyes...

    when i saw the results i nearly fell out of my chair in amazement.

    WHAT? how does that even happen? in windows the i5 was obliterating the phenom 965....


  • kotsergy
    replied
    One thing about the comparison tables: it would have made more sense to arrange the CPU results in order of increasing price from left to right, for example, so that comparing them is more practical and comfortable. X3, then i7, then i5, then i7 does not make much sense.


  • Apopas
    replied
    Originally posted by BillBroadley
    It's worth noting that Microsoft licensed the Intel compiler technology to enable more aggressive optimizations.

    To achieve similar performance numbers under Linux you need to run the Pathscale, Portland Group, or free open64 compilers. For instance, using said compilers gets over double the performance with stream when compared to the Phoronix posted numbers.
    Is it possible, then, that Intel's processors run programs better when they have been compiled with Intel's own compiler?


  • BillBroadley
    replied
    Windows compiler

    Originally posted by deanjo
    That would have to be the MSVS series one would have to assume.
    It's worth noting that Microsoft licensed the Intel compiler technology to enable more aggressive optimizations.

    To achieve similar performance numbers under Linux you need to run the Pathscale, Portland Group, or free open64 compilers. For instance, using said compilers gets over double the performance with stream when compared to the Phoronix posted numbers.
