Intel Core i5 750, Core i7 870 Linux Benchmarks


  • #31
    One thing about the comparison tables: it would have made more sense to arrange the CPU results in a consistent order, for example by increasing price from left to right, so the results are easier to compare. X3, then i7, then i5, then i7 again does not make much sense.



    • #32
      when i saw they were comparing an X3 i rolled my eyes...

      when i saw the results i nearly fell out of my chair in amazement.

      WHAT? how does that even happen? in windows the i5 was obliterating the phenom 965....



      • #33
        AMD vs Intel on Linux vs Windows

        Originally posted by AdrenalineJunky View Post
        when i saw they were comparing an X3 i rolled my eyes...

        when i saw the results i nearly fell out of my chair in amazement.

        WHAT? how does that even happen? in windows the i5 was obliterating the phenom 965....
        My best guess is that the compilers used under Windows are more aware of the special features of the newest chips: things like the double-wide SSE, the increased number of micro-ops possible, and various other optimizations. That, combined with turning off the auto overclocking (mentioned in some detail in the P55 article), shows AMD with a substantial advantage.

        So this advantage is real *today*, but it comes from the problems with Intel's Turbo Boost and from compiling binaries with the default gcc.

        There are compilers aware of such features: Portland Group, PathScale, and the free Open64. As an example compare the i7-920 results at:
        http://www.phoronix.com/scan.php?pag...nnfield&num=13

        With my results using Open64 on an i7-920 without overclocking (running Ubuntu):

        $ head -8 /proc/cpuinfo
        processor : 0
        vendor_id : GenuineIntel
        cpu family : 6
        model : 26
        model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
        stepping : 4
        cpu MHz : 2672.704
        cache size : 8192 KB

        $ ./stream-open64
        -------------------------------------------------------------
        STREAM version $Revision: 5.9 $
        -------------------------------------------------------------
        This system uses 8 bytes per DOUBLE PRECISION word.
        -------------------------------------------------------------
        Array size = 20000000, Offset = 0
        Total memory required = 457.8 MB.
        ...
        Function      Rate (MB/s)   Avg time     Min time     Max time
        Copy:       22410.3334       0.0143       0.0143       0.0143
        Scale:      22282.7187       0.0144       0.0144       0.0144
        Add:        22511.9469       0.0230       0.0213       0.0234
        Triad:      20943.1595       0.0233       0.0229       0.0234
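
As a rough cross-check of the figures above (a sketch, not part of the original post): STREAM 5.9 reports its footprint as three arrays of N 8-byte words and derives each rate from the best (minimum) time. Copy and Scale touch two arrays per iteration, Add and Triad three. The function names below are made up for illustration.

```python
# Sanity-check the STREAM output above. Function names are illustrative only.
BYTES_PER_WORD = 8          # "This system uses 8 bytes per DOUBLE PRECISION word."
N = 20_000_000              # "Array size = 20000000"

# Number of arrays each kernel reads or writes per iteration.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mb(n=N):
    """Footprint of the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * n / 2**20


def rate_mb_s(kernel, min_time_s, n=N):
    """Bandwidth in MB/s (10**6 bytes per second) from the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    print(f"total memory: {total_memory_mb():.1f} MB")
    # Min times copied from the i7-920 open64 run above.
    for kernel, min_time in [("Copy", 0.0143), ("Scale", 0.0144),
                             ("Add", 0.0213), ("Triad", 0.0229)]:
        print(f"{kernel:6s} {rate_mb_s(kernel, min_time):10.1f} MB/s")
```

The recomputed rates land within about 1% of the printed ones; the small gap comes from the times being rounded to four decimals in the output.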

        For comparison a Phenom II X4 810 (2.6 GHz) + DDR3-1333 using the same binary:
        Function      Rate (MB/s)   Avg time     Min time     Max time
        Copy:       12455.2457       0.0258       0.0257       0.0258
        Scale:      12369.4995       0.0259       0.0259       0.0260
        Add:        12539.4940       0.0384       0.0383       0.0387
        Triad:      12442.0063       0.0387       0.0386       0.0387

        Using gcc-4.4.1 on i7-920:
        $ ./stream-gcc-4.4.1
        Function      Rate (MB/s)   Avg time     Min time     Max time
        Copy:       14374.3618       0.0223       0.0223       0.0224
        Scale:      14416.3573       0.0222       0.0222       0.0223
        Add:        15624.5172       0.0308       0.0307       0.0308
        Triad:      15801.4749       0.0304       0.0304       0.0

        Same on X4 810:
        Function      Rate (MB/s)   Avg time     Min time     Max time
        Copy:        8490.9773       0.0378       0.0377       0.0380
        Scale:       8485.1263       0.0379       0.0377       0.0383
        Add:         9569.1637       0.0503       0.0502       0.0508
        Triad:       9573.1679       0.0505       0.0501       0.0528

        For fun I'll run just 3 copies on the X4-810 to simulate using an X3:
        Function      Rate (MB/s)   Avg time     Min time     Max time
        Copy:        8326.3684       0.0386       0.0384       0.0389
        Scale:       8329.6239       0.0386       0.0384       0.0389
        Add:         9358.3690       0.0514       0.0513       0.0517
        Triad:       9346.5083       0.0514       0.0514       0.0518

        So, as you can see, using current compilers (the default under Windows, optional under Linux) can yield large differences in performance, in this case over 2x for the i7-920.
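
To put numbers on the local runs, here is a small sketch that divides the open64 Copy rates by the gcc-4.4.1 ones, using only the figures posted above. The dictionary layout and function name are illustrative. It compares the two local binaries only; the "over 2x" presumably refers to the linked Phoronix result, which is not quoted in this thread and so is left out.

```python
# Copy rates in MB/s, taken from the STREAM runs posted above.
COPY_RATE = {
    "i7-920": {"open64": 22410.3334, "gcc-4.4.1": 14374.3618},
    "X4 810": {"open64": 12455.2457, "gcc-4.4.1": 8490.9773},
}


def speedup(cpu, fast="open64", slow="gcc-4.4.1"):
    """Ratio of the Copy rate of one binary to another on the same CPU."""
    rates = COPY_RATE[cpu]
    return rates[fast] / rates[slow]


if __name__ == "__main__":
    for cpu in COPY_RATE:
        print(f"{cpu}: open64 is {speedup(cpu):.2f}x gcc-4.4.1 on Copy")
```

Between these two binaries the gap works out to roughly 1.5x on both chips.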



        • #34
          Originally posted by BillBroadley View Post
          There are compilers aware of such features: Portland Group, PathScale, and the free Open64.
          It should be noted that PathScale may no longer be available in the near future. It all depends on what happens now that Cray has bought it.



          • #35
            The i5 750 can be faster than the i7 920 because the i5 can use 5 levels of turbo boost and the i7 only 3. Each step is 133 MHz, and some are only available with 1 or 2 cores active. A benchmark like povray, which only runs on 1 core, should show it.



            • #36
              Originally posted by Kano View Post
              The i5 750 can be faster than the i7 920 because the i5 can use 5 levels of turbo boost and the i7 only 3. Each step is 133 MHz, and some are only available with 1 or 2 cores active. A benchmark like povray, which only runs on 1 core, should show it.
              except that in this test turbo boost was disabled.



              • #37
                Originally posted by BillBroadley View Post

                With my results using open64 on a i7-920 without overclocking(running ubuntu):

                $ head -8 /proc/cpuinfo
                processor : 0
                vendor_id : GenuineIntel
                cpu family : 6
                model : 26
                model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
                stepping : 4
                cpu MHz : 2672.704
                cache size : 8192 KB

                $ ./stream-open64
                -------------------------------------------------------------
                STREAM version $Revision: 5.9 $
                -------------------------------------------------------------
                This system uses 8 bytes per DOUBLE PRECISION word.
                -------------------------------------------------------------
                Array size = 20000000, Offset = 0
                Total memory required = 457.8 MB.
                ...
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:       22410.3334       0.0143       0.0143       0.0143
                Scale:      22282.7187       0.0144       0.0144       0.0144
                Add:        22511.9469       0.0230       0.0213       0.0234
                Triad:      20943.1595       0.0233       0.0229       0.0234
                Those are triple channel results right? Max possible with dual channel DDR3-1333 would be 21200 MB/s. I'd be interested in dual channel results; I expect something around 17 GB/s. On an AMD setup you must run the NB at around 2.6 GHz and the CPU a bit faster, at 2.8 GHz, to hit the max, which was around 15.7 GB/s here with DDR3-1333 CL7 (STREAM built with icc 11.1 and OpenMP support).
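
The 21200 MB/s ceiling follows from simple arithmetic: each DDR3 channel is 64 bits (8 bytes) wide, and DDR3-1333 modules are sold as PC3-10600, i.e. nominally 10600 MB/s per channel. A minimal sketch of that calculation, with hypothetical function names:

```python
BUS_WIDTH_BYTES = 8   # one DDR3 channel is 64 bits wide


def peak_mb_s(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth in MB/s for a given channel count."""
    return channels * BUS_WIDTH_BYTES * mega_transfers_per_s


def efficiency(measured_mb_s, channels, mega_transfers_per_s):
    """Measured STREAM rate as a fraction of the theoretical peak."""
    return measured_mb_s / peak_mb_s(channels, mega_transfers_per_s)


if __name__ == "__main__":
    # 2 x 8 x 1333 = 21328; the PC3-10600 label rounds this to 2 x 10600 = 21200.
    print(peak_mb_s(2, 1333))
    # The ~15.7 GB/s AMD figure from this post, as a share of the dual channel peak.
    print(f"{efficiency(15700, 2, 1333):.0%}")
```

The same helper can be pointed at any measured STREAM rate to see how close a given build gets to the memory system's limit.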

                Update: Looking at your results, it seemed to me that the open64 compilers generate code as efficient as icc 11.1, so I grabbed the current build and ran a comparison here.

                PII 955BE 3.2GHz NB 2GHz MEM 2xDDR1333 Unganged CL7

                ICC 11.1
                Code:
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:       13223.4215       0.0026       0.0024       0.0150
                Scale:      13261.3109       0.0025       0.0024       0.0090
                Add:        13726.5011       0.0036       0.0035       0.0048
                Triad:      13788.5482       0.0036       0.0035       0.0050

                Open64 4.2.1
                Code:
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:        8859.2560       0.0036       0.0036       0.0037
                Scale:       8712.0426       0.0037       0.0037       0.0037
                Add:         9541.0925       0.0050       0.0050       0.0051
                Triad:       9749.9439       0.0056       0.0049       0.0111

                GCC-4.3.1
                Code:
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:        8820.2489       0.0041       0.0036       0.0055
                Scale:       8544.5460       0.0041       0.0037       0.0054
                Add:         9597.9497       0.0053       0.0050       0.0055
                Triad:       9632.8513       0.0059       0.0050       0.0100

                PII 955BE 3.2GHz NB 2.6GHz MEM 2xDDR1333 Unganged CL7

                ICC 11.1
                Code:
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:       15116.3113       0.0023       0.0021       0.0089
                Scale:      15794.0372       0.0021       0.0020       0.0029
                Add:        15185.2913       0.0032       0.0032       0.0035
                Triad:      15320.4925       0.0032       0.0031       0.0050

                Open64 4.2.1
                Code:
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:        9624.1021       0.0054       0.0033       0.0216
                Scale:       9329.0977       0.0035       0.0034       0.0035
                Add:        10278.5823       0.0047       0.0047       0.0047
                Triad:      10478.1197       0.0046       0.0046       0.0047

                GCC-4.3.1
                Code:
                Function      Rate (MB/s)   Avg time     Min time     Max time
                Copy:        9445.3011       0.0034       0.0034       0.0034
                Scale:       9297.4320       0.0035       0.0034       0.0035
                Add:        10225.8529       0.0047       0.0047       0.0048
                Triad:      10405.5505       0.0046       0.0046       0.0047

                I used openCC -fopenmp -O2 -o stream-o64 stream.c to build the open64 version. Seems I'm doing something wrong here.
                Last edited by justapost; 09-09-2009, 06:06 AM.



                • #38
                  Originally posted by justapost View Post
                  Those are triple channel results right?
                  Yes, triple channel.

                  Originally posted by justapost View Post
                  Max possible with dual channel DDR3-1333 would be 21200 MB/s. I'd be interested in dual channel results; I expect something around 17 GB/s.
                  Alas, my machine is in production, so I can't easily try dual channel. I've seen dual vs. triple channel STREAM numbers posted in a hardware review, but I can't remember where.

                  Originally posted by justapost View Post
                  Update: Looking at your results, it seemed to me that the open64 compilers generate code as efficient as icc 11.1, so I grabbed the current build and ran a comparison here.

                  PII 955BE 3.2GHz NB 2GHz MEM 2xDDR1333 Unganged CL7

                  ICC 11.1
                  Code:
                  Function      Rate (MB/s)   Avg time     Min time     Max time
                  Copy:       13223.4215       0.0026       0.0024       0.0150
                  Scale:      13261.3109       0.0025       0.0024       0.0090
                  Add:        13726.5011       0.0036       0.0035       0.0048
                  Triad:      13788.5482       0.0036       0.0035       0.0050

                  Open64 4.2.1
                  Code:
                  Function      Rate (MB/s)   Avg time     Min time     Max time
                  Copy:        8859.2560       0.0036       0.0036       0.0037
                  Scale:       8712.0426       0.0037       0.0037       0.0037
                  Add:         9541.0925       0.0050       0.0050       0.0051
                  Triad:       9749.9439       0.0056       0.0049       0.0111

                  I used openCC -fopenmp -O2 -o stream-o64 stream.c to build the open64 version. Seems I'm doing something wrong here.
                  Hrm, try this:
                  Code:
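                  # build with the system gcc (4.3.3 here, going by the binary name)
                  gcc -O4 -fopenmp stream.c -o s-gcc-4.3.3 -static
                  # put gcc 4.4.1 first in PATH and rebuild
                  export PATH=/opt/pkg/gcc-4.4.1/bin:$PATH
                  gcc -O4 -fopenmp stream.c -o s-gcc-4.4.1 -static
                  # put open64 first in PATH and build with opencc
                  export PATH=/opt/pkg/x86_open64-4.2.2.1/bin:$PATH
                  opencc -O4 -fopenmp stream.c -o s-open64-4.2.2.1 -static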
                  Hopefully it will produce numbers like:
                  Code:
                  $ ./s-gcc-4.3.3  | grep Copy:
                  Copy:        8500.2266       0.0377       0.0376       0.0377
                  $ ./s-gcc-4.4.1  | grep Copy:
                  Copy:        8492.3205       0.0377       0.0377       0.0378
                  $ ./s-open64-4.2.2.1  | grep Copy:
                  Copy:       12487.2286       0.0258       0.0256       0.0258
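
When comparing many builds, grepping by hand gets tedious. The sketch below parses STREAM's result lines in Python; the function name and the embedded sample are illustrative, and the sample text is copied from the output above.

```python
import re

# Matches lines such as
# "Copy:       12487.2286       0.0258       0.0256       0.0258"
LINE_RE = re.compile(
    r"^\s*(Copy|Scale|Add|Triad):\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_stream(text):
    """Return {kernel: (rate_mb_s, avg_s, min_s, max_s)} for each result line."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group(1)] = tuple(float(x) for x in m.groups()[1:])
    return results


SAMPLE = """\
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       12487.2286       0.0258       0.0256       0.0258
"""

if __name__ == "__main__":
    print(parse_stream(SAMPLE)["Copy"][0])
```

To run it against a real binary, the text could come from `subprocess.run(["./s-open64-4.2.2.1"], capture_output=True, text=True).stdout` instead of the sample string.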



                  • #39
                    What is important to keep in mind though is that Intel Turbo Boost Technology was disabled on the processors during testing, since this functionality had not worked under Linux for increasing the clock frequency but instead appeared to cause some sporadic performance problems.
                    Update: after starting to see a flow of Windows-based reviews today, it looks like there are some more serious Linux + Lynnfield problems at hand, which we are currently investigating.
                    I thought the TurboBoost feature was OS-independent. It's a mainboard/BIOS setting isn't it? Why would linux not be able to handle this, while windows can?
                    Is this only a problem for the new Lynnfield (i5 750, i7 860 & 870) series, or do the core i7-9xx cpu's also share the same issues? I'm asking since the Turboboost feature underwent some changes between these cpus.

                    And what are the other linux+lynnfield problems you mention? Is it just the lm_sensors package that can't read the temps, or are there other problems at hand here?

                    Like many others, I planned on buying a Lynnfield core i7-860 soon, but if even an AMD triple core performs better in many tests for more than half the money, I'd want answers to above questions before making that decision.

                    Thanks for the benchmarks, but now i'm a bit disappointed with these indications for bad linux performance.



                    • #40
                      Originally posted by BillBroadley View Post
                      Yes, triple channel.

                      Alas, my machine is in production, so I can't easily try dual channel. I've seen dual vs. triple channel STREAM numbers posted in a hardware review, but I can't remember where.
                      Thx for clarifying.


                      Originally posted by BillBroadley View Post
                      Hrm, try this:
                      Code:
                      opencc -O4 -fopenmp stream.c -o s-open64-4.2.2.1 -static
                      -O4 was the key. Now I get these results with the NB still at 2.6GHz.
                      Code:
                      Copy:       14486.5330       0.0025       0.0022       0.0031
                      Scale:      14246.6541       0.0026       0.0022       0.0046
                      Add:        14022.8872       0.0036       0.0034       0.0042
                      Triad:      14011.1763       0.0045       0.0034       0.0107



                      • #41
                        Originally posted by Ant P. View Post
                        Atom can't compete with ARM at all - and both Intel and MS know that.
                        Atom can't compete in ARM's traditional low-power, low-cost, low-performance market, but ARM can't compete with Atom when running x86 code, which is still what most end-users want from a computer. ARM makes Atom look like a dinosaur in embedded systems, and the ARM netbooks look promising enough that I may buy one, but as soon as Joe Sixpack tries to install WoW on his ARM netbook and discovers it doesn't work, he'll be replacing it with an x86 of some description. Intel is still the 800lb gorilla in the CPU market and I can't see that changing any time soon.

                        I'd also add that building an ARM system for home use is insanely expensive compared to building an Atom system; I looked at ARM boards for a home server and the cost would have been at least twice as much as the Atom I eventually bought, for less capability and more hassle (e.g. needing a source of ARM Linux).



                        • #42
                          Java benchmarks?

                          Could you also post some Java benchmarks, using the Sun Java compiler? I feel this provides a fairly good estimate of performance, as the binary is provided by Sun and is always the same, and I assume it is very well optimized by them for multiple architectures.

                          Before I saw these benchmarks, I was totally kicking myself for ordering a P II X4 955 for my lab. This makes me feel much better.



                          • #43
                            Originally posted by djiezes View Post
                            I thought the TurboBoost feature was OS-independent. It's a mainboard/BIOS setting isn't it? Why would linux not be able to handle this, while windows can?
                            The OS interacts with SpeedStep: you can force a slow clock speed to extend battery life, lock it to full speed for benchmarking, and tune it in related ways based on your preferences. Intel has been pretty good with documentation for its GPUs and CPUs of late, so I suspect the fix is just a small amount of code or an updated table or two.

                            Send a few lynnfields to a kernel maintainer or two with some docs and I'd expect it in a weekend.

                            Originally posted by djiezes View Post
                            Is this only a problem for the new Lynnfield (i5 750, i7 860 & 870) series, or do the core i7-9xx cpu's also share the same issues? I'm asking since the Turboboost feature underwent some changes between these cpus.
                            Dunno. I have a few Nehalems around, but AFAIK turbo boost is off on them; for my needs I want consistent performance, not performance that depends on temperature and/or what the other cores are doing. Hopefully someone else can contribute. I'd expect the Phoronix Nehalem review to mention this, but I've not checked.

                            Originally posted by djiezes View Post
                            And what are the other linux+lynnfield problems you mention? Is it just the lm_sensors package that can't read the temps, or are there other problems at hand here?
                            All I know about linux and lynnfield is in the phoronix pair of articles on the p55 and lynnfield.

                            Originally posted by djiezes View Post
                            Like many others, I planned on buying a Lynnfield core i7-860 soon, but if even an AMD triple core performs better in many tests for more than half the money, I'd want answers to above questions before making that decision.

                            Thanks for the benchmarks, but now i'm a bit disappointed with these indications for bad linux performance.
                            If you go with AMD you get a fair bit lower price in any case: you probably save $80-$100 on the motherboard and at least $50 on the CPU (or $150 or so compared to the 860). Of particular interest to me is the new 95 watt 3.0 GHz Phenom II for $170. I've got the same CPU at 2.6 GHz and it runs cool, fast, and is a pleasure to use. The Phenom is especially nice if you want to use the built-in graphics with the 785G: not a gaming machine, but very nice otherwise. Mine runs impressively cool, around 50 watts idle and 110 watts under load, 115 or so if I really abuse it. That config runs around $500 with 4GB DDR3 RAM, a 1TB disk, and the Phenom II 2.6.

                            However, if you are really after performance, it seems that once Linux is tuned for Lynnfield it should be a good bit faster, especially in the case of the i7-860. It's hard to compete with: faster single thread because of slightly higher IPC, faster single thread because of turbo boost, and faster performance for threaded apps when using hyperthreading and 8 threads (unlike hyperthreading on the P4).

                            So all in all you have to ask yourself: do you want a great desktop system for $600-$800 that will be a pleasure to use for a wide range of uses, or do you want to pay another $200 or so for the i7-860?

                            Granted the turbo boost will take some time to fix, but I'd expect it to be very small relative to the useful life of the system.



                            • #44
                              Originally posted by gost80 View Post
                              Could you also post some Java benchmarks, using the Sun Java compiler? I feel this provides a fairly good estimate of performance, as the binary is provided by Sun and is always the same, and I assume it is very well optimized by them for multiple architectures.

                              Before I saw these benchmarks, I was totally kicking myself for ordering a P II X4 955 for my lab. This makes me feel much better.
                              I have some STREAM memory benchmark code that I ported to Java, but I'm not sure what the point is. I've not found Sun's JVM particularly efficient or highly optimized, in fact just the opposite. Java has been very slow to adopt 64 bits and vector operations; hell, it doesn't even implement registers. For this reason there's an industry built around doing Java better than Sun. Take a look at ARM (Jazelle), Android (Android phones don't actually run Java), and various others shipping Java for real-time or performance-sensitive applications. Nor are Java apps particularly common. So any resulting numbers don't tell you much about how well applications will run on it.



                              • #45
                                Originally posted by BillBroadley View Post
                                Granted the turbo boost will take some time to fix, but I'd expect it to be very small relative to the useful life of the system.
                                Thanks for all your answers, that's pretty much what I needed to know.
