AMD Bulldozer Dual-Interlagos Benchmarks On Linux


  • #16
    Originally posted by Goderic View Post
    They claim +90% performance and +50% extra die space.
    All the benefits of an extra core with only +10% extra die space is never going to happen.
    Source?

    I think you're thinking of Bobcat, which they claimed would provide
    90% of today’s mainstream performance in less than half of the silicon area
    For Bulldozer, the claims I've seen are 12% die space, and virtually double performance on the right kind of code. On the other hand, the wrong kind of code probably gives no speed up at all.

    This source is pretty good at explaining how it works, as well as passing on AMD's 12% die space claim: http://www.anandtech.com/show/3863/a...t-chips-2010/4


    • #17
      Originally posted by bbordwell View Post
      If you have the spare time, you should do a scaling test for c-ray, e.g. run it with 2 cores enabled, then 4, then 8, then 16, then 32. It would be useful for seeing how this result translates into desktop Bulldozer performance.
      I can try, but I'm not understanding what you're hoping to see. Mine are 12-core G34 opterons, so 16 and 32 aren't even very logical numbers from a test POV.


      • #18
        Originally posted by smitty3268 View Post
        Source?
        On the other hand, the wrong kind of code probably gives no speed up at all.
        Actually, that would only be the case in single-threaded code. The frontend supports 2 threads, though I suppose there might be some penalty from the resource sharing in some situations. Also, the FP unit can act as two 128-bit units or one 256-bit unit. In short, BD has 2 cores that share certain resources, which decreases die space but can carry a performance penalty in certain situations. That slide, by the way, refers to the die space cost of the 2nd integer cluster only.


        • #19
          I'm almost positive the i5 2500K is not multi-threaded.

          But really, doesn't anyone see the problem with that initial test? 25 seconds is utter crap for 32 cores, no matter what CPU you use. An 8-core Sempron or Core 2 (if they existed) at 2 GHz would have more raw processing power than the i5 2500K.

          Remember, there are plenty of programs and benchmarks out there that are multi-processor compatible, but most are limited to 16 cores. If that test had the same limitation, half of those cores were doing absolutely nothing, so it doesn't surprise me that 16 1.8 GHz cores are about twice as fast as 4 3.2 GHz cores.
          Think about it this way:
          16*1.8=28.8
          4*3.2=12.8
          12.8*2=25.6
          If we consider these as theoretical aggregate frequencies, 25.6 is pretty close to 28.8. Considering the frequency and the likely number of working cores, it makes a lot of sense to me why the AMD setup performed twice as fast as the Intel setup, and it isn't as crappy as it may seem.
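For what it's worth, the back-of-the-envelope numbers above can be redone in a few lines of Python. This is only a sketch: `aggregate_ghz` is a made-up helper name, and "cores times clock" ignores per-clock differences between the two architectures and assumes the benchmark keeps every counted core busy.

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Naive 'total clock' figure: core count times per-core clock."""
    return cores * clock_ghz


# Figures taken from the post: 16 busy cores at 1.8 GHz vs 4 cores at 3.2 GHz.
amd_total = aggregate_ghz(16, 1.8)
intel_total = aggregate_ghz(4, 3.2)
ratio = amd_total / intel_total

print(f"AMD: {amd_total:.1f}  Intel: {intel_total:.1f}  ratio: {ratio:.2f}")
```

Under those assumptions the ratio comes out a bit above 2, which is in line with the "about twice as fast" observation, but it says nothing about how the chips compare per clock.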


          • #20
            Originally posted by [Knuckles] View Post
            I can try, but I'm not understanding what you're hoping to see. Mine are 12-core G34 opterons, so 16 and 32 aren't even very logical numbers from a test POV.
            The core counts are arbitrary. I just think it would be interesting to see how it scales up: if this test is scaling with 99% efficiency, that is not so good for Bulldozer, but if there is a large decline as cores go up, then that could be good news.
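A scaling test like this boils down to comparing run times at different core counts. Here is a minimal sketch of how the efficiency figure could be computed; the function name and the timings are placeholders invented for illustration, not measurements from this thread.

```python
def parallel_efficiency(t_one: float, t_n: float, n: int) -> float:
    """Speedup over the 1-core run divided by the core count (1.0 = perfect scaling)."""
    if n < 1 or t_one <= 0 or t_n <= 0:
        raise ValueError("core count and timings must be positive")
    return (t_one / t_n) / n


# Hypothetical run times in seconds, keyed by enabled cores. NOT real results.
timings = {1: 100.0, 2: 50.5, 4: 26.0, 8: 14.0}

for cores, seconds in timings.items():
    eff = parallel_efficiency(timings[1], seconds, cores)
    print(f"{cores:2d} cores: {seconds:6.1f} s  efficiency {eff:.2f}")
```

If the efficiency stays near 1.0 all the way up, the many-core result can be scaled down to a desktop core count almost linearly; if it drops off, the per-core performance is better than the many-core number suggests.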

            Though this test may have already answered my question, as it is single-threaded (http://openbenchmarking.org/result/1...IV-HIMENUBUL97)

            That is bad news for Bulldozer, as it shows about 1/2 the single-threaded performance of Sandy Bridge. (The 2600K gets about 345 in that test, whereas a 3.6 GHz Bulldozer would get ~180.)
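The comparison above can be sketched as follows. It assumes the ~180 figure comes from scaling a lower-clocked result linearly with frequency, which the post does not spell out; `scale_by_clock` is a hypothetical helper, and real chips rarely scale perfectly with clock.

```python
def scale_by_clock(score: float, from_ghz: float, to_ghz: float) -> float:
    """Extrapolate a score to another clock, assuming perfectly linear scaling."""
    return score * (to_ghz / from_ghz)


SANDY_BRIDGE_2600K = 345.0      # "about 345", as quoted in the post
BULLDOZER_AT_3_6_GHZ = 180.0    # "~180", the post's extrapolated figure

relative = BULLDOZER_AT_3_6_GHZ / SANDY_BRIDGE_2600K
print(f"Bulldozer / Sandy Bridge: {relative:.2f}")

# Under the linear assumption, doubling the clock simply doubles the score.
print(scale_by_clock(1.0, 1.8, 3.6))
```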


            • #21
              Originally posted by bbordwell View Post
              The core counts are arbitrary. I just think it would be interesting to see how it scales up: if this test is scaling with 99% efficiency, that is not so good for Bulldozer, but if there is a large decline as cores go up, then that could be good news.

              Though this test may have already answered my question, as it is single-threaded (http://openbenchmarking.org/result/1...IV-HIMENUBUL97)

              That is bad news for Bulldozer, as it shows about 1/2 the single-threaded performance of Sandy Bridge. (The 2600K gets about 345 in that test, whereas a 3.6 GHz Bulldozer would get ~180.)
              I think this is really too early silicon:
              http://openbenchmarking.org/result/1...KNUC-110322323

              Single-threaded new architecture @ 1.8 GHz vs. single-threaded old one @ 1.9 GHz, and the old one wins!? I wouldn't read too much into these results.


              • #22
                Originally posted by [Knuckles] View Post
                I think this is really too early silicon:
                http://openbenchmarking.org/result/1...KNUC-110322323

                Single-threaded new architecture @ 1.8 GHz vs. single-threaded old one @ 1.9 GHz, and the old one wins!? I wouldn't read too much into these results.
                I am going to have to agree with you, no way they would release a new arch that is slower than the last.


                • #23
                  Originally posted by bbordwell View Post
                  I am going to have to agree with you, no way they would release a new arch that is slower than the last.
                  Also, the new bulldozer cores are supposed to run way faster than the current stars cores.


                  • #24
                    Originally posted by [Knuckles] View Post
                    Heh:

                    http://openbenchmarking.org/result/1...KNUC-110322585
                    http://openbenchmarking.org/result/1...KNUC-110322102

                    I win

                    But yeah, what's impressive is that you'll be able to get 4 of these on the same system, for a very reasonable price!
                    As they said in the article, "my" R910 is the best on C-Ray so far, so I win. I have you beat by nearly 2 seconds.


                    • #25
                      Originally posted by thalin View Post
                      As they said in the article, "my" R910 is the best on C-Ray so far, so I win. I have you beat by nearly 2 seconds.
                      Ah.. So you are the owner of that system.

                      I intend to do an openbenchmarking.org blog posting of that one.. Can you email me matthew @ phoronix.com to discuss?


                      • #26
                        I just found this out and have not seen it come up yet in the discussion: C-Ray measures floating-point performance, which is Bulldozer's weak point, as it has only one FP unit per module. Integer performance should then be about double, which would put it on par with Sandy Bridge.
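To make the module arithmetic concrete, here is a small sketch of the resource counts as described in this thread: 2 integer cores and 1 shared FP unit per module, with the FP unit usable as two 128-bit halves. The class and attribute names are invented for illustration, and this models counts only, not actual throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulldozerSystem:
    """Counts execution resources for a given number of modules (illustrative only)."""
    modules: int

    @property
    def integer_cores(self) -> int:
        return 2 * self.modules   # two integer clusters per module

    @property
    def fp_units(self) -> int:
        return self.modules       # one shared FP unit per module

    @property
    def fp_128bit_halves(self) -> int:
        return 2 * self.modules   # each FP unit can act as 2 x 128-bit


# A 32-"core" system, as in the dual-Interlagos test, would be 16 modules.
system = BulldozerSystem(modules=16)
print(system.integer_cores, system.fp_units, system.fp_128bit_halves)
```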


                        • #27
                          It's pretty interesting what AMD has done. Just forget about the number of cores! They created the Bulldozer module, which contains 2 integer cores and 1 FP core. A CPU will contain several Bulldozer modules.

                          This redesign is aimed at increasing performance on generic programs, which use a lot of integer operations (games included). Programs which make heavy use of FP operations (math, video encoders...) would probably not get a performance boost.

                          Indeed, it should be interesting to see more tests of this AMD CPU redesign.


                          • #28
                            Originally posted by Jimbo View Post
                            It's pretty interesting what AMD has done. Just forget about the number of cores! They created the Bulldozer module, which contains 2 integer cores and 1 FP core. A CPU will contain several Bulldozer modules.

                            This redesign is aimed at increasing performance on generic programs, which use a lot of integer operations (games included). Programs which make heavy use of FP operations (math, video encoders...) would probably not get a performance boost.

                            Indeed, it should be interesting to see more tests of this AMD CPU redesign.
                            Agreed here. Also, from what I've read (including the usual marketing stuff), the aim of the Fusion line, and likely Bulldozer, is not single threaded performance, or even single program performance. It's multi-program, multi-threaded performance, with lower power usage that they're aiming for.


                            • #29
                              One can still improve the c-ray performance by using opencc as the compiler (http://www.openbenchmarking.org/resu...D1SA-CRAYCOM20). The standard Makefile delivered by PTS isn't aware of the CC environment variable, so I patched the install.sh in ~/.phoronix-test-suite/test-profiles/pts/c-ray-1.0.0/.

                              Code:
                              #!/bin/sh
                              
                              tar -zxvf c-ray-1.1.tar.gz
                              
                              patch -p0 << 'EOF'
                              --- ./c-ray-1.1/Makefile.orig	2008-04-09 23:57:57.000000000 +0200
                              +++ ./c-ray-1.1/Makefile	2011-03-23 00:49:20.413694037 +0100
                              @@ -1,8 +1,8 @@
                               obj = c-ray-mt.o
                               bin = c-ray-mt
                               
                              -CC = gcc
                              -CFLAGS = -O3 -ffast-math
                              +CC ?= gcc
                              +CFLAGS ?= -O3 -ffast-math
                               
                               $(bin): $(obj)
                               	$(CC) -o $@ $(obj) -lm -lpthread
                              EOF
                              
                              cd c-ray-1.1/
                              make -j $NUM_CPU_JOBS
                              echo $? > ~/install-exit-status
                              cd ..
                              
                              echo "#!/bin/sh
                              cd c-ray-1.1/
                              RT_THREADS=\$((\$NUM_CPU_CORES * 16))
                              ./c-ray-mt -t \$RT_THREADS -s 1600x1200 -r 8 -i sphfract -o output.ppm > \$LOG_FILE 2>&1
                              echo \$? > ~/test-exit-status" > c-ray
                              chmod +x c-ray
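
For anyone wondering what the generated `c-ray` wrapper ends up running, here is a rough Python sketch of the same command construction. The function name is made up; the flags and the "16 threads per core" rule are copied from the script above.

```python
from typing import List


def c_ray_command(num_cpu_cores: int) -> List[str]:
    """Build the c-ray-mt command line the wrapper script would run."""
    if num_cpu_cores < 1:
        raise ValueError("need at least one core")
    rt_threads = num_cpu_cores * 16   # mirrors RT_THREADS=$(($NUM_CPU_CORES * 16))
    return [
        "./c-ray-mt",
        "-t", str(rt_threads),
        "-s", "1600x1200",
        "-r", "8",
        "-i", "sphfract",
        "-o", "output.ppm",
    ]


print(" ".join(c_ray_command(32)))
```

On a 32-core box that works out to 512 render threads, so the thread count always scales with however many cores PTS reports.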


                              • #30
                                So does anybody have an idea where Bulldozer will be relative to Sandy Bridge? Performance per core (half module), per watt, per clock?
