
Thread: AMD Bulldozer Dual-Interlagos Benchmarks On Linux

  1. #31
    Join Date
    Jan 2007
    Posts
    459

    Quote Originally Posted by mtippett
    Ah... So you are the owner of that system.

    I intend to do an openbenchmarking.org blog posting on that one. Can you email me at matthew @ phoronix.com to discuss?
    Matthew, why don't you and/or Michael write an openbenchmarking.org wrapper around the generic checkasm --bench tool supplied with an x264 git pull, and use that wrapper to analyse the mass of real-life data you get there on the speed of each C and assembly routine?

    In fact, with these cores being new, it's a very effective way to get working AVX speed results for a given core today, as x264's checkasm includes AVX in the latest git. Can you do that ASAP, make it a default run option, and at least make a section on openbenchmarking.org for the raw checkasm --bench output?

    Real-life data from a current x264/checkasm git pull is far more interesting than any other app or benchmark, as it's the only one today that has masses of fully tested assembly and is maintained and patched when bugs appear or new cores come online.

    And if you can't fix it yourself and send a git format-patch, it's simple to report the problem and give the core x264 assembly devs temporary remote shell access; they like feedback, and especially checkasm --bench results for new cores.

    Knuckles/thalin, perhaps you two could also do a basic git pull of x264, compile it, run checkasm --bench, and put the results here or on a permanent http://pastebin.com link.

  2. #32
    Join Date
    Apr 2010
    Posts
    26

    Quote Originally Posted by smitty3268
    Source?

    I think you're thinking of Bobcat, which they claimed would provide

    For Bulldozer, the claims I've seen are 12% die space, and virtually double performance on the right kind of code. On the other hand, the wrong kind of code probably gives no speed up at all.

    This source is pretty good at explaining how it works, as well as passing on AMD's 12% die space claim: http://www.anandtech.com/show/3863/a...t-chips-2010/4
    You seem to be right; the article I remembered was this one: http://www.anandtech.com/show/2881
    But it's about a year older than your source, so it's probably incorrect.

  3. #33
    Join Date
    Feb 2010
    Posts
    20

    "since the CPUs are compatible with existing motherboards of the same socket types and there is no graphics support to worry about like there is with Fusion or Sandy Bridge".

    This is incorrect. The Bulldozer chips need an AM3+ socket, and that is not the same as an 'existing' motherboard.

    http://www.overclock.net/amd-cpus/79...g-live-74.html


    Enjoy.

  4. #34
    Join Date
    Jun 2006
    Location
    Portugal
    Posts
    543

    Quote Originally Posted by indus
    "since the CPUs are compatible with existing motherboards of the same socket types and there is no graphics support to worry about like there is with Fusion or Sandy Bridge".

    This is incorrect. The Bulldozer chips need an AM3+ socket, and that is not the same as an 'existing' motherboard.

    http://www.overclock.net/amd-cpus/79...g-live-74.html


    Enjoy.
    I think that phrase was a reference to an earlier one talking about the server versions:
    It has already been reported that AMD's Bulldozer CPUs should be Socket G34 compatible with existing AMD Opteron 6000-series CPUs.

  5. #35

    New CPU needed for Aikoscence

    These results are awesome (sorry for sounding like a kid), they really are! Sure!

    Currently I own a Core i7-975 (4 cores, 8 threads @ 3.33 GHz) for my 3D renders (see www.aikoscence.com), but they are becoming tedious (very long), and I need more power (time savings) to render at 480p (only 854x480).

    So, if a plain Sandy Bridge Core i5 can keep up with an AMD Bulldozer, I'd rather wait for the next Extreme Edition (6-core and 8-core) Sandy Bridge CPUs (end of this year??).

  6. #36
    Join Date
    May 2008
    Posts
    6

    Quote Originally Posted by bbordwell
    I just found this out and have not seen it come up yet in discussion: C-ray measures floating-point performance, which is Bulldozer's weak point, as it only has one FP unit per module. Integer performance should then be about double, which would put it on par with Sandy Bridge.
    No, Bulldozer has a flexible FPU. If C-ray is using 256-bit FP calculations, then yes, it appears as one FPU per two cores. But if it is using 128-bit FP, then each core has its own FPU. Better yet, if C-ray uses 64-bit FP, then each core has two FPUs.

    For me personally, I am a big Folding@Home user. It can scale to 128 threads and only uses 64-bit floating-point ops. Bulldozer will be amazing.

  7. #37
    Join Date
    Oct 2008
    Posts
    3,247

    Quote Originally Posted by zerix01
    No, Bulldozer has a flexible FPU. If C-ray is using 256-bit FP calculations, then yes, it appears as one FPU per two cores. But if it is using 128-bit FP, then each core has its own FPU. Better yet, if C-ray uses 64-bit FP, then each core has two FPUs.

    For me personally, I am a big Folding@Home user. It can scale to 128 threads and only uses 64-bit floating-point ops. Bulldozer will be amazing.
    Everything I've seen says this is wrong. Do you have a source?

    Bulldozer is supposed to have 1 FP core per module, versus 2 integer cores. They say this allows them to make the FP core beefier than it would have been otherwise, because it's one of the more complicated (and therefore expensive) portions of the CPU.

    It is supposed to emulate the 256-bit AVX instructions by transparently using two 128-bit SSE registers at once; perhaps that's what confused you?

  8. #38
    Join Date
    May 2008
    Posts
    6

    http://www.pcper.com/article.php?aid=1083

    I meant how they are used logically, not physically, in direct response to the other user saying there is only one FPU no matter what type of instruction is being used.

    You are saying two different things here.

    Quote Originally Posted by smitty3268
    Everything I've seen says this is wrong. Do you have a source?

    Bulldozer is supposed to have 1 FP core per module, versus 2 integer cores. They say this allows them to make the FP core beefier than it would have been otherwise, because it's one of the more complicated (and therefore expensive) portions of the CPU.
    It has two 128-bit FPUs per module, and both can be used independently. The cost/die savings come from not including two 256-bit FPUs.

    Quote Originally Posted by smitty3268
    It is supposed to emulate the 256-bit AVX instructions by transparently using two 128-bit SSE registers at once; perhaps that's what confused you?
    Exactly, but just above you said there is only one.

    Now, I could be wrong in saying there are two separate 128-bit FPUs. But that would mean the BD architecture diagrams we have seen are a logical view, not a physical one. Otherwise, I keep seeing two separate FPUs per module.

  9. #39
    Join Date
    Jan 2007
    Posts
    459

    Quote Originally Posted by bbordwell
    I just found this out and have not seen it come up yet in discussion: C-ray measures floating-point performance, which is Bulldozer's weak point, as it only has one FP unit per module. Integer performance should then be about double, which would put it on par with Sandy Bridge.
    Don't think so. AMD has not beaten Intel at integer performance clock for clock for a long time, if ever, and that's not likely to change when you're doing multi-core x264 encoding etc.

    Quote Originally Posted by Jimbo
    It's pretty interesting what AMD has done. Just forget about the number of cores! They created the Bulldozer module, which contains 2 integer cores and 1 FP core. A CPU will contain several Bulldozer modules.

    This redesign is aimed at increasing performance in generic programs, which use a lot of integer operations (games included). Programs which make use of a lot of FP operations (math, video encoders...) would probably not get a performance boost.

    Indeed, it should be interesting to see more tests of this AMD CPU redesign.
    Nope, video encoders, or rather the codecs, both video and audio, don't use floating point; they are all based on integer operations, as can be seen in x264, ffmpeg, etc.

    Like I said above, a simple git pull, compile, and checkasm --bench run will show this, with no auto-vectorisation compiler mess to contend with, as the assembly is at least hand-coded and preprocessed with yasm macros. And/or you could always add in the x264 timing code (I think Loren <pengvado> Merritt originally wrote it):
    http://pastebin.com/YxNAaCGj
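    Code:
    #include <stdio.h>
    #include <string.h>
    #include <inttypes.h>
    
    /* Read the x86 time-stamp counter. always_inline also needs "inline",
       otherwise GCC warns that the function might not be inlinable. */
    static inline __attribute__((always_inline)) uint64_t read_time(void)
    {
    #if defined(__x86_64__)
        uint64_t a, d;
        /* rdtsc puts the low 32 bits in EAX and the high 32 bits in EDX */
        asm volatile( "rdtsc\n\t" : "=a" (a), "=d" (d) );
        return (d << 32) | (a & 0xffffffff);
    #elif defined(__i386__)
        uint64_t l;
        /* on 32-bit x86, "=A" binds the EDX:EAX pair as one 64-bit value */
        asm volatile( "rdtsc\n\t" : "=A" (l) );
        return l;
    #else
    #error "read_time() uses rdtsc and only supports x86"
    #endif
    }
    
    #define NOP_CYCLES 22 // time measured by an empty timer on core i7
    
    /* Declares tstart/tend in the current block; pair it with one
       STOP_TIMER or STOP_TIMER_SUM at the end of the timed code. */
    #define START_TIMER \
    uint64_t tend;\
    uint64_t tstart= read_time();
    
    /* Skips the first 2 runs as warm-up, ignores outliers (samples not below
       16x the running mean + 100000 cycles, e.g. interrupts), and prints the
       mean in tenths of a cycle whenever runs + skips is a power of two. */
    #define STOP_TIMER(id) {\
    tend= read_time();\
    {\
        static uint64_t tsum=0;\
        static int tcount=0;\
        static int tskip_count=0;\
        if(tskip_count<2)\
            tskip_count++;\
        else{\
        if(tcount<2 || tend - tstart < 16*tsum/tcount + 100000){\
            tsum+= tend - tstart;\
            tcount++;\
        }else\
            tskip_count++;\
        if(((tcount+tskip_count) & (tcount+tskip_count-1)) == 0)\
            printf("%"PRIu64" decicycles in %s, %d runs, %d skips\n", tsum*10/tcount-NOP_CYCLES*10, id, tcount, tskip_count);\
    }}}
    
    /* Sums cycles inside the timed region (tsum) and cycles spent between
       consecutive timed regions (tother); prints both at 8, 16, 32, ... runs. */
    #define STOP_TIMER_SUM(id) {\
    tend= read_time();\
    {\
        static uint64_t tsum=0;\
        static uint64_t tother=0;\
        static uint64_t tend0=0;\
        static int tcount=0;\
        tsum += tend - tstart;\
        if(tcount)\
            tother += tstart - tend0;\
        tend0 = tend;\
        tcount++;\
        if((tcount & (tcount-1)) == 0 && tcount > 4)\
            printf("%"PRIu64"/%"PRIu64" cycles %s, %d runs\n", tsum-NOP_CYCLES*tcount, tother+tsum-NOP_CYCLES*tcount, id, tcount);\
    }}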
    But unless someone runs the test on these Bulldozers, with their improved RAM controllers, SSE4.2 and 256-bit AVX instructions (even though they may not be the 3-operand opcodes?), and compares them to Intel's AVX etc., we shall never know for sure, as real-life data is required.

  10. #40
    Join Date
    Jan 2007
    Posts
    459

    Quote Originally Posted by zerix01
    No, Bulldozer has a flexible FPU. If C-ray is using 256-bit FP calculations, then yes, it appears as one FPU per two cores. But if it is using 128-bit FP, then each core has its own FPU. Better yet, if C-ray uses 64-bit FP, then each core has two FPUs.

    For me personally, I am a big Folding@Home user. It can scale to 128 threads and only uses 64-bit floating-point ops. Bulldozer will be amazing.
    Well, if that's the case, then surely an Nvidia GPU and those new Panther Point aka 22 nm Ivy Bridge LGA 1155/LGA 2011 processors
    with


    are probably going to serve you better. I imagine there will be an Nvidia GPU with a PCI Express 3.0 bus by then too, of course.
