Measuring general FLOPS is wrong; the right metric is MAC-FLOPS. Example: madd (x = a*b) is 1 general FLOP because it performs one * or one + action, but it is 2 MAC-FLOPS because it acts on 2 operands, a and b. fmac (x = a*b + c) is 2 general FLOPS because it performs 2 fused actions (the * and the +), but it is 3 MAC-FLOPS because it acts on 3 operands, a, b and c. So fmac is 1.5-2x faster than madd. Examples:
GTX 280: 240 MIMD cores (32-bit) x 1.4 GHz x 2-3 (madd+mul) = 1 MAC-TFLOP
Radeon 6900: 384 VLIW4 cores x 4 32-bit executions x 900 MHz = 1.35 general TFLOPS (32-bit); x2 (madd) = 2.7 MAC-TFLOPS (32-bit)
GTX 580: 512 MIMD (dual-issue) cores x 1 64-bit or 2 32-bit executions x 1.55 GHz x 2-3 (fmac) = 1.6 general TFLOPS (64-bit), or 3.2 general TFLOPS (32-bit), or 4.7 MAC-TFLOPS (32-bit)
Radeon 7000: 512 SIMD4 cores x 4 32-bit executions x 900 MHz x 2-3 (fmac) = 3.8 general TFLOPS (32-bit) or 5.7 MAC-TFLOPS (32-bit)
GTX 600: 2x cores (vs GTX 580) x 2x bitrate (128-bit quad issue, at the same transistor count) = 6.5 general TFLOPS (64-bit), 13 general TFLOPS (32-bit), or 20 MAC-TFLOPS (32-bit).
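The per-card arithmetic above follows one formula: peak TFLOPS = cores x executions per core x clock (GHz) x FLOPs counted per instruction. A minimal sketch of that calculation, using the comment's own figures (the function name `peak_tflops` is mine, and the FLOP multipliers are the comment's general-vs-MAC counting, not official vendor specs):

```python
def peak_tflops(cores, execs_per_core, clock_ghz, flops_per_instr):
    """Peak throughput in TFLOPS = cores * executions * GHz * FLOPs / 1000."""
    return cores * execs_per_core * clock_ghz * flops_per_instr / 1000.0

# GTX 280: 240 cores, 1 execution, 1.4 GHz, 3 FLOPs (madd+mul counted as MAC)
gtx280_mac = peak_tflops(240, 1, 1.4, 3)          # ~1.0 MAC-TFLOP

# Radeon 6900: 384 VLIW4 cores, 4 executions, 0.9 GHz
r6900_general = peak_tflops(384, 4, 0.9, 1)       # ~1.38 general TFLOPS
r6900_mac     = peak_tflops(384, 4, 0.9, 2)       # ~2.7 MAC-TFLOPS (madd = 2)

# GTX 580: 512 dual-issue cores, 1.55 GHz; fmac = 2 general or 3 MAC FLOPs
gtx580_64bit  = peak_tflops(512, 1, 1.55, 2)      # ~1.6 general TFLOPS (64-bit)
gtx580_32bit  = peak_tflops(512, 2, 1.55, 2)      # ~3.2 general TFLOPS (32-bit)
gtx580_mac    = peak_tflops(512, 2, 1.55, 3)      # ~4.8 MAC-TFLOPS (32-bit)
```

Each listed figure is just this product rounded; e.g. 512 x 2 x 1.55 x 3 / 1000 = 4.76, quoted as 4.7 MAC-TFLOPS.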
While AMD gains 2x FLOPS/watt per generation, NVIDIA has gained 4x since the GTX 280. AMD also has mediocre OpenGL and bad D3D-to-OGL translation, regardless of generation. AMD is not even close to Fermi and Kepler in programmability: no full native integer, no full native 64-bit, and no good VM like CUDA's for broad multi-language support. Nor is AMD cheap: I can buy a GTX 460 new for 100 bucks in my country, overclock it to 1.8-1.9 GHz, and get near-GTX 580 performance (or 75% of a Radeon 7000), and I use wine-mediacoder_cuda for h264 encoding.