A question to bridgman or airlie XD about CrossFire


  • #11
    You should keep your code "in the card itself" as much as possible, I think. Like Bridgman says: accessing external memory is always a bad idea when you don't have to.
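
    Something like this, roughly -- just an OpenCL sketch off the top of my head, with made-up kernel names, to show what "staying on the card" looks like:

    Code:
    /* Two passes run back-to-back on a buffer that stays in video memory;
     * only the final result is read back to the host. "pass1"/"pass2" are
     * hypothetical kernels, error checking omitted. */
    #include <CL/cl.h>
    #include <stddef.h>

    void run_two_passes(cl_command_queue queue, cl_kernel pass1, cl_kernel pass2,
                        cl_mem gpu_buf, size_t n, float *final_result)
    {
        clSetKernelArg(pass1, 0, sizeof(cl_mem), &gpu_buf);
        clEnqueueNDRangeKernel(queue, pass1, 1, NULL, &n, NULL, 0, NULL, NULL);

        /* No clEnqueueReadBuffer here -- the intermediate data never leaves the GPU. */

        clSetKernelArg(pass2, 0, sizeof(cl_mem), &gpu_buf);
        clEnqueueNDRangeKernel(queue, pass2, 1, NULL, &n, NULL, 0, NULL, NULL);

        /* One readback at the very end, for the final result only. */
        clEnqueueReadBuffer(queue, gpu_buf, CL_TRUE, 0, n * sizeof(float),
                            final_result, 0, NULL, NULL);
    }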

    Edit: I should refresh the page more often. agd5f beat me to it.



    • #12
      Trust me, the alternatives are worse. Microstuttering was a big deal a couple of years ago before the drivers were tweaked to hide the time required to flip images between the cards, but all indications are that the contribution of AFR to micro-stuttering went away at least a year ago. The incidents I'm seeing these days seem to fall into one of two categories :

      - interference from a third-party utility, typically an overclocking or fan-speed tweaker

      - games running at a sufficiently slow refresh rate that some frames take 2 display refreshes and others take 3

      Both of these "micro-stuttering" problems are being seen with single GPU systems, not just AFR.

      I'm not saying micro-stuttering is a complete non-issue, just that AFR's contribution seems to have been reduced to the point where it's now #3 or #4 on the list of causes.

      That said, "user friendly" probably wasn't the best choice of words. What I was trying to get across was that any approach other than AFR is generally not going to run the apps that users want. Nearly all of the high eye-candy games with spiffy effects tend to use a lot of post-processing on the rendered frames, and generally AFR is the only model which will handle post-processing without introducing artifacts.
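
      Conceptually, AFR looks something like this (just a toy sketch to illustrate the idea, not real driver code -- the helper functions are made up):

      Code:
      /* Alternate Frame Rendering in a nutshell: whole frames are handed out to
       * the GPUs round-robin, and each GPU does all of the rendering and all of
       * the post-processing for its own frame, so only the finished image ever
       * has to cross the inter-GPU link. */
      #include <stdio.h>

      #define NUM_GPUS 2

      /* Hypothetical stand-ins for the real rendering work. */
      static void render_frame(int gpu, int frame) { printf("GPU%d renders frame %d\n", gpu, frame); }
      static void post_process(int gpu, int frame) { printf("GPU%d post-processes frame %d\n", gpu, frame); }
      static void present(int gpu, int frame)      { printf("frame %d presented from GPU%d\n", frame, gpu); }

      int main(void)
      {
          for (int frame = 0; frame < 8; frame++) {
              int gpu = frame % NUM_GPUS;   /* the "alternate" in AFR */
              render_frame(gpu, frame);     /* geometry + shading for the whole frame */
              post_process(gpu, frame);     /* bloom / blur etc. read only this GPU's frame */
              present(gpu, frame);          /* only the finished image leaves the card */
          }
          return 0;
      }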

      Anyways, I'm not saying the original poster *has* to implement AFR, just giving them a heads-up about the different approaches and their pros / cons. Micro-stuttering is definitely something worth mentioning, because it does take some work to make sure that an AFR implementation does not contribute to micro-stuttering.
      Last edited by bridgman; 22 January 2010, 09:46 PM.


      • #13
        Originally posted by Qaridarium
        You are right, but I'll tell you something about user-friendly buying.

        Buy a 5870 with 2 GB of VRAM, overclock it to 1.x GHz...

        ...and then wait a long, long time for R900, deep in 2011!

        The dual-GPU bullshit is not user-friendly, it's anti-user stuff!

        Multi-GPU = how to milk users to the max... for less.
        Hey, I'm not the one who wants to add multi-GPU support to the open source drivers.

        Originally posted by Qaridarium
        The alternative on the dev side is simple: drop OpenGL and DirectX completely, force OpenCL as the new standard, and go raytracing-only rendering!
        Like this ?

        Testing the incredibly tiny OpenCL software demo SmallptGPU (and a quick look at the OpenCL accelerated SmallLuxGPU and Bullet Physics as well). The OpenCL ...
        Last edited by bridgman; 22 January 2010, 10:10 PM.


        • #14
          Well, since the issue is the very low bandwidth of the CrossFire connector, my bet is that OpenCL will suffer from the same thing as OpenGL on multi-GPU systems whenever you need to work with big textures; for code-only work the CrossFire bandwidth should be enough. Based on that, I seriously doubt you can do extremely detailed things in OpenCL with HD textures without some other tech brought from hell, like AFR in OpenGL/DirectX.

          Now, the genius who added GDDR5 to all the high-end cards never thought about the CrossFire link bandwidth, lol?



          • #15
            Now, implementing AFR well... I don't see that as easy at all, especially getting rid of the stuttering :* but I'll try, maybe I'll get lucky.



            • #16
              Originally posted by jrch2k8 View Post
              Well, since the issue is the very low bandwidth of the CrossFire connector, my bet is that OpenCL will suffer from the same thing as OpenGL on multi-GPU systems whenever you need to work with big textures; for code-only work the CrossFire bandwidth should be enough. Based on that, I seriously doubt you can do extremely detailed things in OpenCL with HD textures without some other tech brought from hell, like AFR in OpenGL/DirectX.
              The challenge with graphics is that OpenGL implements a "single processor" API, so the multi-GPU driver support needs to work without assistance (or even hints) from the application.

              The OpenCL API takes a different approach -- it makes multiple processors directly visible to the app, so the application can decide how to split work between them. This generally means partitioning the data between processors and having each processor do "all the work" for the data it receives, rather than running all of the data through all of the processors; that way there is no need to push data between processors. Does that make any sense ?
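
              A minimal host-side sketch of that pattern, assuming two GPUs are present (simplified, error handling omitted, kernel and sizes made up for illustration):

              Code:
              /* Split one array across two GPUs: each device gets its own buffer with
               * its own half of the data and runs the whole kernel on that half, so
               * nothing needs to move between the cards while the work is running. */
              #include <CL/cl.h>
              #include <stdio.h>
              #include <stdlib.h>

              static const char *src =
                  "__kernel void scale(__global float *d) {"
                  "    size_t i = get_global_id(0);"
                  "    d[i] = d[i] * 2.0f;"
                  "}";

              int main(void)
              {
                  enum { N = 1 << 20 };
                  float *data = malloc(N * sizeof(float));
                  for (int i = 0; i < N; i++) data[i] = (float)i;

                  cl_platform_id platform;
                  cl_device_id dev[2];
                  cl_uint ndev = 0;
                  clGetPlatformIDs(1, &platform, NULL);
                  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, dev, &ndev);
                  if (ndev < 2) { fprintf(stderr, "need two GPUs\n"); return 1; }

                  cl_context ctx = clCreateContext(NULL, 2, dev, NULL, NULL, NULL);
                  cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
                  clBuildProgram(prog, 2, dev, NULL, NULL, NULL);

                  size_t half = N / 2;
                  cl_command_queue q[2];
                  cl_mem buf[2];

                  for (int g = 0; g < 2; g++) {
                      q[g] = clCreateCommandQueue(ctx, dev[g], 0, NULL);
                      cl_kernel k = clCreateKernel(prog, "scale", NULL);

                      /* Each GPU gets a private buffer holding its own half of the data... */
                      buf[g] = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                              half * sizeof(float), data + g * half, NULL);
                      clSetKernelArg(k, 0, sizeof(cl_mem), &buf[g]);

                      /* ...and does "all the work" for that half; no inter-GPU traffic. */
                      clEnqueueNDRangeKernel(q[g], k, 1, NULL, &half, NULL, 0, NULL, NULL);
                  }

                  /* Gather both halves back on the host once, at the end. */
                  for (int g = 0; g < 2; g++)
                      clEnqueueReadBuffer(q[g], buf[g], CL_TRUE, 0, half * sizeof(float),
                                          data + g * half, 0, NULL, NULL);

                  printf("data[1] = %f (expect 2.0)\n", data[1]);
                  free(data);
                  return 0;
              }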

              Originally posted by jrch2k8 View Post
              Now, the genius who added GDDR5 to all the high-end cards never thought about the CrossFire link bandwidth, lol?
              The primary limitation of the inter-card links is width, not speed. The 48xx and 58xx GPUs use a 256-bit memory interface, which is *much* wider than the inter-GPU links. The link bandwidth is more a function of package pincount and board layout issues than the speed of the individual pins.
              Last edited by bridgman; 25 January 2010, 01:26 PM.


              • #17
                Well, I got that far looking at the CrossFire connector, but my point is that there are many electronic solutions to this. Besides, wouldn't it be smarter to keep the CrossFire connector only as a control device (behaving, for example, like a framebuffer memory mapper) and run the actual link between the cards over PCIe 2.0 or 3.0? I've read several articles claiming that even PCIe 1.0 still has enough bandwidth to go nasty with anything and that PCIe 2.0 is a marketing scam XD (not so sure about that, because, well, you know forums; I know some electronics, but motherboards are too complex nowadays to say anything for sure).

                In the case of OpenCL it's the same issue: I have more control over when to switch GPUs than with OpenGL, but I still can't freely process textures or complex objects in the other GPU's memory due to the bandwidth limitation (not sure if Tesla cards remove this limitation, because from what I've seen those babies handle an awesome level of load, but true, they are really more expensive).



                • #18
                  Typical GDDR5 data rates on shipping products are mid-way between PCIE 1 (2.5 Gbps per pin) and PCIE 2 (5.0 Gbps per pin), but in all cases you need an extremely wide bus to get the kind of bandwidth a modern GPU requires. An x16 or x32 bus isn't going to do the job.
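
                  Rough numbers, just to put it in perspective (assuming ~4 Gbps per pin for GDDR5, which is in that mid-way range):

                  Code:
                  /* Back-of-envelope bandwidth comparison; the per-pin rates are the ones
                   * mentioned above, the exact figures vary by product. */
                  #include <stdio.h>

                  int main(void)
                  {
                      double gddr5_gbps_per_pin = 4.0;   /* assumed, roughly mid-way between 2.5 and 5.0 */
                      double mem_bus_pins = 256.0;       /* 48xx/58xx memory interface width */
                      double link_pins    = 16.0;        /* a hypothetical x16-wide inter-GPU link */

                      printf("local VRAM     : ~%.0f GB/s\n", mem_bus_pins * gddr5_gbps_per_pin / 8.0); /* ~128 */
                      printf("x16 link @5Gbps: ~%.0f GB/s\n", link_pins * 5.0 / 8.0);                   /* ~10  */
                      return 0;
                  }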

                  The point I'm trying to make about OpenCL vs OpenGL is that if you structure the OpenCL app (or any compute app) properly you don't have to access data in the other GPU's memory very much in the first place.


                  • #19
                    Originally posted by bridgman View Post
                    The OpenCL API takes a different approach -- it makes multiple processors directly visible to the app, so the application can decide how to split work between them. This generally means partitioning the data between processors and having each processor do "all the work" for the data it receives, rather than running all of the data through all of the processors; that way there is no need to push data between processors. Does that make any sense ?
                    Yes, it makes sense, but it's not as efficient as it should be, because even if that works around the hardware oops with the link between the GPUs, the developer has to be extremely careful with the code to avoid inter-GPU processing in really massive loop operations, or it will suffer a slowdown or a crash (and that explains why CUDA apps mostly aren't multi-GPU aware, or only really expensive software is). I think a more CPU-like approach would be more efficient here for both OpenGL and OpenCL, i.e. each GPU being electronically aware of the other and able to map memory for common use.



                    • #20
                      Originally posted by bridgman View Post
                      Typical GDDR5 data rates on shipping products are mid-way between PCIE 1 (2.5 Gbps per pin) and PCIE 2 (5.0 Gbps per pin), but in all cases you need an extremely wide bus to get the kind of bandwidth a modern GPU requires. An x16 or x32 bus isn't going to do the job.

                      The point I'm trying to make about OpenCL vs OpenGL is that if you structure the OpenCL app (or any compute app) properly you don't have to access data in the other GPU's memory very much in the first place.
                      Well, I agree too, but it would help for really big operations XD and it would be easier for developers XD. Let's wait for PCIe 5.0 XD, or fiber-optic electronics, or some other cheaper superconductor.

