**bridgman** · 01 March 2009, 11:39 PM

No, your understanding is fine. The problem is that high end GPUs require extremely fast, wide memory buses to keep up with the processing power, and nobody has a good way to bridge that kind of bus from card to card yet. The PCIe link used for a GPU is usually 16 lanes wide and can transfer between 4 and 8 GB/sec in each direction. External PCIe links tend to be single lane, or between 0.25 and 0.5 GB/sec, similar to SATA.

A typical high end GPU has a memory bandwidth between 50 and 100 GB/sec. Yes, that is maybe 100x the fastest card-to-card link available today and over 10x faster than the most exotic interconnects used in high end supercomputers. If the inter-card connections could keep up with the kind of memory bandwidth you need to run a GPU at full speed, then running off a single memory pool would be a lot more attractive. Even then you couldn't afford to do many accesses to the shared memory, because sharing a memory pool also means sharing bandwidth. Multi-socket and cluster OSes try to keep a high degree of affinity between process memory and the CPU running the process, since even with a high degree of interconnect "there's no place like local memory".
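As a rough sanity check on the figures quoted above, the gap can be worked out directly. This is only a back-of-the-envelope sketch: the constants are the numbers from the post, and the helper names `link_bandwidth` and `shortfall` are made up for illustration.

```python
# Figures quoted in the post (GB/sec, one direction). Nothing here is measured.
PER_LANE_GBPS = (0.25, 0.5)       # one PCIe lane, slower and faster generation
GPU_MEMORY_GBPS = (50.0, 100.0)   # local memory bandwidth of a high end GPU
LANES_X16 = 16                    # width of the slot a GPU normally sits in


def link_bandwidth(per_lane_gbps, lanes):
    """Aggregate one-direction bandwidth of a multi-lane link."""
    return per_lane_gbps * lanes


def shortfall(memory_gbps, link_gbps):
    """How many times faster local memory is than the link."""
    return memory_gbps / link_gbps


# Internal x16 slot: 16 lanes at 0.25 or 0.5 GB/sec each.
x16 = [link_bandwidth(b, LANES_X16) for b in PER_LANE_GBPS]
print("x16 slot:", x16)  # [4.0, 8.0]

# Compare local memory against the fastest single-lane external link
# and against the fastest x16 slot.
fastest_external = max(PER_LANE_GBPS)
fastest_x16 = max(x16)
for mem in GPU_MEMORY_GBPS:
    print(
        f"{mem:.0f} GB/sec memory: "
        f"{shortfall(mem, fastest_external):.0f}x external lane, "
        f"{shortfall(mem, fastest_x16):.2f}x x16 slot"
    )
```

Even the best case in that comparison (the slower memory figure against the faster external lane) leaves local memory two orders of magnitude ahead, which is where the "maybe 100x" comes from.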

**Saist** · 02 March 2009, 04:02 AM

Pardon me for poking my nose in, but 50 GB/sec isn't exactly out of the reach of HyperTransport, and unless I read the version 3 specification wrong, it contains data on how to manage a processor-to-processor link over 3 meters. I would think an interconnect based on HyperTransport just might be able to satisfy some of these conditions.

**bridgman** · 02 March 2009, 12:09 PM

I do think all this will become possible and it is getting closer. GPU performance and bandwidth requirements are continuing to grow as well, however, so I don't think we can assume that external buses will automatically catch up with GPU requirements.

The HT links being used today are more on the order of 8 GB/sec, although the number continues to climb. You can double the numbers if you count both directions, but I don't think that maps to typical GPU access patterns well, so IMO the performance would be driven more by single-direction bandwidth.

Even a 32-bit HT3.1 link (spec'ed but never implemented outside a lab AFAIK) only gives 25 GB/sec in one direction and the cost of a connection like that would be *much* higher than anything used today. I haven't seen a spec for what an external 32-bit >3GHz cable would look like but it sure would be a lot more $$ than what we all use today.
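For anyone wanting to check those HT numbers, one-direction bandwidth is roughly link width in bytes times clock rate times two, since HyperTransport transfers data on both clock edges. The sketch below assumes a 3.2 GHz clock for HT3.1 (the post only says ">3GHz") and a 16-bit, 2.0 GHz link as a stand-in for "links being used today"; both clock values and the function name `ht_one_way_gbps` are assumptions for illustration, not figures from the post.

```python
def ht_one_way_gbps(width_bits, clock_ghz):
    """Approximate one-direction bandwidth of a HyperTransport link in GB/sec.

    Data moves on both clock edges (double data rate), hence the factor of 2.
    Protocol overhead is ignored, so real throughput would be lower.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_ghz * 2


# Assumed configurations, labelled as such:
todays_link = ht_one_way_gbps(16, 2.0)   # 16-bit link at an assumed 2.0 GHz
wide_ht31 = ht_one_way_gbps(32, 3.2)     # 32-bit link at an assumed 3.2 GHz

print(f"16-bit @ 2.0 GHz: {todays_link:.1f} GB/sec one way")
print(f"32-bit @ 3.2 GHz: {wide_ht31:.1f} GB/sec one way")

# Fraction of a 50 GB/sec GPU memory bus the wide link could feed.
print(f"share of 50 GB/sec: {wide_ht31 / 50.0:.0%}")
```

Under those assumptions the wide link lands at about 25 GB/sec in one direction, which is roughly half of the low end of the GPU memory bandwidth range mentioned earlier in the thread.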

**Saist** · 02 March 2009, 04:30 PM

After seeing people drop $600 plus... each... on various Nvidia cards, I've gotten the feeling that some people will do anything if it means .0001 frames more with all details turned off. (Yes, I'm actually fussing at some of the people who have asked in the Quake Live forums how to turn everything off while NOT running Radeon 7000s or GeForce DDRs.)

Anyways, random fussing aside, thanks for the clarification on why that wouldn't work... "yet" I guess.

**bridgman** · 02 March 2009, 04:44 PM

We ask the same question internally for every new generation of GPUs. It's the high tech version of "are we there yet?"

**Fultao** · 11 March 2009, 03:42 AM

Concerning Multi-GPU Graphics Cards

Video Card: 768 MB DirectX 10 Graphics Card with Shader 3.0 support (Nvidia .... still going strong despite no true public announcement concerning the rumored game. .... procedural system that enables multiple layers of dynamic clouds; thus, ... Umbra is GPU accelerated occlusion culling software developed by Umbra

**bridgman** · 11 March 2009, 09:36 AM

DX10 is shader model 4.0 AFAIK, not 3.0.
