LLVMpipe Scaling With Intel's Core i7 Gulftown


  • LLVMpipe Scaling With Intel's Core i7 Gulftown

    Phoronix: LLVMpipe Scaling With Intel's Core i7 Gulftown

    When we found out that an Intel Core i7 970 "Gulftown" CPU was on the way, boasting six physical cores plus another six logical cores via Hyper Threading, what immediately came to mind was trying out this latest Intel 32nm processor with the Gallium3D LLVMpipe driver. There is a lot to love about Gallium3D when it comes to open-source Linux graphics drivers, from the possibilities presented by the different state trackers (such as native Direct3D 11 support on Linux) to the hardware drivers themselves being more advanced, easier to write, and eventually much faster than the classic Mesa drivers for Linux. One of the drivers that has been of particular interest is LLVMpipe, an attempt to finally deliver a useful CPU-based software rasterizer for Linux by leveraging the Low-Level Virtual Machine (LLVM) infrastructure. Our introductory LLVMpipe article showed that even with a Core i7 "Bloomfield" processor the driver is very demanding, but with Intel's Gulftown the results are somewhat surprising as we experiment with how this CPU-based driver scales up to twelve threads.


  • #2
    Given that graphics is an "embarrassingly parallel" problem, shouldn't it be possible -- theoretically -- to achieve very nearly linear scaling with the number of CPU cores? I'm not saying it would be easy, or that llvmpipe is flawed if it doesn't -- just asking whether, theoretically, it's within the realm of possibility to achieve.

    Though I guess one complicating factor here is that it's not just the graphics, but also the normal game logic itself which is running on the CPU at the same time. Have you guys considered trying some kind of purely-graphics benchmark to try and isolate that factor?
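    (For a rough sense of the theoretical ceiling being asked about, here is a back-of-the-envelope Amdahl's law sketch in C; the 90% parallel fraction is a made-up figure for illustration, not anything measured from LLVMpipe.)

    /* Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
     * the work that parallelizes and n is the number of cores. The p used here
     * is an assumed illustration value, not a measured llvmpipe number. */
    #include <stdio.h>

    int main(void)
    {
        const double p = 0.90;                    /* assumed parallel fraction */
        const int cores[] = { 1, 2, 4, 6, 12 };

        for (int i = 0; i < 5; i++) {
            double s = 1.0 / ((1.0 - p) + p / cores[i]);
            printf("%2d cores: %.2fx speedup\n", cores[i], s);
        }
        return 0;
    }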



    • #3
      So going by these test results, it seems that adding the 6 logical (HT) cores to the physical cores actually hinders performance at low resolutions and only becomes beneficial, and then only minimally, at high resolutions, at least as far as LLVMpipe is concerned.



      • #4
        Is this a joke? A $1K CPU used as a software renderer, and it can only play games at 800x600.

        I don't understand the point of this article. To show that LLVMpipe scales well? But who's going to use it anyway?



        • #5
          Originally posted by sirdilznik View Post
          So going by these test results, it seems that adding the 6 logical (HT) cores to the physical cores actually hinders performance at low resolutions and only becomes beneficial, and then only minimally, at high resolutions, at least as far as LLVMpipe is concerned.
          "The performance improvement seen is very application-dependent, however when running two programs that require full attention of the processor it can actually seem like one or both of the programs slows down slightly when Hyper Threading Technology is turned on. "



          • #6
            Originally posted by illissius View Post
            Given that graphics is an "embarrassingly parallel" problem, shouldn't it be possible -- theoretically -- to achieve very nearly linear scaling with the number of CPU cores?
            Is it known that current mainstream rendering techniques are embarrassingly parallel? I haven't studied the algorithms to any real detail, but it would surprise me if they are (I'd expect some issues with Z-sort and overlapping fragments, at least). Surely some important parts of it are, but that's different from the whole pipeline scaling ideally.



            • #7
              In the last year, ATI got nearly double the performance going from 160 to 320 execution cores, so yes, 3D rendering is very definitely embarrassingly parallel.

              With the currently accepted rendering algorithms, a Z-sort doesn't always need to happen. You only need to sort for transparent rendering, and then only what falls within the tile frustum.
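              (As a hypothetical illustration of that per-tile sort, here is a short C snippet; the frag struct and the function names are made up, not taken from any real driver.)

              /* Hypothetical sketch: sort only the transparent fragments that landed
               * in one tile, back to front (larger depth first), before blending.
               * The struct and the names here are invented for illustration. */
              #include <stdint.h>
              #include <stdlib.h>

              struct frag { float depth; uint32_t rgba; };

              static int farther_first(const void *a, const void *b)
              {
                  float da = ((const struct frag *)a)->depth;
                  float db = ((const struct frag *)b)->depth;
                  return (da < db) - (da > db);   /* negative when a is farther, so a sorts first */
              }

              void sort_tile_transparents(struct frag *frags, size_t n)
              {
                  qsort(frags, n, sizeof *frags, farther_first);
              }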



              • #8
                Originally posted by Ex-Cyber View Post
                Is it known that current mainstream rendering techniques are embarrassingly parallel? I haven't studied the algorithms to any real detail, but it would surprise me if they are (I'd expect some issues with Z-sort and overlapping fragments, at least). Surely some important parts of it are, but that's different from the whole pipeline scaling ideally.
                Not to mention that CPUs themselves do not scale linearly either as each core is going to be sharing L2 cache and main memory bandwidth.



                • #9
                  A summary of sort-of typical rendering in 3D (without considering the actual game logic):
                  Order notation used.

                  1. Determine view frustum - O(1) - Serial
                  2. Determine objects in frustum - O(log n) - Somewhat parallel, but not great

                  3. Roughly sort opaque objects from front to back - O(log n) - Mostly serial
                  4. Emit every object - O(n) - serial
                  4.1 Where the surface is split into tiles - almost O(n) parallelization (reasonable gain here)
                  4.1.1 Throw away the object if unneeded in the tile - cheap, early exit point
                  4.1.2 Emit each part of the object - O(n) - serial
                  4.1.2.1 Compute render region - O(1) - serial
                  4.1.2.2 For each pixel under the region - stupidly parallel (most of the gain here)
                  4.1.2.2.1 Test if visible - O(1) - cheap, early exit point
                  4.1.2.2.2 Render - O(1)

                  5 & 6. More or less the same as 3 & 4, but with transparent objects sorted back to front. Sorting here can be more expensive, and the early exit points are used much less.

                  7. For each post-processing pass - O(n) - serial
                  7.1 For each pixel - stupidly parallel (most of the gain here)
                  7.1.1 Do something

                  Um, I think that is about it?
                  Of course, limits such as cache hits, bandwidth, unbalanced workload, etc... all contribute to slow it down.
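                  (To make step 4 concrete, here is a rough C sketch of the fork/join over tiles and pixels. It's a toy flat-shaded triangle fill, not llvmpipe's actual implementation, and the OpenMP pragma is just a stand-in for whatever thread pool a real driver uses.)

                  /* Toy sketch of step 4: "one thread per tile" rasterization of a single
                   * flat-shaded triangle. NOT llvmpipe's real code path; zbuf is assumed
                   * to have been cleared to 1.0f (the far plane) at the start of the frame. */
                  #include <math.h>
                  #include <stdint.h>

                  #define W    1024
                  #define H    768
                  #define TILE 64

                  static float    zbuf[H][W];
                  static uint32_t cbuf[H][W];

                  /* Signed-area edge function used for the inside test. */
                  static float edge(float ax, float ay, float bx, float by, float px, float py)
                  {
                      return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
                  }

                  void draw_tri(float x0, float y0, float z0,
                                float x1, float y1, float z1,
                                float x2, float y2, float z2, uint32_t color)
                  {
                      float area = edge(x0, y0, x1, y1, x2, y2);
                      if (area <= 0.0f)
                          return;                              /* back-facing/degenerate: early out */

                      /* Triangle bounding box, used for the cheap per-tile rejection (4.1.1). */
                      float minx = fminf(fminf(x0, x1), x2), maxx = fmaxf(fmaxf(x0, x1), x2);
                      float miny = fminf(fminf(y0, y1), y2), maxy = fmaxf(fmaxf(y0, y1), y2);

                      /* 4.1: the surface is split into tiles; each tile is independent work. */
                      #pragma omp parallel for collapse(2) schedule(dynamic)
                      for (int ty = 0; ty < H; ty += TILE) {
                          for (int tx = 0; tx < W; tx += TILE) {
                              if (tx + TILE <= minx || tx > maxx || ty + TILE <= miny || ty > maxy)
                                  continue;                    /* 4.1.1: triangle misses this tile */

                              /* 4.1.2.2: per-pixel work inside the tile - "stupidly parallel". */
                              for (int y = ty; y < ty + TILE; y++) {
                                  for (int x = tx; x < tx + TILE; x++) {
                                      float w0 = edge(x1, y1, x2, y2, x + 0.5f, y + 0.5f);
                                      float w1 = edge(x2, y2, x0, y0, x + 0.5f, y + 0.5f);
                                      float w2 = edge(x0, y0, x1, y1, x + 0.5f, y + 0.5f);
                                      if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f)
                                          continue;            /* pixel outside the triangle */
                                      float z = (w0 * z0 + w1 * z1 + w2 * z2) / area;
                                      if (z >= zbuf[y][x])
                                          continue;            /* 4.1.2.2.1: depth test, early exit */
                                      zbuf[y][x] = z;          /* 4.1.2.2.2: "render" the pixel */
                                      cbuf[y][x] = color;
                                  }
                              }
                          }
                      }
                  }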



                  • #10
                    I think the issue here is that while graphics still has a big chunk of embarrassingly parallel work, the individual tasks are extremely small, so for real scalability you either need some hardware scheduling (like a GPU has) or you need to design the software renderer from day one around the idea of having a very large number of cores/threads (as was attempted with the Larrabee renderer).

                    AFAIK the LLVMpipe renderer was designed for "one to a small number" of threads... I'm pretty impressed with how well it scales.

                    I'm only looking at the results from 1 core to 6 cores, since the jump from 6 to 12 isn't really bringing more cores on stream, just more threads per core.
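                    (As a quick way to read the scaling off those numbers, here is a tiny C helper that turns FPS into speedup and parallel efficiency; the FPS figures in it are placeholders, not the article's actual results.)

                    /* Parallel efficiency = (fps_n / fps_1) / n. The FPS values below are
                     * placeholders for illustration, not results from the article. */
                    #include <stdio.h>

                    int main(void)
                    {
                        const double fps_1 = 10.0;                 /* placeholder 1-thread FPS */
                        const struct { int n; double fps; } runs[] = {
                            { 2, 19.0 }, { 4, 36.0 }, { 6, 51.0 }, { 12, 55.0 }   /* placeholders */
                        };

                        for (int i = 0; i < 4; i++) {
                            double speedup = runs[i].fps / fps_1;
                            printf("%2d threads: %.2fx speedup, %.0f%% efficiency\n",
                                   runs[i].n, speedup, 100.0 * speedup / runs[i].n);
                        }
                        return 0;
                    }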

