GPUs are optimized to do exactly the same thing on a huge amount of data.
Take the texels, apply the shaders, output the pixels. The same small amount of code multiplied over millions of pixels.
Thus they are designed to more or less apply the exact same instruction simultaneously to big sets of data.
Under the hood they are huge SIMD processors which keep dozens of copies of the exact same thread running at the same time.
Things like conditionals are implemented with masks and serialization: both sides of a branch get executed one after the other, with the lanes that didn't take that side masked off.
A GPU isn't that efficient when there's too much divergence in the code paths taken at execution time.
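To make the mask-and-serialize idea concrete, here is a toy sketch in plain Python. It is only a model of the behaviour, not how any real GPU or driver is programmed, and every name in it (`simd_if_else`, `is_even`, and so on) is made up for the illustration.

```python
# Toy model of one SIMD group running an if/else in lockstep.
# All names are hypothetical; no real GPU API is involved.

def simd_if_else(values, cond, then_op, else_op):
    """Run an if/else over all lanes the way a masked SIMD unit would.

    Returns the per-lane results and the number of serialized passes.
    """
    mask = [cond(v) for v in values]   # one predicate bit per lane
    out = list(values)
    passes = 0
    if any(mask):
        # "then" pass: only lanes whose mask bit is set write a result
        out = [then_op(v) if m else o for v, m, o in zip(values, mask, out)]
        passes += 1
    if not all(mask):
        # "else" pass: the remaining lanes write, the others sit idle
        out = [else_op(v) if not m else o for v, m, o in zip(values, mask, out)]
        passes += 1
    return out, passes


def is_even(v):
    return v % 2 == 0

def halve(v):
    return v // 2

def triple_plus_one(v):
    return 3 * v + 1


if __name__ == "__main__":
    # Every lane agrees: only one side of the branch is executed.
    print(simd_if_else([2, 4, 6, 8], is_even, halve, triple_plus_one))
    # Lanes disagree: both sides are executed, one after the other.
    print(simd_if_else([1, 2, 3, 4], is_even, halve, triple_plus_one))
```

When all lanes agree, the group pays for one side of the branch. As soon as a single lane disagrees, the whole group pays for both.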
It works very nicely for graphics, because you apply the same shader over-and-over again until you've output all the needed pixels.
It works not so nicely for raytracing, because neighbouring rays might end up taking different paths, so they reach different states and need to execute different code paths. Your execution diverges too much. If one single element takes much more time to process, all the thread copies sharing the same block stall until this "slow one" has finished.
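A rough way to put numbers on that stall, again as a toy Python model: assume each ray needs some number of steps, and a block can't retire until its slowest ray is done. The step counts and the block size of 4 are invented for the example, and `lockstep_cost` is a hypothetical helper.

```python
# Toy model: how many lane-slots a lockstep block burns versus how many
# steps the rays actually needed. Numbers and names are made up.

def lockstep_cost(steps_per_ray, block_size):
    """Return (useful_steps, occupied_slots) for rays grouped into blocks."""
    useful = sum(steps_per_ray)
    occupied = 0
    for start in range(0, len(steps_per_ray), block_size):
        block = steps_per_ray[start:start + block_size]
        # Every lane in the block is held until the slowest ray finishes.
        occupied += max(block) * len(block)
    return useful, occupied


if __name__ == "__main__":
    rays = [3, 3, 3, 40, 3, 3, 3, 3]   # one ray bounces around far longer
    useful, occupied = lockstep_cost(rays, block_size=4)
    print(useful, occupied, useful / occupied)
```

In this made-up case the single slow ray drags three idle lanes along with it, so most of the first block's slots do no useful work.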
Things like Larrabee, Tilera (or on a smaller scale: Sun's Niagara, or precursors like Cell) are designed to tolerate much more divergence. They are huge collections of tiny light-weight processors.
But these are much more independent, and basically each runs its own thread in its own corner. They *do* share cache and a lot of other resources in common, so it's not quite the same as having a server farm, but they are not forced to run the exact same instruction all at the same time.
Thus they are much more efficient at heavily diverging code paths.
They are good at ray tracing: if one ray takes more processing than its neighbours, the light-weight processor handling it will keep working on it while the others move on to other jobs.
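The "others take another job" behaviour can be sketched the same way. This toy Python model assumes a simple shared queue where whichever core is free first grabs the next ray; that scheduling policy, the ray costs, and the function names are all assumptions for illustration, not how Tilera or Larrabee actually dispatch work. A lockstep group of the same width is included only for comparison.

```python
import heapq

# Toy comparison of finish times. All names and numbers are hypothetical.

def independent_finish_time(ray_costs, n_cores):
    """Each core pulls the next ray as soon as it is free."""
    free_at = [0] * n_cores            # time at which each core becomes idle
    heapq.heapify(free_at)
    for cost in ray_costs:
        earliest = heapq.heappop(free_at)      # first core to become free
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)

def lockstep_finish_time(ray_costs, width):
    """Blocks run one after another; each lasts as long as its slowest ray."""
    return sum(max(ray_costs[i:i + width])
               for i in range(0, len(ray_costs), width))


if __name__ == "__main__":
    # One slow ray hidden in every group of four.
    rays = [3, 3, 3, 40, 3, 40, 3, 3, 40, 3, 3, 3, 3, 3, 40, 3]
    print("independent cores:", independent_finish_time(rays, n_cores=4))
    print("lockstep group:   ", lockstep_finish_time(rays, width=4))
```

With the same four "lanes", the independent cores finish noticeably earlier here, because a slow ray only ties up the one core working on it.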
But they aren't that good at pixel churning: lots of resources are wasted on things which become redundant when you basically just run the same instruction at the same time over 64 pixels.
So how will LLVMpipe look on a Tilera?
Well, much better than on a regular CPU (it has many more cores to process pixels in parallel), but not as good as a GPU of similar transistor count/clock frequency/power usage: the Tilera just spends way too many resources on giving each core its very own instruction pipeline, and so on. These resources are good for increasing independence and tolerating more divergence (for tasks like raytracing), but are a complete waste for doing OpenGL (kill the extra pipelines and use the freed space to add more computing power to spit out more pixels at the same time).
On the other hand, if all you have is a server/workstation with Tileras, it could be a nice substitute for getting an OpenGL desktop.
What we should watch in the long term is what the GPU makers plan: the extra pipelines of such an architecture could be a waste that some GPU makers can afford, because the graphics are fast enough anyway, and these extra capabilities would help them tap into an HPC market which is currently only served by small players like Tilera.
So perhaps Tilera-like architectures could become slightly more popular with some manufacturers.
That's the path that Intel currently seems to be taking with their Larrabee.
Well, on the consumer end, having something that can do ray tracing at an acceptable resolution and frame rate would be a massive boon for gaming companies, since it opens the floodgates on the levels of realism and physics they can do.
Forgive the usual Intel marketing bullshit, though: this is actually only being displayed via remote desktop on the laptop, while the rendering is actually being done on 4 bigass servers...