The challenge here is that it's not a question of "code both and see which runs best"; the question is whether the benefits (from being able to leverage ongoing work by the LLVM community) will outweigh the costs (from replacing the current GPU-centric-ish IRs with an arguably CPU-centric IR plus GPU extensions and GPU-aware middleware) over time.
It's a very timely question, but even an initial implementation is only likely to demonstrate that LLVM IR can work "OK" with GPUs. The big argument in favor of this proposal is that CPUs and GPUs are becoming more alike over time. I hadn't really thought of GPU architecture in terms of AoS or SoA, so I'll probably have to read the proposal a few times to get those terms mapped onto SIMD and superscalar/VLIW.
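(For anyone else trying to make that mapping, here's a minimal C sketch of the AoS-vs-SoA distinction; the struct and function names are just illustrative, not from the proposal. The SoA form is the one that lines up naturally with SIMD lanes, one lane per vertex.)

    #include <stddef.h>

    /* Array of Structures: each vertex's components are interleaved;
       convenient for scalar code, awkward for wide SIMD. */
    struct vertex_aos {
        float x, y, z, w;
    };

    /* Structure of Arrays: each component is contiguous, so a SIMD
       unit can load N consecutive x values into one vector register. */
    struct vertices_soa {
        float *x, *y, *z, *w;
    };

    /* Scaling positions in SoA form: the loop body is identical for
       every element, so it vectorizes as one SIMD lane per vertex. */
    static void scale_soa(struct vertices_soa *v, size_t n, float s)
    {
        for (size_t i = 0; i < n; i++) {
            v->x[i] *= s;
            v->y[i] *= s;
            v->z[i] *= s;
        }
    }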
Keith W summed the situation up pretty well:
So basically I think it's necessary to figure out what would
constitute evidence that LLVM is capable of doing the job, and make
getting to that point a priority.
If it can't be done, we'll find out quickly; if it can, then we can
stop debating whether or not it's possible.
I don't think there's much difference in terms of inherent scalability - the discussion is primarily about the shader processing part of the graphics pipe, which I don't *think* is a significant performance bottleneck today anyway.
The best analogy I can come up with is that there are a few different lines of people disappearing off into the distance, and the question is which line is going to move faster over the next few years... bearing in mind that it costs a year or so every time we change lines...
As for performance, there is a lot more low-hanging fruit than an optimized compiler at this point (at least for the open source radeon driver). Things like surface tiling, pageflipping, fast clears, and Z-related features (HiZ, etc.) will provide much larger performance gains than optimizing the instructions sent to the shader. An optimized shader compiler is like going from -O0 to -O1 or -O2 in gcc.
It's more like going from a standard FPU path to an SSE3-optimized path. Since shaders usually run massively parallel over a lot of pixels, shaving off a few instructions in optimization does have a significant impact on overall performance.
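To make that concrete, here's a rough C sketch of the two paths (plain SSE intrinsics rather than SSE3, for simplicity; the function names are made up):

    #include <xmmintrin.h>  /* SSE intrinsics */

    /* "Standard FPU" path: one multiply per loop iteration. */
    void scale_scalar(float *data, int n, float s)
    {
        for (int i = 0; i < n; i++)
            data[i] *= s;
    }

    /* SSE path: four multiplies per iteration. Assumes n is a
       multiple of 4 and data is 16-byte aligned; a shader compiler
       does the same kind of packing across pixels instead of
       array elements. */
    void scale_sse(float *data, int n, float s)
    {
        __m128 vs = _mm_set1_ps(s);
        for (int i = 0; i < n; i += 4) {
            __m128 v = _mm_load_ps(&data[i]);
            _mm_store_ps(&data[i], _mm_mul_ps(v, vs));
        }
    }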
What I like in the proposal is the universal shader compiler that seems to be the future goal. This means it will look exactly like gcc (with all the benefits of switching architectures with compiler options and not having to recode the whole shader), of course with some HW limitations :-)
I think tiling is coming for r600g (airlied is working on it); Z-related features are not yet implemented in any way.
Pageflipping might be in the kernel already.
I have no idea about fast clears.
Are there plans to do these in the foreseeable future?
By the way, this new compiler won't happen in the near future, so by the time it is done, radeon might very well be at the point where this will be the biggest bottleneck...
I also think that there will be an intermediate solution here. Just like old ATI and NVIDIA chips are not suitable for Gallium drivers, I think some cards will get this unified compiler while older ones will have to live with what they already have.
Just my not-very-insightful opinion...
After reading the Mesa-dev thread, it seems this is also targeted at general-purpose computing on shaders and making it "easy": shifting mindshare so GPUs are seen as coprocessors. I'd very much like to see GNU Radio FIR filter blocks implemented on the GPU.
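As a hint of why FIR blocks should map well, here's a hypothetical scalar C version written so that every output sample is independent; on a GPU the outer loop would simply become one thread per output sample:

    #include <stddef.h>

    /* Hypothetical FIR filter, structured so each output sample is
       computed independently. On a GPU the outer loop disappears:
       one thread (or fragment) per output sample runs the inner
       dot product over the filter taps. */
    void fir_filter(const float *in, float *out, size_t n,
                    const float *taps, size_t ntaps)
    {
        for (size_t i = 0; i + ntaps <= n; i++) {
            float acc = 0.0f;
            for (size_t k = 0; k < ntaps; k++)
                acc += taps[k] * in[i + k];
            out[i] = acc;
        }
    }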
OK, so this is a two-step thing. The first step is just planting it there in the stack and leaving driver devs to their business. No problem, right?
Later on, with newer GPUs, driver devs can target the Glass IR from the get-go instead. Still no problem, right?
Initial drm tiling support was added in kernel 2.6.36, and to mesa and the ddx, but that only enabled 1D tiling for render targets. Textures are still not tiled, and you get larger performance gains with 2D tiling. Dave is working on tiling support now in r600g. Jerome and I have written a few patches to implement pageflipping support, but nothing is upstream yet. Fast clears and HiZ, etc., are not implemented yet.
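For anyone wondering why 2D tiling buys so much, here's an illustrative address calculation in C (8x8 tiles chosen arbitrarily, pitch in pixels and a multiple of the tile width; this is not the actual r600 tiling layout):

    #include <stdint.h>

    enum { TILE_W = 8, TILE_H = 8 };  /* illustrative tile size only */

    /* Linear layout: vertically adjacent pixels are a whole pitch
       apart in memory. */
    static uint32_t addr_linear(uint32_t x, uint32_t y, uint32_t pitch)
    {
        return y * pitch + x;
    }

    /* 2D tiled layout: an 8x8 neighborhood of pixels stays
       contiguous, so a cache line covers a useful 2D region rather
       than a thin horizontal strip. Not the real r600 swizzle, just
       the general idea. */
    static uint32_t addr_tiled(uint32_t x, uint32_t y, uint32_t pitch)
    {
        uint32_t tiles_per_row = pitch / TILE_W;
        uint32_t tile = (y / TILE_H) * tiles_per_row + (x / TILE_W);
        uint32_t in_tile = (y % TILE_H) * TILE_W + (x % TILE_W);
        return tile * (TILE_W * TILE_H) + in_tile;
    }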
But there is a problem in this sweet transition; namely, state trackers... =x
Or do state trackers simply cut the layer between device A and B?