LunarG Proposes A Shader And Kernel Compiler Stack


  • #11
    The challenge here is that it's not a question of "code both and see which runs best"; the question is whether the benefits (being able to leverage ongoing work by the LLVM community) will outweigh the costs (replacing the current GPU-centric-ish IRs with an arguably CPU-centric IR plus GPU extensions and GPU-aware middleware) over time.

    It's a very timely question, but even an initial implementation is only likely to demonstrate that LLVM IR can work "OK" with GPUs. The big argument in favor of this proposal is that CPUs and GPUs are becoming more alike over time. I hadn't really thought of GPU architecture in terms of AoS or SoA, so I'll probably have to read the proposal a few times to map those terms onto SIMD and superscalar/VLIW.
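
    To make the AoS/SoA terminology concrete, here is a minimal C sketch (my own illustration, not code from the proposal): AoS keeps each vertex's fields together, which suits scalar code; SoA keeps each field contiguous across vertices, which is what SIMD hardware wants.

    Code:
    #define N 1024

    /* Array of Structures: one vertex's fields sit together in memory.
     * Natural for scalar/superscalar code handling one vertex at a time. */
    struct vertex { float x, y, z, w; };
    struct vertex verts_aos[N];

    /* Structure of Arrays: each field is contiguous across all vertices.
     * Natural for SIMD, where one instruction processes the x components
     * of many vertices at once. */
    struct {
        float x[N], y[N], z[N], w[N];
    } verts_soa;

    void scale_soa(float s)
    {
        /* Consecutive iterations touch consecutive floats, so a
         * vectorizing compiler can turn this loop into wide SIMD ops. */
        for (int i = 0; i < N; i++) {
            verts_soa.x[i] *= s;
            verts_soa.y[i] *= s;
            verts_soa.z[i] *= s;
        }
    }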

    Keith W summed the situation up pretty well:

    So basically I think it's necessary to figure out what would constitute evidence that LLVM is capable of doing the job, and make getting to that point a priority.

    If it can't be done, we'll find out quickly; if it can, then we can stop debating whether or not it's possible.


  • #12
    Originally posted by bridgman:
    The challenge here is that it's not a question of "code both and see which runs best"; the question is whether the benefits (being able to leverage ongoing work by the LLVM community) will outweigh the costs (replacing the current GPU-centric-ish IRs with an arguably CPU-centric IR plus GPU extensions and GPU-aware middleware) over time.

    It's a very timely question, but even an initial implementation is only likely to demonstrate that LLVM IR can work "OK" with GPUs. The big argument in favor of this proposal is that CPUs and GPUs are becoming more alike over time. I hadn't really thought of GPU architecture in terms of AoS or SoA, so I'll probably have to read the proposal a few times to map those terms onto SIMD and superscalar/VLIW.

    Another question is: is this stack model more scalable than the current Mesa design?

    We FOSS end users also want ATI and nouveau 3D acceleration at commercial levels (leaving aside the time needed to develop the drivers, of course).



  • #13
    I don't think there's much difference in terms of inherent scalability - the discussion is primarily about the shader-processing part of the graphics pipe, which I don't *think* is a significant performance bottleneck today anyway.

    The best analogy I can come up with is that there are a few different lines of people disappearing off into the distance, and the question is which line is going to move faster over the next few years... bearing in mind that it costs a year or so every time we change lines...


  • #14
    As for performance, there is a lot more low-hanging fruit than an optimized compiler at this point (at least for the open source radeon driver). Things like surface tiling, pageflipping, fast clears, and Z-related features (HiZ, etc.) will provide much larger performance gains than optimizing the instructions sent to the shader. An optimized shader compiler is like going from -O0 to -O1 or -O2 in gcc.



  • #15
    Originally posted by agd5f:
    As for performance, there is a lot more low-hanging fruit than an optimized compiler at this point (at least for the open source radeon driver). Things like surface tiling, pageflipping, fast clears, and Z-related features (HiZ, etc.) will provide much larger performance gains than optimizing the instructions sent to the shader. An optimized shader compiler is like going from -O0 to -O1 or -O2 in gcc.
    The analogy is almost correct, but the scope is not. Going from -O0 to -O2 is not really that visible, because a typical program spends its time on a lot of different things.

    It's more like going from a standard FPU path to an SSE3-optimised one. Since shaders usually run massively parallel over a lot of pixels, shaving off a few instructions in optimisation has a significant impact on overall performance.
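
    Back-of-the-envelope, with assumed numbers (1920x1080 at 60 fps - nobody's benchmark): one instruction shaved off a fragment shader is on the order of 124 million instructions per second no longer executed.

    Code:
    #include <stdio.h>

    int main(void)
    {
        /* Assumed workload, purely illustrative. */
        long long pixels = 1920LL * 1080; /* fragments per frame */
        long long fps    = 60;            /* frames per second   */

        /* Each instruction removed from the shader is skipped once
         * per fragment per frame: ~124 million times per second. */
        printf("%lld instructions/sec saved per instruction removed\n",
               pixels * fps);
        return 0;
    }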

    What I like in the proposal is the universal shader compiler that seems to be the future goal. This means it will look exactly like gcc (with all the benefits of switching architectures through compiler options rather than recoding the whole shader), with some HW limitations, of course. :-)



  • #16
    Originally posted by agd5f:
    As for performance, there is a lot more low-hanging fruit than an optimized compiler at this point (at least for the open source radeon driver). Things like surface tiling, pageflipping, fast clears, and Z-related features (HiZ, etc.) will provide much larger performance gains than optimizing the instructions sent to the shader.
    So how many of those have been implemented so far?
    I think tiling is coming for r600g (airlied is working on it); Z features are not yet implemented in any way.
    Pageflipping might be in the kernel already.
    I have no idea about fast clears.

    Are there plans to do these in the foreseeable future?

    By the way, this new compiler won't happen in the near future, so by the time it is done, radeon might very well be at the point where this is the biggest bottleneck...

    I also think there will be an intermediate solution here. Just as old ATI and NVIDIA chips are not suitable for Gallium drivers, I think some cards will get this unified compiler while older ones will have to live with what they already have.

    Just my not-very-insightful opinion...



  • #17
    GPGPU

    After reading the Mesa-dev thread, it seems this is also targeted at general-purpose computing on shaders and at making it "easy" - shifting mindshare toward GPUs as general coprocessors. I'd very much like to see GNU Radio FIR filter blocks implemented on the GPU.
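
    For what it's worth, the core of a FIR block is embarrassingly parallel. A scalar C sketch (my own illustration, not GNU Radio's actual code):

    Code:
    /* Naive FIR filter: y[n] = sum over k of h[k] * x[n - k].
     * Every output sample is independent of the others, so on a GPU
     * each y[n] could be computed by its own shader thread. */
    void fir(const float *x, float *y, const float *h,
             int nsamples, int ntaps)
    {
        for (int n = ntaps - 1; n < nsamples; n++) {
            float acc = 0.0f;
            for (int k = 0; k < ntaps; k++)
                acc += h[k] * x[n - k];
            y[n] = acc;
        }
    }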



  • #18
    OK, so this is a two-step thing. The first step is just planting it there in the stack and leaving driver devs to their business. No problem, right?

    Later on, with newer GPUs, driver devs can target the Glass IR from the get-go instead. Still no problem, right?



  • #19
    Originally posted by HokTar:
    So how many of those have been implemented so far?
    I think tiling is coming for r600g (airlied is working on it); Z features are not yet implemented in any way.
    Pageflipping might be in the kernel already.
    I have no idea about fast clears.
    r300g already supports these features on r3xx-r5xx hardware, so support just needs to be added to r600g. Most of these new features will happen in the Gallium driver, as it is much easier to add them there.

    Initial DRM tiling support was added in kernel 2.6.36, along with the Mesa and DDX pieces, but that only enabled 1D tiling for render targets. Textures are still not tiled, and you get larger performance gains with 2D tiling. Dave is working on tiling support in r600g now. Jerome and I have written a few patches to implement pageflipping support, but nothing is upstream yet. Fast clears, HiZ, etc. are not implemented yet.
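
    To illustrate why tiling matters, here is a simplified sketch of linear vs. 2D-tiled addressing (made-up 8x8 tile size, not the actual r600 layout):

    Code:
    /* Linear layout: stepping in y jumps a whole row pitch each time,
     * so vertically adjacent texels land far apart in memory. */
    unsigned linear_offset(unsigned x, unsigned y, unsigned pitch)
    {
        return y * pitch + x;
    }

    /* 2D tiled layout with illustrative 8x8 tiles (pitch assumed to be
     * a multiple of 8): texels close in both x and y share one small
     * 64-texel block, so the 2D-local access patterns typical of
     * texturing stay within a few cache lines. */
    unsigned tiled_offset(unsigned x, unsigned y, unsigned pitch)
    {
        unsigned tiles_per_row = pitch / 8;
        unsigned tile = (y / 8) * tiles_per_row + (x / 8);
        return tile * 64 + (y % 8) * 8 + (x % 8);
    }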



  • #20
    But there is a problem in this sweet transition, namely the state trackers... =x

    Or do the state trackers simply sit above the layer between device A and B, and not care which one is underneath?
