
A New Radeon Shader Compiler For Mesa


  • #21
    Occlusion queries allow a program to use the acceleration hardware to determine if an object would be visible if it were drawn. Writes are turned off, and a counter is used to track the number of pixels (fragments) which would "pass the depth test", ie which are not *behind* something which has already been drawn.
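    To make that concrete, here's a rough sketch of how a GL program might use an occlusion query to test a cheap bounding box before drawing the full object. The drawBoundingBox()/drawObject() helpers are hypothetical placeholders for the application's own draw calls, and depending on your GL headers you may need GL/glext.h or an extension loader for the query entry points.

    #include <GL/gl.h>

    /* Hypothetical helpers standing in for the application's own draw calls. */
    void drawBoundingBox(void);
    void drawObject(void);

    void drawIfVisible(void)
    {
        GLuint query, samples;

        glGenQueries(1, &query);

        /* Turn off color and depth writes; only the counter matters here. */
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);

        /* Count the bounding-box fragments that would pass the depth test. */
        glBeginQuery(GL_SAMPLES_PASSED, query);
        drawBoundingBox();
        glEndQuery(GL_SAMPLES_PASSED);

        /* Re-enable writes before drawing anything for real. */
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        /* Reading the result right away stalls; real code would do other work
           first or poll GL_QUERY_RESULT_AVAILABLE. */
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
        if (samples > 0)
            drawObject();   /* at least one fragment was not behind existing geometry */

        glDeleteQueries(1, &query);
    }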



    • #22
      Originally posted by monohouse
      Isn't that wasteful? Having a whole high-level language compiler in the form of a driver?

      Isn't it better to have an external executable (say, a binary that comes with the driver) to compile the shaders first, and let the driver only run the compiled shaders?
      It makes absolutely no difference in terms of "waste" but your approach would pretty much guarantee that developers would either stop supporting non-core platforms or that they would always ship shaders in compiled form (assuming there is some standardized compiled form to ship in).

      The compilation is driver-specific because the hardware differs in what instruction set it uses. The best option is to have the program tell the driver what it wants in a high-level form and let the driver figure out how to translate that into hardware-specific instructions.

      That's the same as how the OpenGL/DX APIs work, really. Your program says "set the color to green and draw this triangle" and the driver translates those high-level constructs into the specific hardware commands to get the desired result.
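      A minimal sketch of what that looks like from the application's side, assuming a GL 2.0-style context (the GLSL snippet and function name below are purely illustrative, and you may need an extension loader for the GL 2.0 entry points): the program hands the driver plain high-level source text, and the driver's built-in compiler turns it into whatever instruction set the GPU actually uses.

      #include <stdio.h>
      #include <GL/gl.h>

      /* The application only supplies GLSL source; the driver compiles it. */
      static const char *fs_source =
          "void main() {\n"
          "    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0); /* green */\n"
          "}\n";

      GLuint buildFragmentShader(void)
      {
          GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
          GLint ok = GL_FALSE;

          glShaderSource(shader, 1, &fs_source, NULL);
          glCompileShader(shader);   /* hardware-specific code generation happens in the driver */

          glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
          if (!ok) {
              char log[1024];
              glGetShaderInfoLog(shader, sizeof(log), NULL, log);
              fprintf(stderr, "shader compile failed: %s\n", log);
          }
          return shader;
      }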



      • #23
        I wonder what happened to that LLVM shader compiler that was talked about months ago...



        • #24
          Interest in LLVM seems to have moved "up the stack" a bit, ie using LLVM to generate TGSI rather than using LLVM to translate from TGSI to native hardware instructions. It's certainly possible to use LLVM in both places, but LLVM doesn't currently handle explicitly superscalar hardware so for now the back-end translation seems to be handled best by hardware-specific code.

          R3xx-R5xx and RS6xx GPUs support two simultaneous operations per instruction (1 vector + 1 scalar) while R6xx-R7xx GPUs support up to five independent operations per instruction. Operations in a single instruction need to share inputs to some extent, so packing operations into instruction words is non-trivial at best.
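          Purely as an illustration of why that packing is awkward, here is a toy model in C of an R6xx-style five-slot instruction group. None of this matches the real r600 encoding or scheduling rules; the MAX_GROUP_READS limit is a made-up stand-in for the shared-input constraints mentioned above.

          #include <stdbool.h>
          #include <stddef.h>

          /* Toy model of a VLIW instruction group: up to five independent ALU
             operations per group (the x, y, z, w and t slots). */
          #define NUM_SLOTS 5
          #define MAX_SRCS  3

          struct alu_op {
              int opcode;                 /* hypothetical MUL / ADD / MAD code */
              int dst_reg;
              int src_reg[MAX_SRCS];
              int num_srcs;
          };

          struct alu_group {
              const struct alu_op *slot[NUM_SLOTS];   /* NULL = empty slot */
          };

          /* Made-up constraint: pretend one group may read at most four distinct
             registers. The real rules (per-channel read ports, constant-fetch
             limits, t-slot restrictions) are stricter, which is exactly why
             filling all five slots is non-trivial. */
          #define MAX_GROUP_READS 4

          bool group_can_accept(const struct alu_group *g, const struct alu_op *op)
          {
              int regs[(NUM_SLOTS + 1) * MAX_SRCS];
              int nregs = 0, free_slots = 0;

              /* Gather the registers the group already reads. */
              for (int s = 0; s < NUM_SLOTS; s++) {
                  if (g->slot[s] == NULL) {
                      free_slots++;
                      continue;
                  }
                  for (int i = 0; i < g->slot[s]->num_srcs; i++)
                      regs[nregs++] = g->slot[s]->src_reg[i];
              }
              if (free_slots == 0)
                  return false;               /* no slot left in this group */

              /* Add the candidate op's reads, then count distinct registers. */
              for (int i = 0; i < op->num_srcs; i++)
                  regs[nregs++] = op->src_reg[i];

              int distinct = 0;
              for (int i = 0; i < nregs; i++) {
                  bool seen = false;
                  for (int j = 0; j < i; j++)
                      if (regs[j] == regs[i])
                          seen = true;
                  if (!seen)
                      distinct++;
              }
              return distinct <= MAX_GROUP_READS;
          }

          A packer that can't satisfy the limits has to start a new instruction group and leave slots empty, which is where the utilization loss comes from.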



          • #25
            Originally posted by bridgman
            R3xx-R5xx and RS6xx GPUs support two simultaneous operations per instruction (1 vector + 1 scalar) while R6xx-R7xx GPUs support up to five independent operations per instruction.
            Am I to assume that if you fail to fill all five, you end up with suboptimal instructions and slower drivers? (Of course, an optimal driver probably doesn't and can't exist anyway; it's more a matter of how far you are from it.)
            Last edited by nanonyme; 28 July 2009, 07:28 PM.



            • #26
              Normally you use multiple slots per instruction automatically since you're dealing with 3-4 component vertex or pixel vectors, so it's pretty easy to get "decent" utilization -- but you can definitely get some extra performance in shader-intensive operations by packing other operations into the unused slots.
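              A quick back-of-the-envelope way to see what's at stake, using the five-slot R6xx-R7xx case from the earlier posts (the percentages are only illustrative of ALU slot utilization, not measured performance):

              #include <stdio.h>

              int main(void)
              {
                  const int slots     = 5;   /* ALU ops per instruction group (R6xx-R7xx) */
                  const int vec4_only = 4;   /* a plain vec4 operation fills 4 of the 5 slots */

                  printf("vec4 op alone:          %d%% of the ALU slots\n", 100 * vec4_only / slots);
                  printf("plus one packed scalar: %d%% of the ALU slots\n", 100 * (vec4_only + 1) / slots);
                  /* Filling the spare slot is worth up to ~25% more ALU throughput in
                     shader-limited cases -- the "extra performance" mentioned above. */
                  return 0;
              }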
              Last edited by bridgman; 28 July 2009, 07:55 PM.



              • #27
                Originally posted by bridgman
                Normally you use multiple slots per instruction automatically since you're dealing with 3-4 component vertex or pixel vectors, so it's pretty easy to get "decent" utilization -- but you can definitely get some extra performance in shader-intensive operations by packing other operations into the unused slots.
                Well, yeah... When I said optimal, I also meant it as in the ideal situation.
                Btw, do you end up with similar pipelining challenges on GPUs as there are with CPUs?



                • #28
                  Yep. One of the contributors to the "60-70%" performance estimate for open source vs fglrx drivers was the use of a sub-optimal shader compiler in the open source drivers.

                  Pipelining is not much of an issue for GPUs, at least not for ours. The actual graphics pipeline is *very* long, potentially thousands of clocks or more, but data only flows one way most of the time, and read-after-write situations (eg rendering into a pixmap then using the results as a texture) are treated as exceptions with explicit cache flushes.

                  Inside the shader core itself, pipelines are for all practical purposes non-existent. This is possible because GPUs are almost always dealing with hugely parallel workloads, so single-thread performance doesn't really matter. We run at relatively low frequencies compared to a CPU (which allows a much shorter pipeline), and the SIMD engines process 4 clocks' worth of work at a time (eg a 16-way SIMD handles a group of 64 threads over 4 clocks), which allows the remaining bit of pipelining to be hidden from the programming model.
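                  As a quick sanity check on those numbers (taking the 16-way SIMD and the 4-clock issue cadence from the post as given):

                  #include <stdio.h>

                  int main(void)
                  {
                      const int simd_width      = 16;  /* lanes per SIMD engine                */
                      const int clocks_per_inst = 4;   /* clocks spent issuing one instruction */

                      /* One batch of work is 16 * 4 = 64 threads, so any ALU latency of up
                         to 4 clocks is hidden: the next instruction of the same batch
                         couldn't start any sooner anyway. */
                      printf("threads per batch: %d\n", simd_width * clocks_per_inst);
                      printf("latency hidden by issue cadence: %d clocks\n", clocks_per_inst);
                      return 0;
                  }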
                  Last edited by bridgman; 28 July 2009, 09:59 PM.



                  • #29
                    Originally posted by bridgman
                    Yep. One of the contributors to the "60-70%" performance estimate for open source vs fglrx drivers was the use of a sub-optimal shader compiler in the open source drivers.
                    Maybe you aren't allowed to answer this, but..

                    IIRC the official statement regarding opening fglrx was a mixture of "you don't want to know what's inside" and "there's 3rd party code inside we're not allowed to open".

                    After reading your explanations, it seems reasonable to assume that a shader compiler is *very* hardware-specific, thus probably developed in-house, which could give you the option of opening it. Wouldn't that both save your open-source developers valuable time AND result in better open-source drivers, and maybe (through community patches) better closed-source drivers?

                    Or are there some simple problems I'm overlooking, like incompatible compiler APIs/tight coupling with fglrx? Or more complicated problems, like patents/IP, non-obvious licensing problems, too much work to clear the whole thing, fear of nvidia stealing your shinies, ..?

                    I don't want to sound ungrateful; AMD/ATI has shared a lot and done a lot for us Linux users, and it's impudent to ask for more. But since you keep taunting us that fglrx will always be superior because it has that great shader compiler, I'm just curious about the reasons.



                    • #30
                      Originally posted by rohcQaH
                      IIRC the official statement regarding opening fglrx was a mixture of "you don't want to know what's inside" and "there's 3rd party code inside we're not allowed to open".
                      I was talking about comparing the open drivers to a theoretical optimal driver, though (which would in all likelihood be better than ATi's proprietary driver at optimizing operations into instructions). Even if ATi opened up their shader compiler code, it would at best allow the open drivers to mimic it and reach similar performance, never allowing them to become better (since becoming better might require a different design for the same thing). While the current approach is probably slower, it might yield interesting results given enough talented developers (and assuming there's still enough space between ATi's implementation and an optimal implementation to dance a mambo in).
