Occlusion queries let a program use the acceleration hardware to determine whether an object would be visible if it were drawn. Writes are turned off, and a counter tracks the number of pixels (fragments) that would pass the depth test, i.e. that are not *behind* something which has already been drawn.
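The counting described above can be sketched as a toy software model in Python (this simulates the idea of "count fragments that pass the depth test"; it is not the actual GL occlusion-query API, and all names are mine):

```python
def occlusion_query(fragments, depth_buffer):
    """Toy model of an occlusion query: count the fragments that would
    pass the depth test, i.e. are closer than what is already drawn.
    `fragments` is a list of (x, y, depth); smaller depth = closer.
    As in a real query, nothing is written -- we only count."""
    passed = 0
    for x, y, depth in fragments:
        if depth < depth_buffer[(x, y)]:
            passed += 1
    return passed

# A 2x1 depth buffer with existing geometry at depth 0.5
depth_buffer = {(0, 0): 0.5, (1, 0): 0.5}
# One fragment in front of the existing geometry, one behind it
print(occlusion_query([(0, 0, 0.2), (1, 0, 0.9)], depth_buffer))  # 1
```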
A New Radeon Shader Compiler For Mesa
-
Originally posted by monohouse:
Isn't that wasteful, having a whole high-level language compiler in the form of a driver? Isn't it better to have an external executable (say, a binary that comes with the driver) compile the shaders first, and let the driver only run the compiled shaders?
The compilation is driver-specific because the hardware differs in what instruction set it uses. The best option is to have the program tell the driver what it wants in a high-level form and let the driver figure out how to translate that into hardware-specific instructions.
That's the same as how the OpenGL/DX APIs work, really. Your program says "set the color to green and draw this triangle" and the driver translates those high-level constructs into the specific hardware commands to get the desired result.
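The "same high-level request, different hardware lowering" idea can be illustrated with a toy sketch (both "ISAs" and all mnemonics below are invented for illustration; real drivers lower to their GPU's actual instruction set):

```python
# Toy illustration: one high-level command stream, two hypothetical
# hardware "ISAs".  Each driver back-end lowers the same request
# into its own instruction set.
HIGH_LEVEL = [("set_color", "green"), ("draw_triangle", (0, 1, 2))]

def lower_for_gpu_a(cmds):
    """Hypothetical back-end for 'GPU A'."""
    out = []
    for op, arg in cmds:
        if op == "set_color":
            out.append(f"A_MOV CONST0, {arg}")
        elif op == "draw_triangle":
            out.append(f"A_DRAW TRI {arg}")
    return out

def lower_for_gpu_b(cmds):
    """Hypothetical back-end for 'GPU B' -- same semantics,
    different instructions."""
    out = []
    for op, arg in cmds:
        if op == "set_color":
            out.append(f"B_SETREG color={arg}")
        elif op == "draw_triangle":
            out.append(f"B_EMIT_PRIM triangle {arg}")
    return out

print(lower_for_gpu_a(HIGH_LEVEL))
print(lower_for_gpu_b(HIGH_LEVEL))
```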
-
Interest in LLVM seems to have moved "up the stack" a bit, i.e. using LLVM to generate TGSI rather than using LLVM to translate from TGSI to native hardware instructions. It's certainly possible to use LLVM in both places, but LLVM doesn't currently handle explicitly superscalar hardware, so for now the back-end translation seems to be handled best by hardware-specific code.
R3xx-R5xx and RS6xx GPUs support two simultaneous operations per instruction (1 vector + 1 scalar), while R6xx-R7xx GPUs support up to five independent operations per instruction. Operations in a single instruction need to share inputs to some extent, so packing operations into instruction words is non-trivial at best.
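The packing problem can be sketched with a greedy toy scheduler (my own simplification: it only enforces "no intra-bundle dependencies" and a slot limit, ignoring the real hardware's input-sharing and port constraints mentioned above):

```python
def pack_vliw(ops, slots=5):
    """Greedy toy packer.  `ops` is a list of (name, inputs, output).
    An op may join the current instruction bundle only if none of its
    inputs is produced by an op already in that bundle, and the bundle
    has a free slot.  Real hardware adds input-sharing constraints,
    which is what makes the real problem hard."""
    bundles = []
    current, produced = [], set()
    for name, inputs, output in ops:
        if len(current) == slots or any(i in produced for i in inputs):
            bundles.append(current)
            current, produced = [], set()
        current.append(name)
        produced.add(output)
    if current:
        bundles.append(current)
    return bundles

ops = [
    ("mul", ("a", "b"), "t0"),
    ("add", ("c", "d"), "t1"),
    ("sub", ("t0", "t1"), "t2"),  # depends on the first two ops
    ("mov", ("e",), "t3"),        # independent, rides in a spare slot
]
print(pack_vliw(ops))  # [['mul', 'add'], ['sub', 'mov']]
```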
-
Originally posted by bridgman:
R3xx-R5xx and RS6xx GPUs support two simultaneous operations per instruction (1 vector + 1 scalar) while R6xx-R7xx GPUs support up to five independent operations per instruction.
-
Normally you use multiple slots per instruction automatically, since you're dealing with 3-4 component vertex or pixel vectors, so it's pretty easy to get "decent" utilization -- but you can definitely get some extra performance in shader-intensive operations by packing other operations into the unused slots.
-
Originally posted by bridgman:
Normally you use multiple slots per instruction automatically since you're dealing with 3-4 component vertex or pixel vectors, so it's pretty easy to get "decent" utilization -- but you can definitely get some extra performance in shader-intensive operations by packing other operations into the unused slots.
Btw, do you end up with similar pipelining challenges on GPUs as there are on CPUs?
-
Yep. One of the contributors to the "60-70%" performance estimate for open source vs fglrx drivers was the use of a sub-optimal shader compiler in the open source drivers.
Pipelining is not much of an issue for GPUs, at least not for ours. The actual graphics pipeline is *very* long, potentially thousands of clocks or more, but data only flows one way most of the time, and read-after-write situations (e.g. rendering into a pixmap and then using the result as a texture) are treated as exceptions with explicit cache flushes.
Inside the shader core itself, pipelines are for all practical purposes non-existent. This is possible because GPUs are almost always dealing with hugely parallel workloads, so single-thread performance doesn't really matter. We run at relatively low frequencies compared to a CPU (which allows a much shorter pipeline), and the SIMD engines process 4 clocks' worth of work at a time (e.g. a 16-way SIMD processes 64 threads over 4 clocks), which allows the remaining bit of pipelining to be hidden from the programming model.
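The arithmetic in that example checks out directly (numbers from the post; the variable names are mine):

```python
# A 16-way SIMD engine executing each instruction over 4 clocks
# naturally groups work into batches of 16 * 4 = 64 threads.
simd_width = 16        # lanes in one SIMD engine
clocks_per_instr = 4   # each instruction executes over 4 clocks
batch_size = simd_width * clocks_per_instr
print(batch_size)  # 64
```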
-
Originally posted by bridgman:
Yep. One of the contributors to the "60-70%" performance estimate for open source vs fglrx drivers was the use of a sub-optimal shader compiler in the open source drivers.
IIRC the official statement regarding opening fglrx was a mixture of "you don't want to know what's inside" and "there's 3rd-party code inside we're not allowed to open".
After reading your explanations, it seems reasonable to assume that a shader compiler is *very* hardware-specific, and thus probably developed in-house, which could give you the option of opening it. Wouldn't that both save your open-source developers valuable time AND result in better open-source drivers, and maybe (through community patches) better closed-source drivers?
Or are there some simple problems I'm overlooking, like incompatible compiler APIs or tight coupling with fglrx? Or more complicated problems, like patents/IP, non-obvious licensing problems, too much work to clear the whole thing, fear of nvidia stealing your shinies, ..?
I don't want to sound ungrateful -- AMD/ATI has shared a lot and done a lot for us Linux users, and it's impudent to ask for more. But since you keep taunting us that fglrx will always be superior because it has that great shader compiler, I'm just curious about the reasons.
-
Originally posted by rohcQaH:
IIRC the official statement regarding opening fglrx was a mixture of "you don't want to know what's inside" and "there's 3rd-party code inside we're not allowed to open".