VP8 Over VDPAU In Gallium3D Is Emeric's Target
-
Originally posted by runeks: I guess what I'm interested in is these two steps. Perhaps only step 2.
I'm not sure what you mean by "write a separate test app with a shader implementing it" though. Why would we write a separate (test) application to implement a sub-feature of a state tracker? Or do you mean just writing an application that can be used to test whichever decoding routine we choose to optimize using shaders?
Also, in step 3: are we not writing this shader in TGSI ourselves? If so, why would we use Mesa to record "the TGSI code it generates"?
Think about optimizing a function in x264. First you'd write it in C code and make sure that's working. Then you can compile that to assembly with GCC and copy the output into an assembly section in x264. Then you can work on actually trying to optimize the assembly, instead of writing it from scratch.
I know that wouldn't always lead to optimal results, but it does seem like the quickest path to me.
PS - I am not a developer involved with Mesa, video decoding, or anything else discussed here. So I'm just giving my opinion of what will probably be done, I have no inside information.
-
Originally posted by bridgman: I think smitty meant "write a test app with a GLSL shader".
Originally posted by smitty3268: What I mean is a toy app which only has a single shader in it that does the iDCT, used to test that shader until it's working. It's easier doing it there because then you can just use standard OpenGL instead of trying to hook up a shader compiler inside the state tracker.
Originally posted by smitty3268: Think about optimizing a function in x264. First you'd write it in C code and make sure that's working. Then you can compile that to assembly with GCC and copy the output into an assembly section in x264. Then you can work on actually trying to optimize the assembly, instead of writing it from scratch.
But again, would we even gain that much trying to optimize the TGSI? Wouldn't the fact that GLSL is able to utilize hundreds of shaders make it optimized enough, or is it just far more complicated than I'm making it out to be? (It often is.)
-
Originally posted by runeks: I think I get the point. On a CPU the path would be C->asm->optimized asm while on a GPU the path would be GLSL->TGSI->optimized TGSI.
But again, would we even gain that much trying to optimize the TGSI? Wouldn't the fact that GLSL is able to utilize hundreds of shaders make it optimized enough, or is it just far more complicated than I'm making it out to be? (It often is.)
So my guess was that, to keep things simple, they would only mess with the TGSI in the state tracker, but I don't know if that's really the plan or not.
I do think the developers are quite familiar with TGSI, and I don't think they would view working directly with it as too burdensome. They are the same people who are writing the driver compilers, after all, which work directly on the TGSI and the previous Mesa IR code.
-
Originally posted by runeks: Hmm, bear with me here. When you write "shader", are you referring to the shader program? I keep thinking about the actual hardware units on the graphics card when I read "shader". So we'd write an application that implements an iDCT function in GLSL, and then input various values into this function and see that we get the correct results?
Originally posted by runeks: I think I get the point. On a CPU the path would be C->asm->optimized asm while on a GPU the path would be GLSL->TGSI->optimized TGSI. But again, would we even gain that much trying to optimize the TGSI? Wouldn't the fact that GLSL is able to utilize hundreds of shaders make it optimized enough, or is it just far more complicated than I'm making it out to be? (It often is.)
-
Try this link - I'm on dial-up, so it'll be an hour or so before I can confirm it's the right slide deck, but I *think* this deck talks about optimizing for compute-type applications (and video decode is more like compute work than 3D work):
http://developer.amd.com/gpu_asset...plications.pdf
-
@smitty3268 I think we're on the same page here. What I meant wasn't really to enable state trackers to use GLSL directly, but rather to use the TGSI that the GLSL compiler outputs as-is, i.e. to just copy and paste that TGSI code into a state tracker without optimizations. So we'd just be using the already-functioning GLSL compiler to generate the TGSI code that we'd be sticking in the state tracker.
Although I was actually about to ask how much it would take to make GLSL directly supported in state trackers instead of TGSI. But I guess that leads me to John's response (wrt. optimizing)...:
Originally posted by bridgman: In general you are mostly optimizing with respect to memory accesses (memory bandwidth is always a challenge) more than shader hardware. Algorithms like IDCT and filtering tend to have to perform a lot of reads for every write (even more so than with normal textured rendering), and a significant part of optimizing is about reducing the number of reads or making sure the read pattern is cache-friendly.
But I guess it's just a matter of learning TGSI like any other language. I've only briefly touched on RISC assembly, and that seemed like so much effort for so little. It would probably help if we had some TGSI code to start with, though, generated from GLSL.
Originally posted by bridgman: [...]
http://developer.amd.com/gpu_assets/...plications.pdf [fixed the URL]
I will definitely be digging into that at some point! Would there happen to be a recorded talk/presentation over these slides somewhere?
-
Originally posted by runeks: Although I was actually about to ask how much it would take to make GLSL directly supported in state trackers instead of TGSI. But I guess that leads me to John's response (wrt. optimizing)... <snip> And so, GLSL doesn't really cut it because it abstracts away all the memory management, right?
GLSL could probably get you pretty close, if not give you the same performance (although I haven't done enough shader work to be sure). The real issue is that a Gallium3D state tracker uses Gallium3D calls and TGSI shaders by definition, so you probably want to end up with TGSI rather than copying a big heap of code from the OpenGL state tracker (aka Mesa) to convert the shaders from GLSL to TGSI every time you wanted to decode a video.
Originally posted by runeks: But I guess it's just a matter of learning TGSI like any other language. I've only briefly touched on RISC assembly, and that seemed like so much effort for so little. It would probably help if we had some TGSI code to start with, though, generated from GLSL.
Originally posted by runeks: Sweet! Looks great! Second page says "GPGPU from real world applications - Decoding H.264 Video" so without knowing much else I'd say it's right on the money. I will definitely be digging into that at some point! Would there happen to be a recorded talk/presentation over these slides somewhere?
-
Originally posted by bridgman: Actually I was mostly responding to your question about the need to optimize.
GLSL could probably get you pretty close, if not give you the same performance (although I haven't done enough shader work to be sure). The real issue is that a Gallium3D state tracker uses Gallium3D calls and TGSI shaders by definition, so you probably want to end up with TGSI rather than copying a big heap of code from the OpenGL state tracker (aka Mesa) to convert the shaders from GLSL to TGSI every time you wanted to decode a video.
But I guess my point is that if the TGSI code produced from the GLSL code, as you say, doesn't even need any optimizations to perform really well, then maybe the ability to hook the GLSL compiler into other Gallium state trackers would ease the development of future state trackers? Or is Gallium just designed in such a way that this cannot be done without creating a mess?
The GLSL compiler lives in the Mesa state tracker, right? So if we were to utilize this feature, the GLSL compiler, in another state tracker, we would be creating a new state tracker that is dependent on another state tracker (Mesa). But I guess that's not something too distant of a concept to Linux: dependencies.
Of course, it would probably need to be some kind of just-in-time compiler with code caching in order to be effective, which quickly makes it quite a project in itself.
Originally posted by bridgman: There might be (or, more likely, a newer talk), but I would have to be at work (with something faster than 24 Kb/s download) to find it before the technology becomes obsolete.
-
The issue is that OpenGL is a Big Honkin' API and therefore needs a Big Honkin' State Tracker. Mesa is a lot bigger and more complex than the video decoder state tracker would be, and you would probably end up having a lot more code supporting GLSL than supporting video decode.
It's sort of like bringing your house into your car so you can make coffee while you drive -- OK in principle but not so good in practice.