Classic GL driver
=================
GLSL
compiler generates GLSL IR
utility routine converts to Mesa IR
--------
Mesa IR
driver converts to hardware
(the dashed line is the interface between common Mesa code and the HW driver)
Gallium3D GL driver
===================
GLSL
compiler generates GLSL IR
utility routine converts to Mesa IR
utility routine converts to TGSI
----
TGSI
driver converts to hardware
(Mesa IR is converted to TGSI in the common Mesa code, then passed to the Gallium3D HW driver)
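The two lowering paths above can be sketched as a toy C model. To be clear, every name here (ir_level, lower_one_step, driver_interface) is illustrative only, not an actual Mesa type or function; the point is just where the common-code/driver boundary sits in each stack.

```c
/* Toy model of the classic vs. Gallium3D lowering paths.
 * All names are illustrative, not Mesa's actual types. */
typedef enum { IR_GLSL, IR_MESA, IR_TGSI, IR_HW } ir_level;

/* One lowering step: GLSL IR -> Mesa IR always happens in common
 * code; a Gallium3D stack then lowers Mesa IR to TGSI, while a
 * classic driver consumes Mesa IR directly. */
ir_level lower_one_step(ir_level ir, int gallium)
{
    switch (ir) {
    case IR_GLSL: return IR_MESA;                    /* compiler output */
    case IR_MESA: return gallium ? IR_TGSI : IR_HW;  /* the fork */
    case IR_TGSI: return IR_HW;                      /* driver back end */
    default:      return IR_HW;
    }
}

/* The IR that crosses the dashed line (common code -> HW driver). */
ir_level driver_interface(int gallium)
{
    return gallium ? IR_TGSI : IR_MESA;
}
```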
Video state tracker implementing full H.264 decode
==================================================
H.264 slice
CPU does bitstream parse, entropy decode, probably IDCT
tracker generates TGSI for MC, deblock
----
TGSI
driver converts to hardware
Video state tracker implementing MC/deblock only
================================================
partially decoded surfaces
tracker generates TGSI for MC, deblock
----
TGSI
driver converts to hardware
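The split between the two tracker variants can also be written down as a toy C model; the names (stage, tracker_gpu_stages, cpu_front_end) are made up for illustration and don't correspond to any real tracker code. Both variants generate TGSI for the same back-end stages; they differ only in who runs the front end before those stages.

```c
/* Illustrative-only model of the two video state tracker variants. */
enum stage {
    ST_PARSE   = 1 << 0,  /* bitstream parse */
    ST_ENTROPY = 1 << 1,  /* CABAC/CAVLC entropy decode */
    ST_IDCT    = 1 << 2,
    ST_MC      = 1 << 3,  /* motion compensation */
    ST_DEBLOCK = 1 << 4
};

/* Both variants hand the same stages to the GPU as TGSI. */
unsigned tracker_gpu_stages(void)
{
    return ST_MC | ST_DEBLOCK;
}

/* Front-end work done on the CPU before the TGSI stages run: the
 * full-decode tracker accepts raw H.264 slices and does it internally;
 * the MC/deblock-only tracker expects the caller to supply partially
 * decoded surfaces, so for it this is empty. */
unsigned cpu_front_end(int full_decode)
{
    return full_decode ? (unsigned)(ST_PARSE | ST_ENTROPY | ST_IDCT) : 0u;
}
```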
The potentially confusing thing about nearly all of the video interfaces is that their definitions include multiple entry points -- bitstream/slice, XvMC-like, and (IIRC) Xv-like. Drivers typically implement only one of those entry points and the player needs to do the rest, which means that a player that "supports VA-API" might only support one entry point while a "VA-API driver" might only support a different one.
Most of the driver implementations so far have been written for hardware decoders and implement the top-level entry point, but things will get more interesting as the other entry points see more use. I expect the most common stack for shader-assisted decode will be whichever lower-level entry point of an existing API (VA-API or VDPAU) aligns best with the MC/deblock functions; if none of them fits, an enhanced XvMC would probably be used instead.
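The negotiation problem described above can be sketched as follows. The enum values and the pick_entrypoint helper are hypothetical, not part of any real API (VA-API's real analogue to the query side is vaQueryConfigEntrypoints()); the sketch assumes each side advertises a bitmask of the entry points it implements.

```c
#include <stddef.h>

/* Hypothetical entry-point levels, roughly mirroring the split
 * described above (names are illustrative, not VA-API's). */
typedef enum {
    ENTRY_BITSTREAM = 1 << 0,  /* slice-level: driver does full decode */
    ENTRY_IDCT      = 1 << 1,  /* player does parse/entropy decode */
    ENTRY_MOCOMP    = 1 << 2,  /* XvMC-like: driver does MC/deblock only */
    ENTRY_SURFACE   = 1 << 3   /* Xv-like: driver only displays frames */
} decode_entrypoint;

/* Pick the highest-level entry point both sides support: the driver
 * exposes a mask of what it implements, the player a mask of what it
 * can feed.  Returns 0 when they have nothing in common -- the
 * mismatch case described in the text. */
decode_entrypoint pick_entrypoint(unsigned driver_mask, unsigned player_mask)
{
    static const decode_entrypoint order[] = {
        ENTRY_BITSTREAM, ENTRY_IDCT, ENTRY_MOCOMP, ENTRY_SURFACE
    };
    for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++)
        if ((driver_mask & order[i]) && (player_mask & order[i]))
            return order[i];
    return (decode_entrypoint)0;
}
```

A shader-assisted driver exposing only ENTRY_MOCOMP would match a player that can fall back to feeding motion vectors, but not one that only hands over raw bitstreams.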
What I see happening is a state tracker that exposes a few standard functions (MC, deblock) which can be used by a higher-level decode routine (e.g. ffmpeg). The existing ffmpeg codecs would be modified to call into the generic MC/deblock state tracker and perform those functions on the GPU rather than the CPU, so the additional per-format work should be relatively small.
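A minimal sketch of what such an interface might look like, assuming a hypothetical vtable (vl_decode_ops and every function name here are invented for illustration; this is not ffmpeg's or Mesa's actual API). A codec keeps doing parse/entropy decode on the CPU and dispatches MC/deblock through whichever backend is bound -- here a CPU reference path stands in for the GPU state tracker:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ops table a codec would call into; a Gallium3D state
 * tracker would provide a GPU-backed instance of this. */
typedef struct {
    /* Copy a predicted block from the reference frame, offset by a
     * motion vector, into the destination frame. */
    void (*motion_comp)(uint8_t *dst, const uint8_t *ref,
                        int stride, int mv_x, int mv_y, int w, int h);
    /* Smooth block edges of the reconstructed frame in place. */
    void (*deblock)(uint8_t *frame, int stride, int w, int h);
} vl_decode_ops;

/* CPU reference implementation (full-pel MC only, no-op deblock),
 * standing in for the state tracker backend. */
static void cpu_motion_comp(uint8_t *dst, const uint8_t *ref,
                            int stride, int mv_x, int mv_y, int w, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + y * stride,
               ref + (y + mv_y) * stride + mv_x, (size_t)w);
}

static void cpu_deblock(uint8_t *frame, int stride, int w, int h)
{
    (void)frame; (void)stride; (void)w; (void)h;  /* placeholder */
}

static const vl_decode_ops cpu_ops = { cpu_motion_comp, cpu_deblock };
```

The per-format work is then limited to routing each codec's existing MC/deblock call sites through the ops table instead of its built-in CPU routines.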