Announcement

**RealNC** · 03 April 2011, 10:10 PM

This whole thing is kind of useless. VP8 on the internet is low-bitrate enough as to not need acceleration. H.264 would be much more important to have on top of Gallium.

**gbeauche** · 04 April 2011, 12:24 AM

Originally posted by popper View Post

your ffmpeg patch set just got posted to libav and they have some problem by the look of it ?

http://lists.libav.org/pipermail/libav-devel/2011-April/001240.html

you might want to pop over there or on their IRC if you cant be bothered to follow subscribe there

I follow this list, but in quick read-only mode for now. If they have problems with the vaapi configure parts that are posted, this means they also have problems with the vdpau configure parts that are already in... Otherwise, they are just looking at non problems.

**gbeauche** · 04 April 2011, 12:31 AM

Originally posted by RealNC View Post

This whole thing is kind of useless. VP8 on the internet is low-bitrate enough as to not need acceleration. H.264 would be much more important to have on top of Gallium.

This was due to the current lack of HW acceleration at the moment. Google is looking into people to adopt VP8 HW decoding because even current ARM Cortex A9 chips can't handle 1080p VP8 contents. And this is worse when some chips (e.g. Tegra) don't implement NEON extensions to get help. This will change pretty soon, Rockchip already demoed a chip that supports HW VP8 decode. For desktop (PC) GPUs, you still have to wait for N+1 or N+2 generation, depending on the manufacturer. Then, higher bitrate (and quality to begin with) VP8 contents can appear.

**elanthis** · 04 April 2011, 01:31 AM

Originally posted by gbeauche View Post

And this is worse when some chips (e.g. Tegra) don't implement NEON extensions to get help.

Totally off topic, but not being an embedded developer guy at all, when I first heard that I didn't believe it. What the hell went through the NVIDIA engineers' minds to make them think that leaving off the SIMD engine for any CPU meant for anything more complex than a toaster is in any way acceptable?

This is why I don't fear mobile devices taking over the PC or game console world. (Maybe the mobile game console world is in trouble, sure, but then only because Nintendo and Sony both are utter morons and either screw over every third-party developer they can or just design whacked out expensive hardware with gimmicky features but only 1/30th the power of my two year old phone.)

**gbeauche** · 04 April 2011, 02:25 AM

Originally posted by elanthis View Post

Totally off topic, but not being an embedded developer guy at all, when I first heard that I didn't believe it. What the hell went through the NVIDIA engineers' minds to make them think that leaving off the SIMD engine for any CPU meant for anything more complex than a toaster is in any way acceptable?

This would have made for a larger die (larger -> more transistors -> more heat) and they probably thought people would use CUDA for multimedia kernels instead. Next-generation Tegra chips will integrate this extension though, IIRC.

**runeks** · 15 April 2011, 04:01 PM

Originally posted by RealNC View Post

This whole thing is kind of useless. VP8 on the internet is low-bitrate enough as to not need acceleration. H.264 would be much more important to have on top of Gallium.

I think the point Emeric made was that it just wouldn't be possible for him to make a H.264 decoder state tracker with the time he has.
So he would prefer ending up with a less usable but functional (VP8) decoder, rather than a more useful (H.264) but non-functional/incomplete decoder.
Implementing a full H.264 decoder in software took him six months last time he tried (1), and this time he has to learn about shader-based optimizations as well, so it would seem to be a bit too much work for one GSoC.

Also, he first started out his thread on the mailing list by proposing a generic implementation of various processes involved in video decoding, so that an arbitrary codec could just hook into these and accelerate decoding this way. But I'm not sure if he's left that idea as well:

The project would be to write a state tracker wich expose some of the
most shaders-friendly decoding operations (like motion compensation,
idct, intra-predictions, deblocking filter and maybe vlc decoding)
through a common API like VDPAU or VA-API.
These APIs can be used to decode mpeg2, mpeg 4 asp/avc, vc1 and
others, but at first I intend to focus on the h264 decoding to save
time, because I know it better and it is currently widely in use, but
again the goal of the project is to be generic.

(2)

In any case, if he successfully makes this VDPAU VP8 decoder state tracker, adding support for H.264, VC-1 etc. later will be much easier than it is now.

**runeks** · 16 April 2011, 04:24 PM

Can anyone answer exactly how these optimizations are coded in a state tracker?
I mean, I think I get how the state tracker itself functions, on a conceptual level, at least.
But in the state tracker code, in the part of the code that deals with the actual decoding of a video stream, how, concretely, is a certain part of this decoding code - let's say iDCT - written, to allow it to be executed on a GPU (in parallel)?
Would the process of starting to write useful code for this kind of thing be something like reading a couple of papers on parallelizing the iDCT algorithm, and then write the actual paralllelization-code in TGSI? Or does one use a higher level language like GLSL? Are there any currently working code examples in mesa that I can take a look at to get a better understanding of it?

More generally perhaps, is the point of access to the shaders of a graphics card in a Gallium state tracker always TGSI? If this is the case, wouldn't something as relatively easy as iDCT be fairly complicated to implement in TGSI? I mean, I've seen the C code, and the assembly code, that implements iDCT. Wouldn't the TGSI-code look a lot like the assembly (CPU) code except with some form a parallelization-enabled instructions?

On a third note, does anyone know where the "TGSI specification" pdf on this site has gone? I'd really like to take a look at it (even though I probably wouldn't understand much of it

). But it seems like it's the only documentation that I can find.

**smitty3268** · 16 April 2011, 04:34 PM

Originally posted by runeks View Post

Can anyone answer exactly how these optimizations are coded in a state tracker?
I mean, I think I get how the state tracker itself functions, on a conceptual level, at least.
But in the state tracker code, in the part of the code that deals with the actual decoding of a video stream, how, concretely, is a certain part of this decoding code - let's say iDCT - written, to allow it to be executed on a GPU (in parallel)?
Would the process of starting to write useful code for this kind of thing be something like reading a couple of papers on parallelizing the iDCT algorithm, and then write the actual paralllelization-code in TGSI? Or does one use a higher level language like GLSL? Are there any currently working code examples in mesa that I can take a look at to get a better understanding of it?

More generally perhaps, is the point of access to the shaders of a graphics card in a Gallium state tracker always TGSI? If this is the case, wouldn't something as relatively easy as iDCT be fairly complicated to implement in TGSI? I mean, I've seen the C code, and the assembly code, that implements iDCT. Wouldn't the TGSI-code look a lot like the assembly (CPU) code except with some form a parallelization-enabled instructions?

On a third note, does anyone know where the "TGSI specification" pdf on this site has gone? I'd really like to take a look at it (even though I probably wouldn't understand much of it

). But it seems like it's the only documentation that I can find.

In broad strokes, the job of the state tracker is to convert random API input (VDPAU, OpenGL, etc.) into TGSI output.

Then the hardware drivers take the TGSI as input and output commands that the actual hardware works with.

I suspect that the video decoding code will be written directly in TGSI within the state tracker, but I suppose it's possible to do it in something like GLSL and then compile it down to TGSI. I'm not sure how difficult that would be to implement, but it's probably more efficient to just code in TGSI directly.

**smitty3268** · 16 April 2011, 04:47 PM

Originally posted by smitty3268 View Post

I suspect that the video decoding code will be written directly in TGSI within the state tracker, but I suppose it's possible to do it in something like GLSL and then compile it down to TGSI. I'm not sure how difficult that would be to implement, but it's probably more efficient to just code in TGSI directly.

I'm guessing what will be done is:

1. Add the state tracker but do everything in C code, test until it works
2. Pick a function like idtc and write a separate test app with a shader implementing it
3. Test that shader until it works well, then use Mesa to record the TGSI code it generates
4. Move that generated code into the state tracker, test
5. Either move on to the next function to optimize, or work on optimizing the generated TGSI code directly in the state tracker. Repeat as needed.

**runeks** · 16 April 2011, 05:20 PM

Originally posted by smitty3268 View Post

2. Pick a function like idtc and write a separate test app with a shader implementing it
3. Test that shader until it works well, then use Mesa to record the TGSI code it generates

I guess what I'm interested in is these two steps. Perhaps only step 2.
I'm not sure what you mean by "write a separate test app with a shader implementing it" though. Why would we write a separate (test) application to implement a sub-feature of a state tracker? Or do you mean just writing an application that can be used to test whichever decoding routine we choose to optimize using shaders?

Also, in step 3: are we not writing this shader in TGSI ourselves? If so, why would we use mesa to record "the TGSI code it generates"?

Announcement

VP8 Over VDPAU In Gallium3D Is Emeric's Target

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment