XvMC support


  • MU_Engineer
    replied
    Originally posted by bridgman:
    In terms of hardware required, quick answer is "nobody knows for sure until the code is written". I doubt that the 40-ALU parts (HD2400, HD34xx, 780) will have enough power since the 3D engine is also being used for render accel (colour space conversion, scaling, deinterlacing etc..). I have always suggested that anyone wanting to run the open source drivers go with at least a 120-ALU part (2600, 3650 etc..) to have some shader power left over for decode work.
    What about R5xx parts? Are the shaders on those units usable for shader-assisted decode?



  • bridgman
    replied
    Yep, the answer is long and complicated.

    First off, let's get one thing clear. Implementing decode on shaders is not a complete substitute for dedicated hardware, but it is more flexible (fixed function hardware is picky about encoding details) and should be able to reduce CPU utilization enough to make a lot more systems able to decode in real time.

    There are a bunch of activities in H.264 decode (bitstream parsing, entropy decode, spatial prediction) which don't lend themselves to being implemented on shaders, so that work is going to have to stay on the CPU anyways. Fixed-function hardware can handle the entire decode operation and uses less power while doing it.

    In terms of hardware required, quick answer is "nobody knows for sure until the code is written". I doubt that the 40-ALU parts (HD2400, HD34xx, 780) will have enough power since the 3D engine is also being used for render accel (colour space conversion, scaling, deinterlacing etc..). I have always suggested that anyone wanting to run the open source drivers go with at least a 120-ALU part (2600, 3650 etc..) to have some shader power left over for decode work.

    Again, this is all hypothetical right now anyways. I am just trying to give everyone an idea of what the likely scenarios are -- we are going to look into opening up UVD, but I can't make any commitments until we have actually gone through the investigation, and it won't be quick. We have 6xx/7xx 3D code out now, so IMO the next priority should be basic power management.
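
    To make the CPU/shader split described above concrete, here is a minimal C sketch of how a decoder might divide the work. Every type and function in it is a hypothetical placeholder invented for illustration, not an existing driver or decoder API.

    Code:
    /* Hypothetical CPU/GPU split for H.264 decode -- all names are
     * placeholders for illustration, not a real API. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct { int type; } macroblock_t;  /* entropy-decoded MB data */
    typedef struct { int id; }   surface_t;     /* frame in GPU memory     */

    /* Stays on the CPU: bitstream parsing, entropy (CABAC/CAVLC) decode,
     * spatial prediction decisions -- the inherently serial work. */
    static size_t cpu_parse_and_entropy_decode(const unsigned char *bs,
                                               size_t len, macroblock_t *mbs,
                                               size_t max_mbs)
    {
        (void)bs; (void)len; (void)mbs;
        return max_mbs;                         /* stub */
    }

    /* Candidate for shaders: per-macroblock inverse transform, motion
     * compensation and deblocking -- the data-parallel, math-heavy part. */
    static void gpu_reconstruct_frame(const macroblock_t *mbs, size_t n,
                                      const surface_t *refs, surface_t *out)
    {
        (void)mbs; (void)n; (void)refs; (void)out;  /* stub */
    }

    int main(void)
    {
        unsigned char bitstream[1] = { 0 };
        macroblock_t mbs[128];
        surface_t refs[1] = { { 0 } }, out = { 1 };

        size_t n = cpu_parse_and_entropy_decode(bitstream, sizeof bitstream,
                                                mbs, 128);  /* CPU side */
        gpu_reconstruct_frame(mbs, n, refs, &out);          /* GPU side */
        printf("decoded %zu macroblocks into surface %d\n", n, out.id);
        return 0;
    }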



  • Kjella
    replied
    Originally posted by bridgman:
    In the meantime, a lot of the computationally expensive work associated with H.264 decode and encode can be done with shaders, (...)
    This might be a very short question with a very long and complicated answer, but: What shader power is required to match the custom hardware?

    I did follow some attempts to implement generic shader decoding this summer, like here: http://www.bitblit.org/gsoc/g3dvl/ - the other was Rudd on the XBMC team, but that didn't seem to go anywhere, and the g3dvl project was only able to do 854x480 in realtime.

    Surely AMD looked a bit into whether regular shaders could do the work before deciding to go for dedicated hardware, so is it realistic to do Blu-ray streams on shaders, or is that just exaggerating the power of shaders?

    It's also a question of what class of chips can do this - like, would it be something for an integrated chip with 10 shaders, or only something for a 48xx-class card? Some ballpark idea of that would be nice.



  • bridgman
    replied
    asun, are you playing through Xv, OpenGL or X11 output?



  • asun
    replied
    Another vote for offloading H.264 to GPU. Also another datapoint here:
    AMD X2 5000+
    Radeon HD2400 Pro
    2GB DDR2 Ram
    1920x1080 display

    High-bitrate H.264 video files cause stutter and out-of-sync audio. No problem at all with MPEG2 or other MPEG4 codecs (DivX, Xvid, etc.). top shows that mplayer is taking up all the CPU cycles.



  • bridgman
    replied
    Huh? AFAIK we were the first to ship full GPU offload for HD-DVD/BluRay on Windows, and most of the reviews seem to think we still have a better implementation today.

    If you read the rest of the thread where BetaBoy said "DXVA sucks and must die" it seems they aren't really planning to use CUDA itself as much as the library which NVidia made available for CUDA developers (the one popper mentioned a couple of days ago). I think the attraction of the library is that it makes it easy to retrieve the decoded frame, while most of the decoder implementations supplied by HW vendors tend to only output to the screen simply because that was the main requirement.

    We make a similar capability available to ISVs:

    http://www.cyberlink.com/eng/press_room/view_1756.html

    I suspect the library uses the DXVA framework in the NVidia drivers, so having DXVA die might be a bit inconvenient, but that's just a guess.
    Last edited by bridgman; 05 January 2009, 04:26 PM.



  • RealNC
    replied
    Originally posted by bridgman:
    It depends on the question

    The ideal solution is to have access to the dedicated hardware we all put in our chips to support BluRay playback. Unfortunately that hardware is also wrapped up in bad scary DRM stuff so making it available on Linux is a slow and painful process.
    You aren't making it available on Windows either :P CoreAVC is going to support CUDA. What does ATI have to offer here? They say "DXVA sucks" and that "it's a failure".



  • bridgman
    replied
    It depends on the question

    The ideal solution is to have access to the dedicated hardware we all put in our chips to support BluRay playback. Unfortunately that hardware is also wrapped up in bad scary DRM stuff so making it available on Linux is a slow and painful process.

    In the meantime, a lot of the computationally expensive work associated with H.264 decode and encode can be done with shaders, which is where OpenCL comes in. It doesn't *have* to be OpenCL, and the work doesn't have to wait for OpenCL; we're just saying that a year from now anyone writing that kind of code will probably start with OpenCL.

    In the meantime, I believe there is enough info publicly available today to write a GPU-accelerated H.264 decoder for Linux on either Intel or ATI hardware, using the 3D engine with conventional shaders. Another option for ATI and NVidia hardware would be to write the decoder using the Stream or CUDA tools (but without that handy library).

    One thing that tools like OpenCL will do is make it possible for more people to get into this kind of programming, without having to first take the plunge into driver development. The CUDA and Stream tools certainly help, but I think OpenCL will help more.

    The other obvious benefit of OpenCL is the ability to run the same program on hardware from different vendors, which I guess you could say is "bad for the hardware vendors individually but good for them collectively"
    Last edited by bridgman; 05 January 2009, 03:17 PM.
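
    For a sense of what that kind of shader work looks like through OpenCL, here is a minimal sketch. The kernel only performs the per-pixel prediction-plus-residual reconstruction step of a decoder; the buffer sizes and data are made up for illustration and error handling is omitted, so this shows the programming model rather than an actual decoder.

    Code:
    /* Minimal OpenCL sketch: per-pixel "prediction + residual" reconstruction.
     * Illustrative only -- not a real H.264 decoder. */
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    static const char *src =
    "__kernel void reconstruct(__global const uchar *pred,              \n"
    "                          __global const short *resid,             \n"
    "                          __global uchar *out)                     \n"
    "{                                                                  \n"
    "    size_t i = get_global_id(0);                                   \n"
    "    int v = (int)pred[i] + (int)resid[i];                          \n"
    "    out[i] = (uchar)clamp(v, 0, 255);   /* saturate to 8 bits */   \n"
    "}                                                                  \n";

    int main(void)
    {
        enum { N = 1920 * 1080 };                  /* one 1080p luma plane    */
        cl_uchar *pred  = malloc(N);               /* motion-compensated pred */
        cl_short *resid = malloc(N * sizeof(cl_short)); /* IDCT output        */
        cl_uchar *out   = malloc(N);
        for (int i = 0; i < N; i++) {
            pred[i]  = 128;
            resid[i] = (cl_short)((i % 7) - 3);
        }

        cl_platform_id plat; cl_device_id dev; cl_int err;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "reconstruct", &err);

        cl_mem dp = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   N, pred, &err);
        cl_mem dr = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   N * sizeof(cl_short), resid, &err);
        cl_mem dout = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, N, NULL, &err);

        clSetKernelArg(k, 0, sizeof(dp), &dp);
        clSetKernelArg(k, 1, sizeof(dr), &dr);
        clSetKernelArg(k, 2, sizeof(dout), &dout);

        size_t global = N;                         /* one work-item per pixel */
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dout, CL_TRUE, 0, N, out, 0, NULL, NULL);
        printf("first reconstructed pixel: %d\n", (int)out[0]);

        /* releases omitted to keep the sketch short */
        return 0;
    }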



  • RealNC
    replied
    Originally posted by deneb:
    CoreAVC is still only using the dedicated VP2/3 decoder through NVCUVID API in CUDA, just like Donald Graft's DGAVCDecNV. Most videos will be supported since NVIDIA's H.264 decoder is quite flexible
    Hmm. So then OpenCL is not the answer here? Then what is?



  • bridgman
    replied
    Nope, it's a good question. There are ways you can deal with dependencies and data communication between parallel threads, but in general if you rely on them much you lose most of the benefit of parallelism.

    This is why converting from a CPU program to a GPU program is not easy; you often need to come up with a new approach to solving the problem which does not require those dependencies. Some problems (e.g. picking apart a sequential bitstream) don't lend themselves to parallel processing at all.
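
    A small made-up example of the point: the first loop below has no dependencies between iterations and could map one element per shader thread, while the second carries state from one iteration to the next, like the arithmetic-coder state in CABAC, and cannot simply be split across threads.

    Code:
    /* Contrasting a data-parallel loop with a serially dependent one.
     * Made up for illustration, not taken from any real decoder. */
    #include <stdio.h>

    #define N 16

    int main(void)
    {
        int in[N], out[N];
        for (int i = 0; i < N; i++)
            in[i] = i;

        /* Independent iterations: each element could be handled by its
         * own shader thread. */
        for (int i = 0; i < N; i++)
            out[i] = in[i] * 2 + 1;

        /* Dependent iterations: step i needs the result of step i - 1,
         * so the loop cannot simply be split across threads. */
        int state = 0;
        for (int i = 0; i < N; i++) {
            state = state * 31 + in[i];   /* each step consumes the last */
            out[i] = state;
        }

        printf("last value: %d\n", out[N - 1]);
        return 0;
    }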

