XvMC support

  • #71
    It depends on the question

    The ideal solution is to have access to the dedicated hardware we all put in our chips to support BluRay playback. Unfortunately that hardware is also wrapped up in bad scary DRM stuff so making it available on Linux is a slow and painful process.

    In the meantime, a lot of the computationally expensive work associated with H.264 decode and encode can be done with shaders, which is where OpenCL comes in. It doesn't *have* to be OpenCL, and the work doesn't have to wait for OpenCL, we're just saying that a year from now anyone writing that kind of code will probably start with OpenCL.

    In the meantime, I believe there is enough info publicly available today to write a GPU-accelerated H.264 decoder for Linux on either Intel or ATI hardware using the 3D engine using conventional shaders. Another option for ATI and NVidia hardware would be to write the decoder using Stream or Cuda tools (but without that handy library).
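
    To make that concrete, here is a minimal, illustrative OpenCL C sketch of the kind of data-parallel inner loop that maps well onto shaders: the H.264 4x4 inverse integer transform, one residual block per work-item. The kernel name and buffer layout are assumptions made for this example, not code from any existing decoder.

    /* Illustrative sketch only: one work-item runs the H.264 4x4 inverse
     * integer transform on one dequantized residual block.  Buffer names
     * and layout are assumptions made for this example. */
    __kernel void idct4x4_h264(__global const short *coeffs,   /* 16 coefficients per block     */
                               __global short *residual,       /* 16 residual samples per block */
                               const int num_blocks)
    {
        int b = get_global_id(0);
        if (b >= num_blocks)
            return;

        __global const short *in  = coeffs   + b * 16;
        __global short       *out = residual + b * 16;
        int tmp[16];

        /* Horizontal pass: 1-D butterfly on each row. */
        for (int i = 0; i < 4; i++) {
            int d0 = in[i*4 + 0], d1 = in[i*4 + 1], d2 = in[i*4 + 2], d3 = in[i*4 + 3];
            int e0 = d0 + d2;
            int e1 = d0 - d2;
            int e2 = (d1 >> 1) - d3;
            int e3 = d1 + (d3 >> 1);
            tmp[i*4 + 0] = e0 + e3;
            tmp[i*4 + 1] = e1 + e2;
            tmp[i*4 + 2] = e1 - e2;
            tmp[i*4 + 3] = e0 - e3;
        }

        /* Vertical pass plus the spec's (x + 32) >> 6 rounding. */
        for (int j = 0; j < 4; j++) {
            int d0 = tmp[0*4 + j], d1 = tmp[1*4 + j], d2 = tmp[2*4 + j], d3 = tmp[3*4 + j];
            int e0 = d0 + d2;
            int e1 = d0 - d2;
            int e2 = (d1 >> 1) - d3;
            int e3 = d1 + (d3 >> 1);
            out[0*4 + j] = (short)((e0 + e3 + 32) >> 6);
            out[1*4 + j] = (short)((e1 + e2 + 32) >> 6);
            out[2*4 + j] = (short)((e1 - e2 + 32) >> 6);
            out[3*4 + j] = (short)((e0 - e3 + 32) >> 6);
        }
    }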

    One thing that tools like OpenCL will do is make it possible for more people to get into this kind of programming, without having to first take the plunge into driver development. The CUDA and Stream tools certainly help, but I think OpenCL will help more.
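
    As a sketch of how low that barrier is, the host-side setup below (plain C against the OpenCL 1.0 API) just finds a GPU, prints its name and creates a context and queue, all from user space with no driver internals involved. It is illustrative only and not taken from any vendor SDK.

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id plat;
        cl_device_id   dev;
        char           name[256];
        cl_int         err;

        if (clGetPlatformIDs(1, &plat, NULL) != CL_SUCCESS ||
            clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS) {
            fprintf(stderr, "no OpenCL-capable GPU found\n");
            return 1;
        }

        clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("decode kernels would be queued on: %s\n", name);

        cl_context       ctx   = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
        cl_command_queue queue = clCreateCommandQueue(ctx, dev, 0, &err);

        /* A real decoder would now build its kernels (e.g. the idct4x4_h264
         * sketch above) with clBuildProgram() and enqueue them per frame
         * with clEnqueueNDRangeKernel(). */

        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
        return 0;
    }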

    The other obvious benefit of OpenCL is the ability to run the same program on hardware from different vendors, which I guess you could say is "bad for the hardware vendors individually but good for them collectively"
    Last edited by bridgman; 05 January 2009, 03:17 PM.


    • #72
      Originally posted by bridgman View Post
      It depends on the question

      The ideal solution is to have access to the dedicated hardware we all put in our chips to support BluRay playback. Unfortunately that hardware is also wrapped up in bad scary DRM stuff so making it available on Linux is a slow and painful process.
      You aren't making it available on Windows either :P CoreAVC is going to support CUDA. What has ATI to offer here? They say "DXVA sucks" and that "it's a failure".



      • #73
        Huh ? AFAIK we were the first to ship full GPU offload for HD-DVD/BluRay on Windows, and most of the reviews seem to think we still have a better implementation today.

        If you read the rest of the thread where BetaBoy said "DXVA sucks and must die" it seems they aren't really planning to use CUDA itself as much as the library which NVidia made available for CUDA developers (the one popper mentioned a couple of days ago). I think the attraction of the library is that it makes it easy to retrieve the decoded frame, while most of the decoder implementations supplied by HW vendors tend to only output to the screen simply because that was the main requirement.

        We make a similar capability available to ISVs :

        http://www.cyberlink.com/eng/press_room/view_1756.html

        I suspect the library uses the DXVA framework in the NVidia drivers, so having DXVA die might be a bit inconvenient, but that's just a guess
        Last edited by bridgman; 05 January 2009, 04:26 PM.


        • #74
          Another vote for offloading H.264 to GPU. Also another datapoint here:
          AMD X2 5000+
          Radeon HD2400 Pro
          2GB DDR2 Ram
          1920x1080 display

          High-bitrate H.264 video files would cause stutter and out-of-sync audio. No problem at all with MPEG2 or other MPEG4 codecs (divx, xvid, etc). top shows that mplayer is taking up all the CPU cycles.



          • #75
            asun, are you playing through Xv, OpenGL or X11 output ?


            • #76
              Originally posted by bridgman View Post
              In the meantime, a lot of the computationally expensive work associated with H.264 decode and encode can be done with shaders, (...)
              This might be a very short question with a very long and complicated answer, but: What shader power is required to match the custom hardware?

              I did follow some attempts to implement generic shader decoding this summer, like this one: http://www.bitblit.org/gsoc/g3dvl/ - the other was Rudd on the XBMC team, but that didn't seem to go anywhere, and the g3dvl project was only able to do 854x480 in realtime.

              Surely AMD looked a bit into whether regular shaders could do the work when deciding to go for dedicated hardware, so is it realistic to decode Blu-Ray streams on shaders, or is that just exaggerating the power of shaders?

              It's also a question of what class of chips can do this - like, would it be something for an integrated chip with 10 shaders, or only something for a 48xx class card? Some ballpark idea of that would be nice.



              • #77
                Yep, the answer is long and complicated

                First off, let's get one thing clear. Implementing decode on shaders is not a complete substitute for dedicated hardware, but it is more flexible (fixed function hardware is picky about encoding details) and should be able to reduce CPU utilization enough to make a lot more systems able to decode in real time.

                There are a bunch of activities in h.264 decode (bitstream parsing, entropy decode, spatial prediction) which don't lend themselves to being implemented on shaders so that work is going to have to stay on the CPU anyways. Fixed function hardware can handle the entire decode operation and use less power when doing the decoding.
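
                The parts that do map well are the per-pixel stages such as motion-compensation interpolation and the inverse transform. As an illustration only (the names and plane layout are assumed, and border handling is simplified), the H.264 horizontal half-pel luma filter looks roughly like this in OpenCL C:

                /* Illustrative sketch, not from any real decoder: the H.264 6-tap
                 * (1,-5,20,20,-5,1) horizontal half-pel luma filter, one work-item
                 * per output sample.  Plane layout is an assumption; edges are
                 * handled by simple clamping. */
                __kernel void halfpel_h(__global const uchar *ref,   /* reference luma plane */
                                        __global uchar *pred,        /* predicted luma plane */
                                        const int width,
                                        const int height)
                {
                    int x = get_global_id(0);
                    int y = get_global_id(1);
                    if (x >= width || y >= height)
                        return;

                    /* Gather the six full-pel neighbours, clamped at the frame edges. */
                    const int taps[6] = { 1, -5, 20, 20, -5, 1 };
                    int sum = 0;
                    for (int k = 0; k < 6; k++) {
                        int xs = min(max(x + k - 2, 0), width - 1);
                        sum += taps[k] * ref[y * width + xs];
                    }

                    /* Spec rounding: (sum + 16) >> 5, saturated to 8 bits. */
                    pred[y * width + x] = convert_uchar_sat((sum + 16) >> 5);
                }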

                In terms of hardware required, quick answer is "nobody knows for sure until the code is written". I doubt that the 40-ALU parts (HD2400, HD34xx, 780) will have enough power since the 3D engine is also being used for render accel (colour space conversion, scaling, deinterlacing etc..). I have always suggested that anyone wanting to run the open source drivers go with at least a 120-ALU part (2600, 3650 etc..) to have some shader power left over for decode work.
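
                For a rough sense of what the render-accel side already costs, here is an illustrative OpenCL C sketch of an NV12-to-RGBA colour space conversion with BT.601 coefficients, one work-item per pixel. Every name and the buffer layout are assumptions made for the example.

                /* Illustrative sketch of colour space conversion (NV12 -> RGBA, BT.601).
                 * Buffer names and layout are assumptions made for this example. */
                __kernel void nv12_to_rgba(__global const uchar *y_plane,
                                           __global const uchar *uv_plane,  /* interleaved U,V at half resolution */
                                           __global uchar4      *rgba,
                                           const int width,
                                           const int height)
                {
                    int x = get_global_id(0);
                    int y = get_global_id(1);
                    if (x >= width || y >= height)
                        return;

                    float luma = (float)y_plane[y * width + x] - 16.0f;
                    int   ci   = (y / 2) * width + (x & ~1);        /* U,V pair shared by a 2x2 block */
                    float cb   = (float)uv_plane[ci]     - 128.0f;  /* U */
                    float cr   = (float)uv_plane[ci + 1] - 128.0f;  /* V */

                    float r = 1.164f * luma + 1.596f * cr;
                    float g = 1.164f * luma - 0.813f * cr - 0.391f * cb;
                    float b = 1.164f * luma + 2.018f * cb;

                    rgba[y * width + x] = (uchar4)(convert_uchar_sat(r),
                                                   convert_uchar_sat(g),
                                                   convert_uchar_sat(b),
                                                   (uchar)255);
                }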

                Again, this is all hypothetical right now anyways. I am just trying to give everyone an idea of what the likely scenarios are -- we are going to look into opening up UVD, I just can't make any commitments until we have actually gone through the investigation and it won't be quick. We have 6xx/7xx 3d code out now, so IMO the next priority should be basic power management.


                • #78
                  Originally posted by bridgman View Post
                  In terms of hardware required, quick answer is "nobody knows for sure until the code is written". I doubt that the 40-ALU parts (HD2400, HD34xx, 780) will have enough power since the 3D engine is also being used for render accel (colour space conversion, scaling, deinterlacing etc..). I have always suggested that anyone wanting to run the open source drivers go with at least a 120-ALU part (2600, 3650 etc..) to have some shader power left over for decode work.
                  What about R5xx parts? Are the shaders on those units usable for shader-assisted decode?



                  • #79
                    Yep, there's nothing special about the 6xx/7xx shaders in that regard. That's one of the interesting things about a shader-based implementation -- it can't accelerate as much as dedicated hardware, but it can work on GPUs which don't *have* dedicated hardware. You would probably need a fairly high end card though -- the rv530 was the first time we started cranking up the ALU:TEX and ALU:ROP ratio (partly for more shader-intensive games, and partly for video processing), and you probably would need X8xx, X18xx or X19xx realistically.

                    Again, until something is implemented these are all SWAGs.
                    Last edited by bridgman; 08 January 2009, 01:51 PM.


                    • #80
                      "we are going to look into opening up UVD, I just can't make any commitments until we have actually gone through the investigation and it won't be quick. We have 6xx/7xx 3d code out now, so IMO the next priority should be basic power management. "


                      That's a shame - we are looking at months at the very least then!

                      "I think the attraction of the [NV cuda] library is that it makes it easy to retrieve the decoded frame, while most of the decoder implementations supplied by HW vendors tend to only output to the screen simply because that was the main requirement.

                      We make a similar capability available to ISVs :


                      "

                      Yep, that about covers it for basic needs, it appears. Your average dev, and indeed pro coders such as BetaBoy and the CoreAVC coders, don't really need that much help once they have the right library and docs access, it seems. BetaBoy said he wanted to support ATI UVD in CoreAVC and related apps, but you don't give them or the open SW coders access to the ATI UVD.

                      "I suspect the library uses the DXVA framework in the NVidia drivers, so having DXVA die might be a bit inconvenient, but that's just a guess "

                      I think it's just entry points into and out of the generic DSP "blackbox" they put on their cards/SOC chips, TBO...

                      I don't really see why ATI/AMD couldn't also make such a "blackbox" UVD available as a stopgap measure to help multi-OS devs in the short term, TBO...!

                      I don't know why (other than saving pennies they could recoup in the retail cost) you HW vendors don't just move away from these antiquated DSP SOCs and start using current, faster and vastly more expandable FPGAs for your UVD; you (or indeed anyone) could then simply re-program them on the fly for many other HW-assisted tasks and market sectors.

                      Imagine the open source and even closed add-on FPGA (Field Programmable Gate Array) code you could market and sell into generic mass markets.

                      Simply taking the chance and putting a current fast, low-power FPGA on every ATI/AMD gfx and related card/MB would bring the world's FPGA prices right down, in line with or below the cheap DSPs favoured today perhaps... fostering lots of innovative, cost-effective uses in the near/long term for everyone's benefit.

                      And there wouldn't be any problems as regards DRM-laden code.

                      Given the apparently long potential wait for anything ATI UVD related, perhaps it's finally time to move over to NV cards for now as the only viable option for many people worldwide today, as CoreAVC have a Linux library available and have released test HW-assisted CUDA/VS2 CoreAVC on Windows that apparently gives a massive (x2-x4) decoding boost. I don't know if it will be usable on Linux x86 as yet though.
                      Last edited by popper; 08 January 2009, 09:06 PM.

