I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when trying to decode 1920x1080, the CPU gets pegged and playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.
radeon video acceleration
-
Originally posted by lbcoder:
I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when trying to decode 1920x1080, the CPU gets pegged and playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.
-
First, I guess I should make sure you are already making use of the existing video processing, ie going through the Xv interface to offload scaling, colour space conversion and filtering to the GPU. If you're not running with accelerated Xv today that should definitely be the first step.
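For a sense of what Xv offloads, here is a minimal sketch of the colour space conversion step in plain Python, using the standard BT.601 full-range coefficients (the function name is mine, not from any driver; real drivers do this per pixel, per frame, on the GPU):

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one BT.601 full-range YCbCr pixel to RGB.

    This is the colour space conversion that Xv lets the GPU do per
    pixel; doing it on the CPU for every pixel of every frame is part
    of what makes unaccelerated playback so expensive.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(r), clamp(g), clamp(b)

# A mid-grey pixel with neutral chroma stays grey:
print(ycbcr_to_rgb(128, 128, 128))  # (128, 128, 128)
```

Scaling and filtering are similar per-pixel arithmetic, which is why they map so naturally onto the GPU.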
Re: offloading the remaining video processing ("decode acceleration"), there are two things that have to happen first:
1. Either developers need to be willing to write all of the acceleration code in a hardware-dependent way (as was done for EXA and Xv) or a suitable framework needs to be implemented.
2. A decision needs to be made about how to hook the acceleration code into the playback stack. This is the more significant obstacle IMO. There are a number of decode APIs which offer multiple entry points including ones which map well onto generic GPU capabilities (eg starting with MC) but I don't believe anyone has looked at modifying an existing decode stack to hook into one of those lower level entry points for HD decode.
It might seem that using a pre-existing slice-level API is the obvious approach, but that would mean implementing a lot of complex decode functionality in the driver in software, since the first implementations are likely to focus on what can readily be done with shaders, and that implies the line between CPU and GPU work will be drawn lower than slice level.
Given that, the approach that seems to make the most sense is to hook into an existing open source decode library and add hooks to either use an MC-level decode API or to add the shader code directly to the library using an API like Gallium3D. I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API) but if that did turn out to be relatively clean (ie if the VA-API interface mapped cleanly onto the code in the decode library) then it might be feasible to implement something without waiting for Gallium3D.
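To make the MC-level entry point concrete, here is a toy sketch of motion compensation in plain Python (the function and variable names are mine, not from VA-API or any real decode stack): the decoder hands over a motion vector and a residual block, and the prediction is fetched from the reference frame. This per-block arithmetic is exactly the kind of work a GPU shader handles well.

```python
def motion_compensate(ref, mv, residual, bx, by, bs=4):
    """Toy motion compensation for one bs x bs block.

    ref:      reference frame as a list of rows
    mv:       (dx, dy) integer motion vector
    residual: bs x bs block of residual values
    (bx, by): top-left corner of the block in the current frame

    Returns the reconstructed block: prediction + residual. An
    MC-level decode API would hand this step (and thousands like it
    per frame) to the GPU instead of running it on the CPU.
    """
    dx, dy = mv
    out = []
    for j in range(bs):
        row = []
        for i in range(bs):
            pred = ref[by + j + dy][bx + i + dx]
            row.append(pred + residual[j][i])
        out.append(row)
    return out

# Zero vector and zero residual just copies the reference block:
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
zero = [[0] * 4 for _ in range(4)]
block = motion_compensate(ref, (0, 0), zero, 0, 0)
```

Real codecs add sub-pixel interpolation and edge handling, but the data flow is the same, which is why MC is the natural first step for a shader-based implementation.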
The "most likely to happen" approach is implementing decode acceleration over Gallium3D, since that provides a relatively vendor-independent low level interface for using the 3D engine. Once the "classic mesa" implementation for 6xx/7xx 3D is stabilized I think you will see focus shift almost immediately to porting that code across to a Gallium3D driver. This approach (implementing Gallium3D first then building decode acceleration on top) is what most of the community developers seem to be favoring today.
HW information to implement shader-based decode acceleration has been available for ~9 months on 6xx/7xx and ~18 months for earlier GPUs, so it's probably fair to say this is not a top priority for users interested in becoming developers. In the meantime, if you have a multicore CPU, there are multithreaded implementations of the current decode stack available, and they seem to help a lot.
Last edited by bridgman; 29 August 2009, 11:57 AM.
Comment
-
Definitely I'm using Xv. Not a newb here.
I assume that by "multithreaded implementations of the current decode stack" you are referring to ffmpeg-mt. I have had a look at that, and it did help, but at this point I've had to resort to dropping the $15 for a CoreAVC license. With that it's still struggling, but at least the video is watchable.
I have to admit that most of your post went way over my head. I am a computer engineer myself, but I have no experience at all in graphics driver or video processing development. From what I can gather, though, it seems there is a while to wait yet.
Thank you for your response.
Comment
-
Originally posted by myxal:
Are you saying we might see UVD, i.e. bitstream acceleration, in open source drivers?
Originally posted by myxal:
I recall there being some limitations on XvMC. Going straight to what I care about and need the stack to provide (note: according to reports on the web, VDPAU with nvidia does this): postprocessing of the decoded video frames, needed to support mplayer's current implementation of subtitles, OSD, etc. Does XvMC even allow this?
XvMC has all kinds of limitations, including being designed around the MPEG-2 standard -- the reason for doing XvMC first is simply that a lot of the code is already there. This allows the developers to concentrate on getting a Gallium3D driver working to complete the stack. Once XvMC-over-Gallium3D is running, the GPU-specific work will be largely done, and support for other APIs and video standards will be much easier to add.
Originally posted by myxal:
The fad now is mobility -- how does the power draw compare when using UVD and when using shaders? Well, the library is a wrapper for various implementations, and we already know nvidia's implementation (mostly) works. We're just THAT eager to see other implementations, working with hardware unaffected by Bumpgate.
Originally posted by m4rgin4l:
You make a good point here. You shouldn't spend more than 50 bucks if all you want is to watch HD content. I think the problem is with people who spent 150 or more and want to get the most out of their hardware.
The rv710 has 2x the shader power of the rv610/620, so that advice may no longer be relevant.
Last edited by bridgman; 18 September 2009, 03:59 PM.
Comment
-
Originally posted by bridgman:
I have been recommending something a bit more powerful than the very low end products to make sure the GPU has enough shader power for decode acceleration, ie going for something like an HD2600/HD3650 just to be safe -- at least until the shader-based decode stack is running well for most users.
Comment
-
rv610/620 - 40 ALUs (2 SIMDs x 4 pixels/vertices per SIMD x 5)
rv710 - 80 ALUs (2 SIMDs x 8 pixels/vertices per SIMD x 5)
rv630/635 - 120 ALUs (3 SIMDs x 8 pixels/vertices per SIMD x 5)
rv670 - 320 ALUs (4 SIMDs x 16 pixels/vertices per SIMD x 5)
rv730 - 320 ALUs (8 SIMDs x 8 pixels/vertices per SIMD x 5)
rv740 - 640 ALUs (8 SIMDs x 16 pixels/vertices per SIMD x 5)
rv770 - 800 ALUs (10 SIMDs x 16 pixels/vertices per SIMD x 5)
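All of the figures above follow the same arithmetic: ALUs = SIMDs x pixels per SIMD x 5 ALUs per pixel. A quick sanity check, with the chip names and counts copied from the list:

```python
# ALUs = SIMDs x pixels-per-SIMD x 5 ALUs per pixel,
# using the (SIMDs, pixels-per-SIMD) counts from the list above.
chips = {
    "rv610/620": (2, 4),
    "rv710":     (2, 8),
    "rv630/635": (3, 8),
    "rv670":     (4, 16),
    "rv730":     (8, 8),
    "rv740":     (8, 16),
    "rv770":     (10, 16),
}

for name, (simds, pixels) in chips.items():
    print(f"{name}: {simds * pixels * 5} ALUs")
```

Note that rv670 and rv730 arrive at the same 320 ALUs by different routes: fewer, wider SIMDs versus more, narrower ones.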
No problem.
Last edited by bridgman; 18 September 2009, 03:55 PM.
Comment
-
Originally posted by lbcoder:
El-super-cheapo HD3650 anyone? http://www.newegg.com/Product/Produc...lor-_-14131084
Personally, I opted for an RV710/4550 with passive cooling because I'm a neurotic lover of silent computing (and not much of a gamer).
Comment