I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when trying to decode 1920x1080, the CPU gets pegged and playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.
radeon video acceleration
-
Originally posted by lbcoder:
I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when trying to decode 1920x1080, the CPU gets pegged and playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.
-
First, I guess I should make sure you are already making use of the existing video processing, ie going through the Xv interface to offload scaling, colour space conversion and filtering to the GPU. If you're not running with accelerated Xv today that should definitely be the first step.
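For a sense of what Xv offloads, here is a minimal sketch of the colour space conversion step in plain Python, using the standard BT.601 full-range coefficients (the function name is mine, not from any driver; real drivers do this per pixel, per frame, on the GPU):

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one BT.601 full-range YCbCr pixel to RGB.

    This is the colour space conversion that Xv lets the GPU do per
    pixel; doing it on the CPU for every pixel of every frame is part
    of what makes unaccelerated playback so expensive.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(r), clamp(g), clamp(b)

# A mid-grey pixel with neutral chroma stays grey:
print(ycbcr_to_rgb(128, 128, 128))  # (128, 128, 128)
```

Scaling and filtering are similar per-pixel arithmetic, which is why they map so naturally onto the GPU.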
Re: offloading the remaining video processing ("decode acceleration"), there are two things that have to happen first:
1. Either developers need to be willing to write all of the acceleration code in a hardware-dependent way (as was done for EXA and Xv) or a suitable framework needs to be implemented.
2. A decision needs to be made about how to hook the acceleration code into the playback stack. This is the more significant obstacle IMO. There are a number of decode APIs which offer multiple entry points including ones which map well onto generic GPU capabilities (eg starting with MC) but I don't believe anyone has looked at modifying an existing decode stack to hook into one of those lower level entry points for HD decode.
It might seem that using a pre-existing slice-level API is the obvious approach, but that would mean implementing a lot of complex decode functionality in the driver in software, since the first implementations are likely to focus on what can readily be done with shaders, and that implies the line between CPU and GPU work will be drawn lower than slice level.
Given that, the approach that seems to make the most sense is to hook into an existing open source decode library and add hooks to either use an MC-level decode API or to add the shader code directly to the library using an API like Gallium3D. I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API) but if that did turn out to be relatively clean (ie if the VA-API interface mapped cleanly onto the code in the decode library) then it might be feasible to implement something without waiting for Gallium3D.
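To make the MC-level entry point concrete, here is a toy sketch of motion compensation in plain Python (the function and variable names are mine, not from VA-API or any real decode stack): the decoder hands over a motion vector and a residual block, and the prediction is fetched from the reference frame. This per-block arithmetic is exactly the kind of work a GPU shader handles well.

```python
def motion_compensate(ref, mv, residual, bx, by, bs=4):
    """Toy motion compensation for one bs x bs block.

    ref:      reference frame as a list of rows
    mv:       (dx, dy) integer motion vector
    residual: bs x bs block of residual values
    (bx, by): top-left corner of the block in the current frame

    Returns the reconstructed block: prediction + residual. An
    MC-level decode API would hand this step (and thousands like it
    per frame) to the GPU instead of running it on the CPU.
    """
    dx, dy = mv
    out = []
    for j in range(bs):
        row = []
        for i in range(bs):
            pred = ref[by + j + dy][bx + i + dx]
            row.append(pred + residual[j][i])
        out.append(row)
    return out

# Zero vector and zero residual just copies the reference block:
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
zero = [[0] * 4 for _ in range(4)]
block = motion_compensate(ref, (0, 0), zero, 0, 0)
```

Real codecs add sub-pixel interpolation and edge handling, but the data flow is the same, which is why MC is the natural first step for a shader-based implementation.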
The "most likely to happen" approach is implementing decode acceleration over Gallium3D, since that provides a relatively vendor-independent low level interface for using the 3D engine. Once the "classic mesa" implementation for 6xx/7xx 3D is stabilized I think you will see focus shift almost immediately to porting that code across to a Gallium3D driver. This approach (implementing Gallium3D first then building decode acceleration on top) is what most of the community developers seem to be favoring today.
HW information to implement shader-based decode acceleration has been available for ~9 months on 6xx/7xx and ~18 months for earlier GPUs, so it's probably fair to say this is not a top priority for users interested in becoming developers. In the meantime, if you have a multicore CPU, there are multithreaded implementations of the current decode stack available, and they seem to help a lot.
Last edited by bridgman; 29 August 2009, 11:57 AM.
Comment
-
Definitely I'm using Xv. Not a newb here.
I assume that by "multithreaded implementations of the current decode stack" you are referring to ffmpeg-mt. I have had a look at that, and it did help, but at this point I've had to resort to dropping the $15 for a CoreAVC license. With that it's still struggling, but at least the video is watchable.
I have to admit that most of your post went way over my head. I am a computer engineer myself, but I have no experience at all in graphics driver or video processing development. From what I can gather, though, it seems there is a while to wait yet.
Thank you for your response.
Comment
-
Originally posted by myxal:
Are you saying we might see UVD, i.e. bitstream acceleration, in open source drivers?
Originally posted by myxal:
I recall there being some limitations on XvMC. Going straight to what I care about and need the stack to provide (note: according to reports on the web, VDPAU with nvidia does this): postprocessing of the decoded video frames, needed to support mplayer's current implementation of subtitles, OSD, etc. Does XvMC even allow this?
XvMC has all kinds of limitations, including being designed around the MPEG-2 standard -- the reason for doing XvMC first is simply that a lot of the code is already there. This allows the developers to concentrate on getting a Gallium3D driver working to complete the stack. Once XvMC-over-Gallium3D is running, the GPU-specific work will be largely done, and support for other APIs and video standards will be much easier to add.
Originally posted by myxal:
The fad now is mobility -- how does the power draw compare when using UVD and when using shaders? Well, the library is a wrapper for various implementations, and we already know nvidia's implementation (mostly) works. We're just THAT eager to see other implementations, working with hardware unaffected by Bumpgate.
Originally posted by m4rgin4l:
You make a good point here. You shouldn't spend more than 50 bucks if all you want is to watch HD content. I think the problem is with people who spent 150 or more and want to get the most out of their hardware.
The rv710 has 2x the shader power of the rv610/620, so that advice may no longer be relevant.
Last edited by bridgman; 18 September 2009, 03:59 PM.
Comment
-
Originally posted by bridgman:
I have been recommending something a bit more powerful than the very low end products to make sure the GPU has enough shader power for decode acceleration, ie going for something like an HD2600/HD3650 just to be safe -- at least until the shader-based decode stack is running well for most users.
Comment
-
rv610/620 - 40 ALUs (2 SIMDs x 4 pixels/vertices per SIMD x 5)
rv710 - 80 ALUs (2 SIMDs x 8 pixels/vertices per SIMD x 5)
rv630/635 - 120 ALUs (3 SIMDs x 8 pixels/vertices per SIMD x 5)
rv670 - 320 ALUs (4 SIMDs x 16 pixels/vertices per SIMD x 5)
rv730 - 320 ALUs (8 SIMDs x 8 pixels/vertices per SIMD x 5)
rv740 - 640 ALUs (8 SIMDs x 16 pixels/vertices per SIMD x 5)
rv770 - 800 ALUs (10 SIMDs x 16 pixels/vertices per SIMD x 5)
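All of the figures above follow the same arithmetic: ALUs = SIMDs x pixels per SIMD x 5 ALUs per pixel. A quick sanity check, with the chip names and counts copied from the list:

```python
# ALUs = SIMDs x pixels-per-SIMD x 5 ALUs per pixel,
# using the (SIMDs, pixels-per-SIMD) counts from the list above.
chips = {
    "rv610/620": (2, 4),
    "rv710":     (2, 8),
    "rv630/635": (3, 8),
    "rv670":     (4, 16),
    "rv730":     (8, 8),
    "rv740":     (8, 16),
    "rv770":     (10, 16),
}

for name, (simds, pixels) in chips.items():
    print(f"{name}: {simds * pixels * 5} ALUs")
```

Note that rv670 and rv730 arrive at the same 320 ALUs by different routes: fewer, wider SIMDs versus more, narrower ones.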
No problem.
Last edited by bridgman; 18 September 2009, 03:55 PM.
Comment
-
Originally posted by lbcoder:
El-super-cheapo HD3650 anyone? http://www.newegg.com/Product/Produc...lor-_-14131084
Personally, I opted for an RV710/4550 with passive cooling because I'm a neurotic lover of silent computing (and not much of a gamer).
Comment