Why I think the DRM and open source debate is nonsense

  • #41
    Personally I guess the very reason to have a special (and costly) hardware unit for video decoding is DRM. My naive understanding leads me to believe that most video decoding steps can be done in the 3D shaders anyway and that only the need to keep the encryption chain intact "justifies" dedicated hardware.

    Perhaps it's viable, in the worst case, not to touch the UVD unit at all and to do most of the work with a set of shaders anyway? This would also be rather flexible (yum yum, accelerated Theora and/or Dirac...) ;-)

    The downside may be that lower-end parts don't have the required shader power and that energy efficiency may not be as great.
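
    Roughly, the decoding steps I have in mind look something like this (just a sketch on my part, not tied to any particular codec or driver; which stages map well onto the shaders is only my rough judgement):

    /* Rough sketch of the main stages of a block-based video decoder
     * (MPEG-2/MPEG-4 style) and how well each maps onto the 3D shaders.
     * Stage names and comments are illustrative, not taken from any driver. */
    enum decode_stage {
        STAGE_BITSTREAM_PARSE, /* entropy decoding (VLC/CABAC): serial and
                                  branch-heavy, a poor fit for shaders        */
        STAGE_INVERSE_QUANT,   /* per-coefficient scaling: trivially parallel */
        STAGE_IDCT,            /* 8x8 inverse transform: parallel per block   */
        STAGE_MOTION_COMP,     /* block copies plus interpolation from
                                  reference frames: texture-fetch friendly    */
        STAGE_DEBLOCK_FILTER,  /* in-loop filtering: mostly parallel          */
        STAGE_COLOR_CONVERT    /* YUV -> RGB: a classic fragment-shader job   */
    };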



    • #42
      Originally posted by SavageX View Post
      Personally I guess the very reason to have a special (and costly) hardware unit for video decoding is DRM. My naive understanding leads me to believe that most video decoding steps can be done in the 3D shaders anyway and that only the need to keep the encryption chain intact "justifies" dedicated hardware.
      That's not true: HW decoding can be done without DRM and encryption support. ATI's choice was to do both the decoding and the protection in the same block, since that should be less painful to engineer.

      Originally posted by SavageX View Post
      Perhaps it's viable, in the worst case, not to touch the UVD unit at all and to do most of the work with a set of shaders anyway? This would also be rather flexible (yum yum, accelerated Theora and/or Dirac...) ;-)

      The downside may be that lower-end parts don't have the required shader power and that energy efficiency may not be as great.
      Doing the work in the shaders is the same as not having HW decode, and for that reason adding the block would be meaningless. The purpose of HW decoding can be illustrated with analog/digital and digital/analog conversion.

      When you transform an analog signal into a digital one you do some processing on it and generate a train of impulses that you add to a carrier frequency. At the receiver side you have to recover the train of impulses and then apply a digital-to-analog conversion. If this conversion is done in software it might take, let's say, 160 milliseconds. If the D/A conversion is done by a HW block, it can be done in the time it takes the signal to pass through the block, let's suppose less than 100 ms (remember that even this is a very long time for this type of conversion). Then there is another aspect: in the first case (the software one) you need a powerful processor to do the conversion, and the less powerful the processor, the longer the conversion takes. In the second case you don't need a processor to process the signal at all, only a sync clock (which is present in the first case as well).
      The same reasoning applies to HW decoding blocks. If you use them you get the following gains:
      1. no central CPU stress
      2. no GPU stress
      3. no extra RAM stress
      4. only DMA stress

      Without the block you get this instead:
      1. CPU or GPU stress (combined CPU/GPU decoding still isn't done, as far as I know, since the two types of processors are different and using both would totally kill the system bus)
      2. extra RAM needed for computation, both on the video card and on the system side
      3. DMA controller stress.

      As you can see, there are many reasons to put HW video decoding into the video cards.
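
      Just to give a rough idea of the numbers involved (the cycles-per-macroblock figure below is purely an assumption for illustration, not a measurement; the point is only that software decoding scales with the resolution, while a dedicated block mostly costs some DMA bandwidth):

      #include <stdio.h>

      int main(void)
      {
          /* 1080p at 30 fps, split into 16x16 macroblocks
           * (1080 rounded up to 1088, a multiple of 16) */
          const double mb_per_frame  = (1920.0 / 16) * (1088.0 / 16); /* 8160  */
          const double mb_per_second = mb_per_frame * 30;             /* ~245k */

          /* Assumed, purely illustrative: CPU cycles per macroblock for a
           * software decoder (IDCT + motion comp + deblocking + overhead) */
          const double cycles_per_mb = 4000.0;

          printf("software decode: ~%.1f GHz worth of CPU cycles per second\n",
                 mb_per_second * cycles_per_mb / 1e9);

          /* With a dedicated block the host mostly pays DMA: it feeds the
           * compressed stream in and the block writes the decoded frames
           * into video memory itself. Assumed HD bitrate: 20 Mbit/s.      */
          const double bitstream_mbit = 20.0;
          printf("hardware decode: ~%.1f MB/s of DMA traffic for the bitstream\n",
                 bitstream_mbit / 8.0);
          return 0;
      }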



      • #43
        Originally posted by givemesugarr View Post
        That's not true: HW decoding can be done without DRM and encryption support. ATI's choice was to do both the decoding and the protection in the same block, since that should be less painful to engineer.
        Of course HW decoding can be done without DRM. What I'm saying is that if there weren't those hard "security" requirements, the GPU developers might consider doing it all in the (now general purpose) shader core.


        Originally posted by givemesugarr View Post
        Doing the work in the shaders is the same as not having HW decode, and for that reason adding the block would be meaningless. The purpose of HW decoding can be illustrated with analog/digital and digital/analog conversion.
        <snip>
        The video decoding hardware is not a collection of passive elements. Your example doesn't apply IMO.

        Dedicated hardware can of course be more (energy) efficient than a programmable shader engine, though.

        Originally posted by givemesugarr View Post
        2. extra RAM needed for computation, both on the video card and on the system side
        I'd say there's plenty of it available. The gfx card needs to hold the input data, the shader programs and a buffer for the results. Nothing too worrying.
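
        For some rough numbers (the frame size follows from 1080p YUV 4:2:0; the reference-frame count and the input buffer size are just assumptions on my part):

        #include <stdio.h>

        int main(void)
        {
            /* one 1920x1080 frame in YUV 4:2:0 is 1.5 bytes per pixel */
            const double frame_bytes = 1920.0 * 1080.0 * 1.5;    /* ~3 MB */

            /* assumed working set: the frame being decoded plus a few
             * reference frames, plus a buffer for the compressed input */
            const int    frames          = 1 + 4;                /* assumption */
            const double input_buf_bytes = 4.0 * 1024 * 1024;    /* assumption */

            const double total = frames * frame_bytes + input_buf_bytes;
            printf("decode working set: ~%.1f MB\n", total / (1024 * 1024));
            /* prints roughly 19 MB -- small even next to a 128 MB card,
             * which of course also has to hold the desktop and textures */
            return 0;
        }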

        Originally posted by givemesugarr View Post
        3. DMA controller stress.
        How so? Even with dedicated decoding hardware you need to get the input data in and the output data out.

        Originally posted by givemesugarr View Post
        As you can see, there are many reasons to put HW video decoding into the video cards.
        Sure, having dedicated HW is always nice. All I'm saying is that doing it with the 3D shaders may be an acceptable fallback which would still be way more efficient than doing it all on the CPU.



        • #44
          Originally posted by SavageX View Post
          Of course HW decoding can be done without DRM. What I'm saying is that if there weren't those hard "security" requirements, the GPU developers might consider doing it all in the (now general purpose) shader core.
          That wouldn't be HW decoding anymore.

          Originally posted by SavageX View Post
          The video decoding hardware is not a collection of passive elements. Your example doesn't apply IMO.
          Neither is digital/analog conversion, even if the process may seem that way.

          Originally posted by SavageX View Post
          I'd say there's plenty of it available. The gfx card needs to hold the input data, the shader programs and a buffer for the results. Nothing too worrying.
          This is the problem with young people: they haven't seen what it means to not have enough RAM to do what you want. And I assure you that physical RAM is never enough. That's why you keep using swap (very slow compared to physical RAM): because there's simply not enough RAM. If you start running a few apps you'll see how quickly system RAM fills up.


          Originally posted by SavageX View Post
          How so? Even with dedicated decoding hardware you need to get the input data in and the output data out.
          That's the point: the DMA is always stressed, but with HW decoding you don't stress the other components (GPU, CPU, system RAM) beyond the minimum required.

          Originally posted by SavageX View Post
          Sure, having dedicated HW is always nice. All I'm saying is that doing it with the 3D shaders may be an acceptable fallback which would still be way more efficient than doing it all on the CPU.
          You're only moving the decoding from the CPU to the GPU. To do this you need to simulate a HW decoding unit on the shader engine, and that would mean another layer between the hardware and the software. I don't agree with this solution since, in my opinion, it's just a waste of time: devs would lose time implementing a solution that wouldn't perform as well as the HW one, instead of focusing on code that makes the HW decode block work well. This method could be used to emulate HW decoding of new formats on older boards; otherwise I don't see any gain in using it.



          • #45
            Originally posted by givemesugarr View Post
            That wouldn't be HW decoding anymore.
            Aye.


            Originally posted by givemesugarr View Post
            This is the problem with young people: they haven't seen what it means to not have enough RAM to do what you want. And I assure you that physical RAM is never enough. That's why you keep using swap (very slow compared to physical RAM): because there's simply not enough RAM. If you start running a few apps you'll see how quickly system RAM fills up.
            Whenever HD video decoding fails on PCs it's basically never because of memory constraints. Decoding video won't drive your system into swapping (as long as no mem leaks happen).

            Oh, and thanks for calling me young.


            Originally posted by givemesugarr View Post
            You're only moving the decoding from the CPU to the GPU.
            Aye. Personally I think that's what GPUs are made for

            Originally posted by givemesugarr View Post
            To do this you need to simulate a HW decoding unit on the shader engine, and that would mean another layer between the hardware and the software. I don't agree with this solution since, in my opinion, it's just a waste of time: devs would lose time implementing a solution that wouldn't perform as well as the HW one, instead of focusing on code that makes the HW decode block work well. This method could be used to emulate HW decoding of new formats on older boards; otherwise I don't see any gain in using it.
            Nope, of course you wouldn't "simulate" the decoding hardware; you'd directly implement the codec decoding algorithms (or just the bottleneck parts) for the stream processors (GPU shaders). That is, write an MPEG-2 decoder, an MPEG-4 (ASP and AVC) decoder, ...

            The downside is that the driver developers would have to mess directly with the video compression details; the upside is the increased flexibility and the possibility to reuse this GPU-assisted decoding on other hardware, too (Nvidia or Intel).

            edit: Well, or one would implement generic decoding building blocks... the DCT, motion compensation (MC), and perhaps codec-specific helpers for bitstream unpacking, etc.
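
            For what it's worth, such a building block is quite small. Here is a plain (slow, textbook) reference 8x8 IDCT in C of the kind one would then port to the stream processors - just a sketch using the standard definition, not taken from any real decoder:

            #include <math.h>

            /* Textbook O(n^4) 8x8 inverse DCT as used by MPEG-style codecs.
             * in:  dequantised transform coefficients
             * out: spatial-domain residual block
             * Each output sample is independent of the others, which is
             * exactly the kind of work the stream processors are good at. */
            void idct_8x8(const double in[8][8], double out[8][8])
            {
                const double pi = 3.14159265358979323846;

                for (int x = 0; x < 8; x++) {
                    for (int y = 0; y < 8; y++) {
                        double sum = 0.0;
                        for (int u = 0; u < 8; u++) {
                            for (int v = 0; v < 8; v++) {
                                double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
                                double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
                                sum += cu * cv * in[u][v]
                                     * cos((2 * x + 1) * u * pi / 16.0)
                                     * cos((2 * y + 1) * v * pi / 16.0);
                            }
                        }
                        out[x][y] = 0.25 * sum;
                    }
                }
            }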
            Last edited by SavageX; 04 February 2008, 08:34 AM.



            • #46
              Originally posted by SavageX View Post
              Aye.
              Whenever HD video decoding fails on PCs it's basically never because of memory constraints. Decoding video won't drive your system into swapping (as long as no mem leaks happen).
              The problem is... the card memory is typically a limited resource compared to the memory of your system, and you need to use that resource when you're doing this sort of thing. It's not that you couldn't do it; it's that you've got to take into account that someone only has 128 MB of card memory for these sorts of things. Please note: I'm not suggesting that this isn't a desirable thing to do, but if we can get the dedicated hardware info instead, it'd be better.

              Aye. Personally I think that's what GPUs are made for
              Heh...

              Nope, of course you wouldn't "simulate" the decoding hardware; you'd directly implement the codec decoding algorithms (or just the bottleneck parts) for the stream processors (GPU shaders). That is, write an MPEG-2 decoder, an MPEG-4 (ASP and AVC) decoder, ...
              Indeed. And it's something I think we should consider, really, but not our sole focus for this problem we're seeing.

              The downside is that the driver developers would have to mess directly with the video compression details; the upside is the increased flexibility and the possibility to reuse this GPU-assisted decoding on other hardware, too (Nvidia or Intel).
              Considering that SH showed that you could do generic GPU programming, I think you could make a case for doing things like what you're proposing in this instance. The big problem would be understanding how this stuff works, and you kind of need to understand it to be able to drive the dedicated decode paths anyway: they're largely blocks of the same nature as we're talking about, and in order to really use them (in the context of making an API for them) you need to understand at least a little something about those very details.



              • #47
                Originally posted by Svartalf View Post
                The problem is... the card memory is typically a limited resource compared to the memory of your system, and you need to use that resource when you're doing this sort of thing. It's not that you couldn't do it; it's that you've got to take into account that someone only has 128 MB of card memory for these sorts of things. Please note: I'm not suggesting that this isn't a desirable thing to do, but if we can get the dedicated hardware info instead, it'd be better.
                This also applies to system memory. I appreciate Diego Pettenò's work on trying to fix a lot of shared-memory stuff; if you follow his blog a little you'll understand what this guy is trying to do, and I completely agree with him on this part. Using his ebuilds I've started to see processes speed up (by small amounts for now) and the memory they need go down. This happens because I use a lot of software that is based on similar libs (Amarok and Kaffeine are one example, but there really are a lot more).

                Originally posted by Svartalf View Post

                Indeed. And it's something I think we should consider, really, but not our sole focus for this problem we're seeing.
                I personally don't really like this solution. To me it seems like a workaround that loads the video board instead of the core system while bypassing the HW decoding block. In my opinion this solution should only come once everything in the driver works well, and it would only be the cherry on top of the cake.

                Originally posted by Svartalf View Post
                Considering that SH showed that you could do generic GPU programming, I think you could make a case for doing things like what you're proposing in this instance. The big problem would be understanding how this stuff works, and you kind of need to understand it to be able to drive the dedicated decode paths anyway: they're largely blocks of the same nature as we're talking about, and in order to really use them (in the context of making an API for them) you need to understand at least a little something about those very details.
                This could be simpler with Gallium3D, from what I've understood of the explanations about it. Before that, generic GPU programming could be useful, but the hardware is too different to be able to use a lot of generic GPU programming.



                • #48
                  I have a BIG question to bridgman:

                  How come Via has been able to give us a driver with HD decoding?



                  • #49
                    Originally posted by curaga View Post
                    I have a BIG question to bridgman:

                    How come Via has been able to give us a driver with HD decoding?
                    I'm sure they have a different HW architecture on their boards. And you know well that Via boards aren't as good as ATI's, at least in my opinion.



                    • #50
                      Earlier in this thread bridgman said no company has done so.

                      Via is my favorite company, right after Intel. Their open source drivers are second best to Intel in Linuxland for features. After them is Matrox, but as their cards don't have hw for HD, they can't implement that.

                      Via's graphics are similar to Intel's: not awesome 3D game performance, only OK, but for watching video and normal use they rock.
                      I'd prefer a Via card to Ati any day.

