XvMC support

  • #61
    Originally posted by RealNC View Post
    You can run everything on the GPU that way... (OpenCL)

    I should mention that some of the video decode work (particularly the variable length decode, aka entropy decode) doesn't have much inherent parallelism and isn't the kind of thing that the GPU core does particularly well. Once you have gotten through the entropy decode, however, the rest of the work (starting with inverse quantization) is more amenable to generic GPU processing. Encoding is even more GPU-friendly, since motion estimation involves some godawful-expensive sweeps through the image looking for pattern matches, and GPUs are really good at that kind of thing.

    It's really the bit-pickin' work where dedicated hardware is most useful. Fortunately the entropy decode is a relatively small part of the workload and is at the front of the pipeline so doing it on CPU is not a problem.
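    As a rough sketch of that split (purely illustrative; the function names and the toy "bitstream" format below are made up, not taken from any real decoder), the front of the pipeline stays a serial CPU loop, while the per-block work is the part you would hand to the GPU:

    #include <stddef.h>

    typedef struct { short coeffs[64]; } Block;   /* one 8x8 block of coefficients */

    /* Serial by nature: each symbol depends on the ones decoded before it,
     * so this loop can't be split across threads. */
    static int entropy_decode_cpu(const unsigned char *bs, size_t len,
                                  Block *blocks, int max_blocks)
    {
        int nblocks = 0;
        size_t pos = 0;
        while (pos < len && nblocks < max_blocks) {
            for (int i = 0; i < 64 && pos < len; i++)
                blocks[nblocks].coeffs[i] = bs[pos++];   /* pretend each byte is one symbol */
            nblocks++;
        }
        return nblocks;
    }

    /* Embarrassingly parallel: every block is independent, which is exactly
     * the shape of work a GPU handles well. */
    static void dequant_idct_block(Block *b, int quant_scale)
    {
        for (int i = 0; i < 64; i++)
            b->coeffs[i] = (short)(b->coeffs[i] * quant_scale);   /* stand-in for inverse quant + IDCT */
    }

    void decode_frame(const unsigned char *bs, size_t len, Block *blocks, int max_blocks)
    {
        int n = entropy_decode_cpu(bs, len, blocks, max_blocks);   /* front of pipeline: CPU */
        for (int i = 0; i < n; i++)                                /* on a GPU, this loop becomes */
            dequant_idct_block(&blocks[i], 8);                     /* a single kernel launch      */
    }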
    Last edited by bridgman; 04 January 2009, 07:45 PM.


    • #62
      Originally posted by rvdboom View Post
      I have a 2 GHz Opteron and it cannot deal with HD content, whether in MPEG-4, MPEG-2 or any other codec. And yes, I use overlay.
      There is another reason why having accelerated MPEG-2 would be nice: professional HD codecs. Most of them (HDV, HDCAM, XDCAM) are still MPEG-2 based, and it's quite nice to be able to work with them natively at full resolution. Blu-ray is still MPEG-2 too, AFAIR.
      But I think what should be worked on is a more generic acceleration API than XvMC, one allowing for several codecs, at least H.264.
      Since GPUs are programmable now, isn't it possible to provide a generic API that lets small decoders be written in the appropriate programming language?
      That simply cannot be right.

      I have a first-generation AMD64, Socket 754, 2 GHz, with very slow single-channel DDR1 RAM, and I can play 1920x1080 MPEG-2 videos without issues using standard Xv output (overlay). Your Opteron can't possibly be slower?



      • #63
        Originally posted by Redeeman View Post
        That simply cannot be right.

        I have a first-generation AMD64, Socket 754, 2 GHz, with very slow single-channel DDR1 RAM, and I can play 1920x1080 MPEG-2 videos without issues using standard Xv output (overlay). Your Opteron can't possibly be slower?
        The bitrate of the MPEG-2 file is a big consideration. If he is playing large 1080p 40 Mbps files and yours are 1080i 15 Mbps or so, his Opteron has a MUCH harder workload ahead of it than your 3000+ does.
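        To put rough numbers on that (purely illustrative assumptions: 24 fps for the 1080p stream, 25 fps for the 1080i one):

        #include <stdio.h>

        /* Rough per-frame bit budgets under the assumptions above. */
        int main(void)
        {
            double hi = 40e6 / 24.0;   /* ~1.67 Mbit to entropy-decode per frame */
            double lo = 15e6 / 25.0;   /* ~0.60 Mbit per frame */
            printf("40 Mbps @ 24 fps: %.2f Mbit/frame\n", hi / 1e6);
            printf("15 Mbps @ 25 fps: %.2f Mbit/frame\n", lo / 1e6);
            printf("ratio: %.1fx\n", hi / lo);   /* roughly 2.8x more bits to chew through */
            return 0;
        }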



        • #64
          Originally posted by RealNC View Post
          That's why we want OpenCL supported soon. You can run everything on the GPU that way, regardless of how it's been encoded.
          ... when you have implemented fully-featured decoders that are comparable to current CPU-based decoders. General purpose GPU programming is harder than traditional coding, so don't expect quick results once OpenCL is usable.

          CoreAVC (a Windows DirectShow H.264/AVC decoder) will be supporting CUDA soon in order to allow decoding of every video on the GPU, with no regard to how it's been encoded.
          CoreAVC is still only using the dedicated VP2/3 decoder through the NVCUVID API in CUDA, just like Donald Graft's DGAVCDecNV. Most videos will be supported, since NVIDIA's H.264 decoder is quite flexible (it can decode 1080p with 15 reference frames), but resolutions higher than 1920x1080, for example, will probably not work. Then again, the current CoreAVC doesn't support them either, though most other software decoders do.



          • #65
            Originally posted by MU_Engineer View Post
            The bitrate of the MPEG-2 file is a big consideration. If he is playing large 1080p 40 Mbps files and yours are 1080i 15 Mbps or so, his Opteron has a MUCH harder workload ahead of it than your 3000+ does.
            Exactly.
            The HD files I deal with are indeed produced in an HD movie production pipeline, so they're usually quite high bitrate, above 30 Mbps most of the time. Most of them are progressive 24 fps files, since they are made to be transferred to 35mm film prints.
            I can play them on Windows with the appropriate drivers and players, but not on Linux.
            Of course, I'm something of a corner case, but I would still love GPU encode/decode support on Linux, since the hardware to do it is already there.



            • #66
              Originally posted by deneb View Post
              ... when you have implemented fully-featured decoders that are comparable to current CPU-based decoders. General purpose GPU programming is harder than traditional coding, so don't expect quick results once OpenCL is usable.
              Stupid question:
              Would it be possible to have some sort of C-to-OpenCL translator, allowing one, for instance, to port the ffmpeg decoders with a minimum of work?



              • #67
                It would be really hard. The way you structure a program for running efficiently on parallel hardware like a GPU is quite different from the way you would program a CPU.

                In the case of a CPU you normally use loops and can easily have different behaviour at different points through the loop; with a GPU you set up little programs which run independently on every point in an image and then let them all run in parallel. As a result, complicated things like dealing with the edge of a block tend to be handled very differently in a GPU than in a CPU.
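                For example, here is a hedged sketch of the difference (a toy brighten-the-image operation; all names are made up): the CPU version is an explicit loop over pixels, while the OpenCL C version is a tiny program that the runtime launches once per pixel.

                /* CPU style: one thread walks the whole image in nested loops. */
                void brighten_cpu(unsigned char *img, int width, int height, int delta)
                {
                    for (int y = 0; y < height; y++)
                        for (int x = 0; x < width; x++)
                            img[y * width + x] += delta;   /* easy to special-case edges here */
                }

                /* GPU style (OpenCL C, in its own .cl file): no loop at all -- this
                 * little program runs once for every (x, y), all instances
                 * conceptually in parallel. */
                __kernel void brighten_gpu(__global uchar *img, int width, int delta)
                {
                    int x = get_global_id(0);
                    int y = get_global_id(1);
                    img[y * width + x] += delta;
                }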

                In-between systems like Cell processors bring yet another set of challenges; on those you can write more CPU-like code but you still need to deal with partitioning the work across a lot of little processors.

                So, bottom line is that there is no easy solution yet, at least not that anybody has found.


                • #68
                  Sorry, a stupid question, but I haven't been involved with GPU programming before.

                  If you have multiple small independent programs doing the calculations in parallel, isn't it often a problem that program 2 can, for example, only do its work on pixel 1 once program 1 has finished its job on a certain pixel area? I mean, how do you handle the dependencies between those small programs so that you prevent them from running in the wrong order? (3 + 2) * 5 != 3 + 2 * 5



                  • #69
                    Nope, it's a good question. There are ways you can deal with dependencies and data communication between parallel threads, but in general if you rely on them much you lose most of the benefit of parallelism.

                    This is why converting a CPU program to a GPU program is not easy; you often need to come up with a new approach to solving the problem which does not require those dependencies. Some problems (e.g. picking apart a sequential bitstream) don't lend themselves to parallel processing at all.
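                    For what it's worth, here is a hedged OpenCL C sketch (a made-up kernel, not from any real code) of the simplest such mechanism: within one work-group, a barrier forces every work-item to finish phase 1 before any of them starts phase 2. Across work-groups, the usual answer is to end the kernel and launch a second one.

                    /* Assumes an even work-group size; tmp is per-group local memory. */
                    __kernel void two_phase(__global float *data, __local float *tmp)
                    {
                        int lid = get_local_id(0);
                        int gid = get_global_id(0);

                        tmp[lid] = data[gid] * 2.0f;          /* phase 1: every item writes  */
                        barrier(CLK_LOCAL_MEM_FENCE);         /* wait for the whole group    */
                        data[gid] = tmp[lid] + tmp[lid ^ 1];  /* phase 2: read a neighbour   */
                    }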


                    • #70
                      Originally posted by deneb View Post
                      CoreAVC is still only using the dedicated VP2/3 decoder through NVCUVID API in CUDA, just like Donald Graft's DGAVCDecNV. Most videos will be supported since NVIDIA's H.264 decoder is quite flexible
                      Hmm. So then OpenCL is not the answer here? Then what is?

