NVIDIA Publicly Releases Its OpenCL Linux Drivers
-
Apologies if this doesn't make sense, but I've not read that deeply into how OpenCL works within the system as a whole: with an Nvidia beta library and the AMD CPU library, can those two work together (i.e. sending off code to the CPU or GPU as appropriate)?
-
Originally posted by nanonyme View Post:
Yeah, and he claimed the encoding process doesn't use the kind of work that would be useful to offload to the GPU. It is possible that the situation between encoding and decoding is asymmetric and that they don't benefit equally from the same system resources, but as I said, finding out whether that's true requires going more in-depth than you two seem willing to go.
You don't qualify what you mean by "in-depth" but keep in mind this topic is suitable for PhD-level research. There are not many people out there who have the necessary background and would attempt to implement a GPU-accelerated transcoder for free.
This also happens to be an area of active research. For anyone interested:
1. Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA
2. Intra Frame Encoding Using Programmable Graphics Hardware
3. H.264 Video Encoding Algorithm on Cell Broadband Engine
Leave a comment:
-
Yeah, and he claimed the encoding process doesn't use the kind of work that would be useful to offload to the GPU. It is possible that the situation between encoding and decoding is asymmetric and that they don't benefit equally from the same system resources, but as I said, finding out whether that's true requires going more in-depth than you two seem willing to go.
Leave a comment:
-
Originally posted by nanonyme View Post:
Originally posted by BlackStar:
Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.
Originally posted by BlackStar:
In the real world however, GPUs have something like 100x the power of CPUs on specific workloads. Many parts of the encoding process happen to involve such workloads and it makes sense to move *those* parts to the GPU.
Caveats:
1. OpenCL drivers have only been public for a few days - it's way too early to make meaningful performance comparisons.
2. I am pretty certain that the OpenCL specs allow you to use multiple OpenCL contexts and share data between them (i.e. create a GPU and a CPU context side by side) but I am waiting for AMD's GPU-accelerated implementation to verify this claim. As far as I know Nvidia's drivers don't expose any CPU devices.
Leave a comment:
-
Originally posted by yogi_berra View Post:
Let me know when your drivers cure cancer or feed the hungry, then I'll switch.
Now surrender or die!
Last edited by V!NCENT; 30 September 2009, 08:19 AM.
Leave a comment:
-
Originally posted by V!NCENT View Post:
"Oh V!NCENT! Can I please swap my nVidia card for your ATI card because mine isn't officially supported anymore! Q_Q"
However, you cannot judge all potential GPU-accelerated encoders using a single bad x264 implementation. It is too early to draw conclusions.
Leave a comment:
-
Originally posted by BlackStar View Post:
Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.
Leave a comment:
-
Originally posted by smitty3268 View Post:
You seem to have completely missed Ranguvar's point. It doesn't really matter if you can run the OpenCL code on the CPU, if the CPU is always going to be 10x faster than any GPU implementation.
In the real world however, GPUs have something like 100x the power of CPUs on specific workloads. Many parts of the encoding process happen to involve such workloads and it makes sense to move *those* parts to the GPU.
A well-implemented CPU+GPU encoder is likely to be faster than a pure CPU implementation with our current hardware.
Now I don't know if that 10x faster condition is really accurate or not... There was the Badaboom h.264 CUDA encoder that seemed to run quite fast, but the problem with it was that the quality was horrible - on the level of Theora - so it's not really fair to compare its speed against something like x264. The question then is whether the poor quality was necessary to run fast on a GPU or whether they just have a bad implementation. I suspect both might be true.
However, you cannot judge all potential GPU-accelerated encoders using a single bad x264 implementation. It is too early to draw conclusions: GPGPU is only 2 or 3 years old - this is a very new field and the necessary tools (profilers, debuggers) are only starting to appear now. In comparison, we've had decades of experience optimizing algorithms for the CPU and much more mature tools.
A good GPU isn't necessarily faster than a CPU at anything single-threaded. GPUs have hundreds of shader processors all running in parallel, but if the workload can only use one or two at a time it's going to get smoked by even an old CPU.
Also, don't underestimate the power of a modern GPU. They can execute math instructions at ridiculous speeds (close-to-single-cycle sin/cos, matrix and vector math) and have tremendous amounts of bandwidth for coherent memory accesses (exceeding 100GB/s on high-end cards). GPUs are also improving at a faster rate than CPUs for the specific workloads they are good at (a 1.5x-2x jump in performance every 18 months!)
Leave a comment:
-
When it comes to benchmarking please include a Mac running Snow Leopard.
Yes, it is a bit more work. It would be very worthwhile though, due to OpenCL more or less coming from Apple. A Mac could represent a baseline.
By the way, I do understand that SL has issues and thus needs time to mature. That probably applies to the SL software from the card vendors too. Actually, I expect the NVidia drivers to be close to the same on both platforms. What I really want to know, though, is how well Apple's GCD works in conjunction with OpenCL. Will an OpenCL program benefit from running on an SL-based system? If it does, is Apple's overhaul of its threading system helping out?
Keep up the good work.
Dave
Leave a comment:
-
You seem to have completely missed Ranguvar's point. It doesn't really matter if you can run the OpenCL code on the CPU, if the CPU is always going to be 10x faster than any GPU implementation. You might as well just stick with your current CPU-bound code that you already know is working and efficient, instead of porting to a new platform that probably isn't going to be as efficient on a CPU anyway, just because of its GPU-centric nature and your extensive optimizations to the old codebase.
Now I don't know if that 10x faster condition is really accurate or not... There was the Badaboom h.264 CUDA encoder that seemed to run quite fast, but the problem with it was that the quality was horrible - on the level of Theora - so it's not really fair to compare its speed against something like x264. The question then is whether the poor quality was necessary to run fast on a GPU or whether they just have a bad implementation. I suspect both might be true.
A good GPU isn't necessarily faster than a CPU at anything single-threaded. GPUs have hundreds of shader processors all running in parallel, but if the workload can only use one or two at a time it's going to get smoked by even an old CPU.
Leave a comment: