NVIDIA Publicly Releases Its OpenCL Linux Drivers
-
Apologies if this doesn't make sense, but I've not read that deeply into how OpenCL works within the system as a whole: with an Nvidia beta library and the AMD CPU library, can those two work together (i.e. sending off code to the CPU or GPU as appropriate)?
-
Originally posted by nanonyme View Post:
Yeah, and he claimed the encoding process doesn't use the kind of work that would be useful to offload to the GPU. It is possible that the situation between encoding and decoding is asymmetric and that they don't benefit equally from the same system resources, but as I said, finding out whether that's true requires going more in-depth than you two seem willing to go.
You don't qualify what you mean by "in-depth" but keep in mind this topic is suitable for PhD-level research. There are not many people out there who have the necessary background and would attempt to implement a GPU-accelerated transcoder for free.
This also happens to be an area of active research. For anyone interested:
1. Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA
2. Intra Frame Encoding Using Programmable Graphics Hardware
3. H.264 Video Encoding Algorithm on Cell Broadband Engine
Leave a comment:
-
Yeah, and he claimed the encoding process doesn't use the kind of work that would be useful to offload to the GPU. It is possible that the situation between encoding and decoding is asymmetric and that they don't benefit equally from the same system resources, but as I said, finding out whether that's true requires going more in-depth than you two seem willing to go.
Leave a comment:
-
Originally posted by nanonyme View Post:
Originally posted by BlackStar:
Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.
Originally posted by BlackStar:
In the real world however, GPUs have something like 100x the power of CPUs on specific workloads. Many parts of the encoding process happen to involve such workloads and it makes sense to move *those* parts to the GPU.
Caveats:
1. OpenCL drivers have only been public for a few days - it's way too early to make meaningful performance comparisons.
2. I am pretty certain that the OpenCL specs allow you to use multiple OpenCL contexts and share data between them (i.e. create a GPU and a CPU context side by side) but I am waiting for AMD's GPU-accelerated implementation to verify this claim. As far as I know Nvidia's drivers don't expose any CPU devices.
Leave a comment:
-
Originally posted by yogi_berra View Post:
Let me know when your drivers cure cancer or feed the hungry, then I'll switch.
Now surrender or die!
Last edited by V!NCENT; 30 September 2009, 08:19 AM.
Leave a comment:
-
Originally posted by V!NCENT View Post:
"Oh V!NCENT! Can I please swap my nVidia card for your ATI card because mine isn't officially supported anymore! Q_Q"
However, you cannot judge all potential GPU-accelerated encoders using a single bad x264 implementation. It is too early to draw conclusions.
Leave a comment:
-
Originally posted by BlackStar View Post:
Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.
Leave a comment:
-
Originally posted by smitty3268 View Post:
You seem to have completely missed Ranguvar's point. It doesn't really matter if you can run the OpenCL code on the CPU, if the CPU is always going to be 10x faster than any GPU implementation.
In the real world however, GPUs have something like 100x the power of CPUs on specific workloads. Many parts of the encoding process happen to involve such workloads and it makes sense to move *those* parts to the GPU.
A well-implemented CPU+GPU encoder is likely to be faster than a pure CPU implementation with our current hardware.
Now I don't know if that 10x faster condition is really accurate or not... There was the Badaboom h.264 CUDA encoder that seemed to run quite fast, but the problem with it was that the quality was horrible - on the level of Theora - so it's not really fair to compare its speed against something like x264. The question then is whether the poor quality was necessary to run fast on a GPU or whether they just have a bad implementation. I suspect both might be true.
However, you cannot judge all potential GPU-accelerated encoders using a single bad x264 implementation. It is too early to draw conclusions: GPGPU is only 2 or 3 years old - this is a very new field and the necessary tools (profilers, debuggers) are only starting to appear now. In comparison, we've had decades of experience optimizing algorithms for the CPU and much more mature tools.
A good GPU isn't necessarily faster than a CPU at anything single-threaded. GPUs have hundreds of shader processors all running in parallel, but if the workload can only use one or two at a time it's going to get smoked by even an old CPU.
Also, don't underestimate the power of a modern GPU. They can execute math instructions at ridiculous speeds (close-to-single-cycle sin/cos, matrix and vector math) and have tremendous amounts of bandwidth for coherent memory accesses (exceeding 100GB/s on high-end cards). GPUs are also improving at a faster rate than CPUs for the specific workloads they are good at (a 1.5x-2x jump in performance every 18 months!)
Leave a comment:
-
When it comes to benchmarking please include a Mac running Snow Leopard.
Yes, it is a bit more work. It would be very worthwhile though, due to OpenCL more or less coming from Apple. A Mac could represent a baseline.
By the way, I do understand that SL has issues and thus needs time to mature. That probably applies to the SL software from the card vendors too. Actually, I expect the NVidia drivers to be close to the same on both platforms. What I really want to know, though, is how well Apple's GCD works in conjunction with OpenCL. Will an OpenCL program benefit from running on an SL-based system? If it does, is Apple's overhaul of its threading system helping out?
Keep up the good work.
Dave
Leave a comment:
-
You seem to have completely missed Ranguvar's point. It doesn't really matter if you can run the OpenCL code on the CPU, if the CPU is always going to be 10x faster than any GPU implementation. You might as well just stick with your current CPU-bound code that you already know is working and efficient, instead of porting to a new platform that probably isn't going to be as efficient on a CPU anyway, just because of its GPU-centric nature and your extensive optimizations to the old codebase.
Now I don't know if that 10x faster condition is really accurate or not... There was the Badaboom h.264 CUDA encoder that seemed to run quite fast, but the problem with it was that the quality was horrible - on the level of Theora - so it's not really fair to compare its speed against something like x264. The question then is whether the poor quality was necessary to run fast on a GPU or whether they just have a bad implementation. I suspect both might be true.
A good GPU isn't necessarily faster than a CPU at anything single-threaded. GPUs have hundreds of shader processors all running in parallel, but if the workload can only use one or two at a time it's going to get smoked by even an old CPU.
Leave a comment: