NVIDIA Publicly Releases Its OpenCL Linux Drivers


  • #21
    Originally posted by BlackStar
    Now, OpenCL-based encoding and we are talking.
    Incorrect. If you know how video encoding/decoding works, and how GPUs work, you know that GPUs are _not_ CPUs. There are some things they do _very_ well, like the Fast Fourier Transform, that make them far better at decoding video. However, the whole GPU encoding mess is smoke and hand-waving. No current GPU can come close to a good CPU running x264. The x264 devs have said that there are some small parts of the encoding process that could benefit from the GPU, but it isn't much. Unfortunately, most (all?) of the magazine and e-mag benchmarks have used a very bad CPU encoder like Apple's, an ancient build of x264, or very crazy settings.

    GPU decoding is what will really be nice -- not encoding, sadly. Unless you're talking about advanced post-processing before encoding, as the GPU can help a ton with FFT noise/grain removal.



    • #22
      Originally posted by Ranguvar
      Incorrect. If you know how video encoding/decoding works, and how GPUs work, you know that GPUs are _not_ CPUs. There are some things they do _very_ well, like the Fast Fourier Transform, that make them far better at decoding video. However, the whole GPU encoding mess is smoke and hand-waving.
      OpenCL is GPU/CPU agnostic. You can instruct a specific kernel to run on either, provided your drivers support this. Check the CL_DEVICE_TYPE_* enumeration.

      Thanks for the free lecture, though.
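
A minimal sketch of what selecting by CL_DEVICE_TYPE_* amounts to. The constants mirror the bitfield values defined in the OpenCL header CL/cl.h; the Device tuple, the sample device list, and select_devices are hypothetical stand-ins, not part of any OpenCL binding, and no driver is queried here. A real program would pass one of these masks to clGetDeviceIDs instead.

```python
from typing import List, NamedTuple

# Bitfield values as defined for cl_device_type in CL/cl.h.
CL_DEVICE_TYPE_DEFAULT = 1 << 0
CL_DEVICE_TYPE_CPU = 1 << 1
CL_DEVICE_TYPE_GPU = 1 << 2
CL_DEVICE_TYPE_ACCELERATOR = 1 << 3
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF


class Device(NamedTuple):
    """Hypothetical stand-in for a device reported by a driver."""
    name: str
    device_type: int


# Hypothetical machine: one CPU device and one GPU device.
DEVICES = [
    Device("hypothetical quad-core CPU", CL_DEVICE_TYPE_CPU),
    Device("hypothetical discrete GPU", CL_DEVICE_TYPE_GPU),
]


def select_devices(devices: List[Device], requested_type: int) -> List[Device]:
    """Keep the devices whose type bits overlap the requested mask.

    Simplified: the special meaning of CL_DEVICE_TYPE_DEFAULT is not modelled.
    """
    return [d for d in devices if d.device_type & requested_type]


if __name__ == "__main__":
    for label, mask in [("CPU", CL_DEVICE_TYPE_CPU),
                        ("GPU", CL_DEVICE_TYPE_GPU),
                        ("ALL", CL_DEVICE_TYPE_ALL)]:
        names = [d.name for d in select_devices(DEVICES, mask)]
        print(label, "->", names)
```

Whether a CPU device shows up at all depends on the installed driver, which is the "provided your drivers support this" part.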

      No current GPU can come close to a good CPU running x264. The x264 devs have said that there are some small parts of the encoding process that could benefit from the GPU, but it isn't much. Unfortunately, most (all?) of the magazine and e-mag benchmarks have used a very bad CPU encoder like Apple's, an ancient build of x264, or very crazy settings.

      GPU decoding is what will really be nice -- not encoding, sadly. Unless you're talking about advanced post-processing before encoding, as the GPU can help a ton with FFT noise/grain removal.
      I distinctly remember a couple of CUDA-based H.264 encoders being hyped around last spring / summer, with numbers like "40% faster than the fastest quad core" being thrown around. I haven't really followed the news, but a quick Google search reveals that the actual products don't live up to the hype.

      I still see potential in GPU-assisted encoding, however. Low-end machines will probably benefit more than high-end ones here (the latter can already handle x264 encoding in real time).

      Other than that, you need to remember that we are just at the beginning of the GPGPU era. This is more or less new ground: the necessary tools are only now becoming available (no, Brook and CUDA aren't nearly good enough for widespread adoption; compute shaders and OpenCL, on the other hand, are).
      Last edited by BlackStar; 29 September 2009, 06:52 PM.



      • #23
        You seem to have completely missed Ranguvar's point. It doesn't really matter if you can run the OpenCL code on the CPU, if the CPU is always going to be 10x faster than any GPU implementation. You might as well just stick with your current CPU-bound code that you already know is working and efficient, instead of porting to a new platform that probably isn't going to be as efficient on a CPU anyway, just because of its GPU-centric nature and your extensive optimizations to the old codebase.

        Now I don't know if that 10x faster condition is really accurate or not... There was the Badaboom H.264 CUDA encoder that seemed to run quite fast, but the problem with it was the quality was horrible. Like on the level of Theora, so it's not really fair to compare its speed against something like x264. The question then is if the poor quality was necessary to run fast on a GPU or if they've just got a bad implementation. I suspect both might be true.

        A good GPU isn't necessarily faster than a CPU in anything that is single-threaded. GPUs have hundreds of shader processors all running in parallel, but if a workload can only keep one or two of them busy at a time, it's going to get smoked by even an old CPU.
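
The single-threaded argument above can be put into a toy throughput model. This is only a sketch under stated assumptions: the lane counts and per-lane rates below are hypothetical numbers picked for illustration, not measurements of any real CPU or GPU, and the model ignores memory bandwidth and transfer costs.

```python
def throughput(lanes: int, rate_per_lane: float, independent_tasks: int) -> float:
    """Work units per second when only `independent_tasks` can run at once.

    A processor cannot use more lanes than there are independent tasks,
    so the number of busy lanes is capped by the available parallelism.
    """
    busy_lanes = min(lanes, independent_tasks)
    return busy_lanes * rate_per_lane


# Hypothetical hardware: few fast lanes versus many slow lanes.
CPU = {"lanes": 4, "rate_per_lane": 10.0}
GPU = {"lanes": 240, "rate_per_lane": 1.0}

if __name__ == "__main__":
    for tasks in (1, 2, 16, 1000):
        cpu = throughput(independent_tasks=tasks, **CPU)
        gpu = throughput(independent_tasks=tasks, **GPU)
        winner = "CPU" if cpu > gpu else "GPU"
        print(f"{tasks:5d} independent tasks: CPU={cpu:6.1f} GPU={gpu:6.1f} -> {winner}")
```

With these made-up numbers the CPU wins while parallelism is low and the GPU only pulls ahead once there are enough independent tasks to fill its lanes.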



        • #24
          When it comes to benchmarking please include a Mac running Snow Leopard.

          Yes, a bit more work. It would be very worthwhile, though, due to OpenCL more or less coming from Apple. They could represent a baseline.

          By the way, I do understand that SL has issues and thus needs time to mature. That probably applies to the SL software from the card vendors too. Actually I expect the NVIDIA drivers to be close to the same on both platforms. Well, hopefully. What I want to know, though, is how well Apple's GCD works in conjunction with OpenCL. Will an OpenCL program benefit from running on an SL-based system? If it does, is Apple's overhaul of its threading system helping out?


          Keep up the good work.



          Dave



          • #25
            Originally posted by smitty3268
            You seem to have completely missed Ranguvar's point. It doesn't really matter if you can run the OpenCL code on the CPU, if the CPU is always going to be 10x faster than any GPU implementation.
            Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.

            In the real world however, GPUs have something like 100x the power of CPUs on specific workloads. Many parts of the encoding process happen to involve such workloads and it makes sense to move *those* parts to the GPU.

            A well-implemented CPU+GPU encoder is likely to be faster than a pure CPU implementation with our current hardware.
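
One way to put numbers on this disagreement is Amdahl's law, which is not mentioned in the thread but fits the argument: if only a fraction of the encode can be offloaded, the overall gain is capped by the part that stays on the CPU. A rough sketch, assuming the offloadable fractions below (which are hypothetical, not measured for x264 or any other encoder) and ignoring transfer overhead between CPU and GPU:

```python
def overall_speedup(offload_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: total speedup when `offload_fraction` of the runtime
    is accelerated by `gpu_speedup` and the rest runs unchanged on the CPU.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    remaining = 1.0 - offload_fraction
    return 1.0 / (remaining + offload_fraction / gpu_speedup)


if __name__ == "__main__":
    # Hypothetical offloadable fractions, each paired with a 100x GPU advantage.
    for fraction in (0.1, 0.5, 0.9):
        gain = overall_speedup(fraction, 100.0)
        print(f"{fraction:.0%} offloaded at 100x -> {gain:.2f}x overall")
```

Under these assumptions a 100x advantage on 10% of the work gives roughly a 1.1x encoder, while the same advantage on 90% of the work gives roughly 9x, so the real question is how large the offloadable fraction is.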

            Now I don't know if that 10x faster condition is really accurate or not... There was the Badaboom H.264 CUDA encoder that seemed to run quite fast, but the problem with it was the quality was horrible. Like on the level of Theora, so it's not really fair to compare its speed against something like x264. The question then is if the poor quality was necessary to run fast on a GPU or if they've just got a bad implementation. I suspect both might be true.
            Yes, this implementation is underwhelming. The quality isn't the problem, actually (they only implemented the baseline H.264 profile), but the speed isn't all that good.

            However, you cannot judge all potential GPU-accelerated encoders using a single bad H.264 implementation. It is too early to draw conclusions: GPGPU is only 2 or 3 years old - this is a very new field and the necessary tools (profilers, debuggers) are only starting to appear now. In comparison, we've had decades of experience optimizing algorithms for the CPU and much more mature tools.

            A good GPU isn't necessarily faster than a CPU in anything that is single-threaded. GPUs have hundreds of shader processors all running in parallel, but if it can only run one or two at a time it's going to get smoked by even an old CPU.
            Who said anything about running single-threaded workloads on the GPU? OpenCL is designed for parallel workloads. Video compression involves a number of steps with high parallelism potential.

            Also, don't underestimate the power of a modern GPU. They can execute math instructions at ridiculous speeds (close-to-single-cycle sin/cos, matrix and vector math) and have tremendous amounts of bandwidth for coherent memory accesses (exceeding 100GB/s on high-end cards). GPUs are also improving at a faster rate than CPUs for the specific workloads they are good at (a 1.5x-2x jump in performance every 18 months!)
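
For a sense of scale, the quoted growth rate compounds quickly. A small sketch of the arithmetic, taking the 1.5x-2x per 18 months figure from the post at face value (it is the poster's estimate, not a verified benchmark); the function name and the chosen time spans are illustrative only.

```python
def projected_gain(factor_per_period: float, months: float,
                   period_months: float = 18.0) -> float:
    """Compound a per-period performance factor over `months`."""
    return factor_per_period ** (months / period_months)


if __name__ == "__main__":
    for months in (18, 36, 72):
        low = projected_gain(1.5, months)
        high = projected_gain(2.0, months)
        print(f"after {months} months: {low:.2f}x to {high:.2f}x")
```

At that rate, three years would mean somewhere between 2.25x and 4x on the workloads GPUs are good at.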



            • #26
              Originally posted by BlackStar
              Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.
              Actually that's missing his point again, I think. His point seems to be that CPUs are far superior in some tasks while GPUs are far superior in others, and encoding just doesn't happen to be one of the tasks that uses much of the functionality GPUs are superior in. (Whether it actually is or not is another issue and can probably be analyzed further.)



              • #27
                Originally posted by V!NCENT
                "Oh V!NCENT! Can I please swap my nVidia card for your ATI card because mine isn't officially supported anymore! Q_Q"
                Let me know when your drivers cure cancer or feed the hungry, then I'll switch.

                However, you cannot judge all potential GPU-accelerated encoders using a single bad x264 implementation. It is too early to draw conclusions:
                It's only too early for the open source implementations. Photoshop CS4 performance improved remarkably with CUDA, and that isn't doing anything computationally heavy. Gelato, NVIDIA's prman-ish offering, shows a clear difference being GPU-accelerated.



                • #28
                  Originally posted by yogi_berra
                  Let me know when your drivers cure cancer or feed the hungry, then I'll switch.
                  It is doing Folding@Home. So uhm, yeah; it is doing its best to cure cancer xD Oh, and by buying the card it's also functioning as an income so people at AMD can buy food. DOUBLE EPIC PWNAGE FTW!!!!

                  Now surrender or die!
                  Last edited by V!NCENT; 30 September 2009, 08:19 AM.



                  • #29
                    Originally posted by nanonyme
                    Originally posted by BlackStar
                    Yes, in your hypothetical world where CPU implementations are 10x faster than GPUs it wouldn't make sense. We wouldn't even have GPUs - let alone OpenCL.
                    Actually that's missing his point again, I think. His point seems to be that CPUs are far superior in some tasks while GPUs are far superior in others, and encoding just doesn't happen to be one of the tasks that uses much of the functionality GPUs are superior in. (Whether it actually is or not is another issue and can probably be analyzed further.)
                    Please read my whole post before jumping to conclusions. In the very next sentence I qualified my position:

                    Originally posted by BlackStar
                    In the real world however, GPUs have something like 100x the power of CPUs on specific workloads. Many parts of the encoding process happen to involve such workloads and it makes sense to move *those* parts to the GPU.
                    Once again, I am not saying we should move *everything* to the GPU - that's naive. I am saying that specific parts of the encoding process are well-suited to the GPU. I am also saying that OpenCL allows you to split workloads between the CPU *and* the GPU - thus giving you the ability to make the most of both.

                    Caveats:
                    1. OpenCL drivers have only been public for a few days - it's way too early to make meaningful performance comparisons.
                    2. I am pretty certain that the OpenCL specs allow you to use multiple OpenCL contexts and share data between them (i.e. create a GPU and a CPU context side by side) but I am waiting for AMD's GPU-accelerated implementation to verify this claim. As far as I know Nvidia's drivers don't expose any CPU devices.



                    • #30
                      Yeah, and he claimed the encoding process doesn't involve the kind of work that would be useful to offload to the GPU. It is possible that encoding and decoding are asymmetric and don't benefit equally from the same kind of system resources but, as I said, finding out whether that's true needs going more in-depth than you two seem willing to go.

