Looking At The OpenCL Performance Of ATI & NVIDIA On Linux

  • Looking At The OpenCL Performance Of ATI & NVIDIA On Linux

    Phoronix: Looking At The OpenCL Performance Of ATI & NVIDIA On Linux

    Recently we provided the first Linux-based review of the NVIDIA GeForce GTX 460 graphics card. Overall, this Fermi-based graphics card was a great performer for its roughly $200 USD price, and it is complemented by strong video playback capabilities with VDPAU acceleration and great proprietary driver support. In that review we primarily looked at the OpenGL performance under Linux, but since NVIDIA's Fermi architecture also brings major GPGPU advancements for CUDA and OpenCL users, in this article we are looking more closely at the Open Computing Language performance of this GF104 graphics card as well as other NVIDIA and ATI graphics cards.

    http://www.phoronix.com/vr.php?view=15257

  • #2
    Thanks for those tests!

    Why weren't the Cypress GPUs (5850, 5870) included, and only the smaller Junipers?


    • #3
      And what about SmallLuxGPU?


      • #4
        Originally posted by kernelOfTruth:
        Why weren't the Cypress GPUs (5850, 5870) included, and only the smaller Junipers?
        No access to such hardware...
        Michael Larabel
        http://www.michaellarabel.com/


        • #5
          What about the FirePros? It could be fun to see whether they are that much faster in OpenCL than the consumer cards.


          • #6
            Originally posted by Michael:
            No access to such hardware...
            Ah, thanks for the clarification!

            It's just that the results look rather favorable to NVIDIA.

            Do you have a 5830 available (AFAIK that's a Cypress LE)?


            • #7
              Originally posted by kernelOfTruth:
              Ah, thanks for the clarification!

              It's just that the results look rather favorable to NVIDIA.

              Do you have a 5830 available (AFAIK that's a Cypress LE)?
              Nope... Any GPU I have, you would have seen in a review. The only Evergreen ASICs I have are the HD 5550, the HD 5570, and an HD 5450 that I actually bought and am in the process of reviewing.

              • #8
                Do any of those benchmarks use double-precision floating point?

                (And as always: it would be nice if you could put error bars on the plots.)
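On the error-bar request: a minimal Python sketch, assuming each test is simply run several times and its scores collected in a list. The helper name `error_bar` and the sample scores are hypothetical, not taken from the article or from PTS. It returns both the sample standard deviation (run-to-run spread) and the standard error of the mean (how precisely the mean is known); which one to draw is a presentation choice.

```python
import math
import statistics


def error_bar(runs):
    """Summarize repeated benchmark runs as (mean, stdev, sem).

    stdev is the sample standard deviation (n - 1 denominator) and
    describes run-to-run spread; sem = stdev / sqrt(n) is the standard
    error of the mean.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)
    sem = stdev / math.sqrt(len(runs))
    return mean, stdev, sem


# Hypothetical scores from three runs of one test (not from the article).
mean, stdev, sem = error_bar([10.0, 12.0, 14.0])
print(f"{mean:.1f} +/- {sem:.2f} (SEM), spread {stdev:.2f} (SD)")
```

With more runs the SEM shrinks roughly as 1/sqrt(n) while the standard deviation does not, which is why plots sometimes show one and sometimes the other.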


                • #9
                  Great benchmark!

                  Thank you, Michael!

                  It would be nice to compare it with a CPU, so we can actually see whether low-end cards make sense for this kind of processing work.

                  I'm not sure which OpenCL CPU implementation is efficient and uses SSE2, etc. Maybe Intel has such an implementation of the OpenCL compiler, or LLVM has some OpenCL frontend/parser.


                  • #10
                    Michael, please keep in mind that SmallPtGPU contains a bug/incompatibility that seriously limits performance on NVIDIA hardware, especially pre-Fermi.

                    Here's a diff that fixes it. This improves performance more than tenfold on G80/GT200.


                    • #11
                      MandelGPU might suffer from something similar (redefining __constant to __global), but I haven't checked.


                      • #12
                        Hmm. On my HD5970 for SmallPT 1.6 GPU Caustic3, I'm getting 45200 KSamples/sec on the GPU, and ~16000 KSamples/sec on my Core i7 920. Neither part is overclocked; they're at their factory default clock rates.

                        The GPU number is lower than either of Michael's Radeons, but still a good deal faster than Michael's GT 240. The numbers seem unaffected by whether Compiz is on. I find it hard to accept that an HD5970 gets poorer results than a 5770. Even if an HD5970 is two 5850 cores together, shouldn't even one of those cores single-handedly outperform a 5770? And wouldn't OpenCL have the smarts to use both cores automatically to make it nearly twice as fast?

                        I noticed something funky about the tests, though. While the test is running, the visual output says something like 52000 KSamples/sec at the bottom. This is substantially larger than the 45200 KSamples/sec reported by PTS in its output. I'm not sure why there is such a large discrepancy. Bug in PTS? Bug in the test?

                        Either way, it seems (disappointingly) that an HD5970 is only about 3 times faster than a Core i7 at this test. It is probably more economical to use a bunch of CPUs than GPUs for this kind of workload, seeing how a Core i7 is much cheaper than a dual-GPU HD5970. We already know from other tests that a GPU is many times faster than the CPU at OpenGL 3D rendering, so maybe the parts needed for general-purpose GPGPU are kept to a modest level on Evergreen in order to support top-of-the-line 3D graphics. I'm not complaining, since I don't use GPGPU for anything other than PTS.
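For reference, the ratios implied by the figures in this post can be worked out directly. This Python snippet uses only the numbers quoted above (the CPU figure is approximate); it adds no new measurements, and the variable and function names are just for illustration.

```python
# Figures quoted in the post (KSamples/sec, SmallPT 1.6 GPU Caustic3).
gpu_pts = 45200        # HD5970 result as reported by PTS
gpu_on_screen = 52000  # approximate rate shown in the test's own window
cpu = 16000            # approximate Core i7 920 result


def ratio(a, b):
    """Return how many times larger a is than b."""
    return a / b


print(f"GPU vs CPU (PTS figure):       {ratio(gpu_pts, cpu):.2f}x")
print(f"GPU vs CPU (on-screen figure): {ratio(gpu_on_screen, cpu):.2f}x")
print(f"On-screen vs PTS:              {ratio(gpu_on_screen, gpu_pts):.2f}x")
```

Using the PTS figure the GPU comes out at about 2.8x the CPU; using the on-screen figure it is 3.25x, and the on-screen rate is roughly 15% above the number PTS reports.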


                        • #13
                          Hmm, also interesting: I got an average of 31011133.23 for the MandelGPU test, although the test was running at the default resolution rather than 1920x1080, so that may account for the difference.

                          If the resolution doesn't matter, though, as it sometimes doesn't, then I'm getting about 1.5x the performance of the GTX 460 with the HD5970. This is more in line with what I was expecting.


                          • #14
                            Originally posted by allquixotic:
                            Hmm. On my HD5970 for SmallPT 1.6 GPU Caustic3, I'm getting 45200 KSamples/sec on the GPU, and ~16000 KSamples/sec on my Core i7 920. Neither part is overclocked; they're at their factory default clock rates.
                            16000 KSamples/sec sounds too good to be true for a CPU, seriously.

                            And wouldn't OpenCL have the smarts to use both cores automatically to make it nearly twice as fast?
                            No. The GPUs are separate OpenCL devices, and the program needs to use them explicitly.
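To illustrate what "explicitly" means here: an OpenCL host program typically creates a command queue for each device and enqueues a kernel on each one over its own slice of the work. Below is a minimal Python sketch of only the partitioning step, assuming the work is a set of image rows split in proportion to each device's compute-unit count. The function name `split_rows` and the compute-unit values are hypothetical; the actual queue and kernel calls are not shown.

```python
def split_rows(height, compute_units):
    """Split `height` rows into contiguous [start, end) ranges, one per
    device, roughly proportional to each device's compute-unit count."""
    total = sum(compute_units)
    ranges = []
    start = 0
    for i, cu in enumerate(compute_units):
        if i == len(compute_units) - 1:
            # The last device takes the remainder so no row is dropped
            # by integer rounding.
            end = height
        else:
            end = start + height * cu // total
        ranges.append((start, end))
        start = end
    return ranges


# Hypothetical: two identical GPUs on one board, each exposed as its own
# OpenCL device with the same (assumed) number of compute units.
print(split_rows(1080, [20, 20]))
```

Each (start, end) pair would determine the range covered by that device's kernel launch; a real program would also gather the partial results, and might rebalance if one device finishes noticeably earlier.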


                            • #15
                              Originally posted by ssam:
                              Do any of those benchmarks use double-precision floating point?

                              (And as always: it would be nice if you could put error bars on the plots.)
                              You need at least an HD 5830 to test double-precision floating point on the AMD side anyway; the 5700 series and lower only support single precision.
                              Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety.
                              Ben Franklin 1755
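On the double-precision point: in OpenCL 1.0/1.1, double precision is optional, and a device advertises it through its extension string (queried with `clGetDeviceInfo` and `CL_DEVICE_EXTENSIONS`). The Khronos extension is `cl_khr_fp64`; AMD's drivers of this period also exposed a vendor extension, `cl_amd_fp64`, with partial double support. A minimal Python sketch of the check, assuming the extension string has already been retrieved; the function name and the example strings are hypothetical.

```python
def fp64_support(extensions):
    """Return which double-precision extension a device advertises, or None.

    `extensions` is the space-separated CL_DEVICE_EXTENSIONS string.
    """
    names = set(extensions.split())
    if "cl_khr_fp64" in names:
        return "cl_khr_fp64"  # standard Khronos double-precision extension
    if "cl_amd_fp64" in names:
        return "cl_amd_fp64"  # AMD vendor extension (partial fp64 support)
    return None  # single precision only


# Hypothetical extension strings for two devices.
print(fp64_support("cl_khr_fp64 cl_khr_byte_addressable_store"))
print(fp64_support("cl_khr_global_int32_base_atomics"))
```

A kernel that uses `double` would then enable whichever extension was found with the matching `#pragma OPENCL EXTENSION` line, and a benchmark could skip or flag devices for which this returns None.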
