Looking At The OpenCL Performance Of ATI & NVIDIA On Linux


  • #16
    NVidia does the same shit: DP is only supported on higher-end cards, and on consumer Fermi cards DP performance has even been artificially reduced.

    AMD doesn't properly support double precision yet anyway; it only works through an AMD-specific OpenCL extension that isn't compatible with the standard DP extension.
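    In kernel code that shows up as two different extension names to check for. A minimal sketch (the standard extension is cl_khr_fp64, AMD's vendor-specific one is cl_amd_fp64; which of them a given driver actually exposes depends on the vendor and driver version):

    Code:
    /* Portable double-precision enable: at the time, AMD's runtime exposed
     * only its vendor extension, so code had to check for both names. */
    #if defined(cl_khr_fp64)
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    #elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64 : enable
    #else
    #error "No double-precision support on this device"
    #endif

    __kernel void scale(__global double *buf, const double factor)
    {
        size_t i = get_global_id(0);
        buf[i] *= factor;
    }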



    • #17
      AMD's DP isn't even IEEE-compliant

      NVidia does this business zigzag shit because of the lack of competition. Sigh...



      • #18
        not terribly useful

        Although it's probably better than nothing, these tech demos hardly constitute reasonable benchmarks.

        Despite aims to the contrary, with OpenCL you really need to tailor the code to each device individually - otherwise you can end up with huge performance differences, orders of magnitude even. The few benchmarks you do have demonstrate this, with huge relative swings on the same hardware.

        It would be useful to analyse the code in question to determine why it runs so differently on a given architecture. E.g. on GPUs vectorised code makes no difference, but on Intel it makes some difference, and on Cell it should make a huge difference. On Intel you get no memory-access coalescing or local memory, NVidia has an L1 cache for array accesses, ATI has more registers, etc. Wavefront sizes vary between devices, and some devices can run multiple work queues simultaneously.
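        One way to start that analysis is simply to dump what each implementation reports about itself and tune from there. A minimal host-side sketch using the standard OpenCL C API (error handling omitted; which of these properties actually matters is, of course, workload-specific):

        Code:
        #include <stdio.h>
        #include <CL/cl.h>

        int main(void)
        {
            cl_platform_id platform;
            cl_device_id device;
            clGetPlatformIDs(1, &platform, NULL);
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);

            char name[256];
            cl_uint vec_width, units;
            cl_ulong local_mem;
            clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(device, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                            sizeof(vec_width), &vec_width, NULL);
            clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(units), &units, NULL);
            clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                            sizeof(local_mem), &local_mem, NULL);

            /* These numbers differ wildly between vendors and are a first hint
             * at how a kernel needs to be structured for this device. */
            printf("%s: preferred float vector width %u, %u compute units, "
                   "%lu bytes local memory\n",
                   name, vec_width, units, (unsigned long)local_mem);
            return 0;
        }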



        • #19
          Originally posted by vrodic View Post
          Maybe Intel has such an implementation of the OpenCL compiler, or LLVM has some OpenCL frontend/parser.
          Intel is not an OpenCL supporter.



          • #20
            Actually, I think OpenCL has a lower-level API than CUDA, probably because of the large variance in vector unit designs.
            At this point in time, everybody is experimenting with how to interface with vector units.
            Because the ATI 2000-5000 series cards have a 4-simple+1-complex unit design, some code will run much faster than other code depending on the mix of instructions you use, whereas on Nvidia hardware you have a single complex unit that you stream data into, so their hardware performs much the same regardless of the actual instruction order, etc...
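            A rough kernel-level sketch of that point (hypothetical kernels, just to show the shape of the code): on the VLIW-style ATI units an explicitly vectorised float4 kernel gives the compiler four independent operations to pack together, while on NVidia's scalar units the two versions tend to behave about the same.

            Code:
            /* Scalar version: one float per work-item. */
            __kernel void saxpy_scalar(__global const float *x,
                                       __global float *y, const float a)
            {
                size_t i = get_global_id(0);
                y[i] = a * x[i] + y[i];
            }

            /* Vectorised version: one float4 per work-item. On 4+1 VLIW
             * hardware this helps fill the slots; on scalar hardware it
             * mostly just changes the memory-access pattern. */
            __kernel void saxpy_vec4(__global const float4 *x,
                                     __global float4 *y, const float a)
            {
                size_t i = get_global_id(0);
                y[i] = a * x[i] + y[i];
            }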

            The ATi design was optimised for rendering, and the symmetry you need for consistently good compute performance wasn't a consideration. I suspect the bits for compute performance were bolted on later (considering OpenCL only works on 4000+ cards).
            The new 4D arch is symmetrical, and I assume its performance will be much more consistent, since you only have to reorder instructions to prevent stalls, almost identically to how a regular CPU works.

            Meaning that I think we will see much more consistent performance from the new ATi arch.

            So it is not so much that OpenCL is flawed; rather, performance varies so much because of the widely varied and specialised vector units out there.

            Hmmm, what if the Fusion parts also use the 4D symmetrical vector units? That would make even Ontario competitive in OpenCL applications...

            Bridgman?
            I guess I'm asking too much :P



            • #21
              Well, you're asking too *soon* anyways. Wait until the product launches, at least.



              • #22
                Your test is not fair; where is at least a 5850?



                • #23
                  Originally posted by brent View Post
                  Michael, please keep in mind that SmallPtGPU contains a bug/incompatibility that seriously limits performance on NVidia hardware, especially pre-Fermi.

                  Here's a diff that fixes it. This improves performance more than ten-fold on G80/GT200.
                  I'm probably missing something here, but the patch you linked only seems to correct things for Mac OS (#ifdef __APPLE__). The tests Michael ran were all in Ubuntu.



                  • #24
                    Originally posted by Veerappan View Post
                    I'm probably missing something here, but the patch you linked only seems to correct things for Mac OS (#ifdef __APPLE__). The tests Michael ran were all in Ubuntu.
                    NVidia's implementation defines __APPLE__ on all OSes, for... whatever... reasons. I think the workaround is not needed on OS X anymore, either. Removing it completely should be fine.
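                    If anyone wants to verify that claim themselves, a quick illustrative host-side check (standard OpenCL API, error handling omitted) is to build a one-line program that only compiles when __APPLE__ is defined:

                    Code:
                    #include <stdio.h>
                    #include <CL/cl.h>

                    int main(void)
                    {
                        cl_platform_id platform;
                        cl_device_id device;
                        clGetPlatformIDs(1, &platform, NULL);
                        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
                        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

                        /* This source only builds if the OpenCL compiler defines __APPLE__. */
                        const char *src =
                            "#ifndef __APPLE__\n"
                            "#error __APPLE__ not defined\n"
                            "#endif\n"
                            "__kernel void dummy(void) {}\n";
                        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
                        cl_int err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);

                        printf("__APPLE__ is %sdefined by this OpenCL compiler\n",
                               err == CL_SUCCESS ? "" : "not ");
                        return 0;
                    }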



                    • #25
                      Originally posted by brent View Post
                      NVidia's implementation defines __APPLE__ on all OSes, for... whatever... reasons. I think the workaround is not needed on OS X anymore, either. Removing it completely should be fine.
                      You can just run the executable compiled with the ATI OpenCL SDK on any NVIDIA hardware, so you can be sure to run the benchmark under exactly the same conditions.

                      Please note, defining __APPLE__ under Linux is a (huge) NVIDIA bug.

                      BTW, I'm the author of SmallPtGPU, MandelGPU, etc.; I have 2x 5870, 1x 5850 and a 5770, so let me know if you need the benchmarks run on any of the above hardware.

                      Michael, you may be interested in checking http://www.luxrender.net/wiki/index.php?title=SLG
                      It is a larger/more complex OpenCL application than the small demos (SmallPtGPU, etc.) and it may provide more real-world numbers.

                      You can find a small demo video of SLG here: http://vimeo.com/14290797



                      • #26
                        Originally posted by deanjo View Post
                        Intel is not an OpenCL supporter.
                        On that point, Apple and Nvidia are the good ones ;-) and Intel is the evil one.



                        • #27
                          Originally posted by Qaridarium View Post
                          On that point, Apple and Nvidia are the good ones ;-) and Intel is the evil one.
                          Intel is also the one that killed Havok FX, after both nVidia and ATi had demoed it on their hardware some two years earlier (GDC 2006) and before nVidia purchased PhysX. Chances are that if Intel hadn't purchased and killed Havok FX, nVidia would never have purchased Ageia a few years later to provide its own solution.



                          • #28
                            Intel no doubt saw that allowing Havok FX to live would mean giving more substance to the value of a GPU over a CPU, a market they still can't really compete in.



                            • #29
                              Originally posted by deanjo View Post
                              Intel is also the one that killed Havok FX, after both nVidia and ATi had demoed it on their hardware some two years earlier (GDC 2006) and before nVidia purchased PhysX. Chances are that if Intel hadn't purchased and killed Havok FX, nVidia would never have purchased Ageia a few years later to provide its own solution.
                              "Intel do doubt saw that allowing Havok FX to live would mean giving more substance to the value of a GPU over a CPU, a market they still can't really compete in."

                              Intel is evil, I know...
                              I haven't bought an Intel CPU or any Intel product in the last 12 years,
                              and in the future I will never buy any product from this company.

                              But from my point of view, Nvidia fails with PhysX because an open standard like OpenCL plus Bullet physics is much better, even for Nvidia: if there are more uses for a GPU, Nvidia will sell more GPUs, and Intel will lose more and more, because no one will need a fast CPU anymore.
                              With an open standard like OpenCL, only Intel is the loser.
                              Nvidia is wasting its time on CUDA and PhysX.



                              • #30
                                Originally posted by Qaridarium View Post
                                "Intel do doubt saw that allowing Havok FX to live would mean giving more substance to the value of a GPU over a CPU, a market they still can't really compete in."

                                Intel is evil, I know...
                                I haven't bought an Intel CPU or any Intel product in the last 12 years,
                                and in the future I will never buy any product from this company.

                                But from my point of view, Nvidia fails with PhysX because an open standard like OpenCL plus Bullet physics is much better, even for Nvidia: if there are more uses for a GPU, Nvidia will sell more GPUs, and Intel will lose more and more, because no one will need a fast CPU anymore.
                                With an open standard like OpenCL, only Intel is the loser.
                                Nvidia is wasting its time on CUDA and PhysX.
                                What the developers use for a physics engine is up to the developer. If a developer is willing to go through the "growing pains" of getting another physics engine going on the GPU, then they still have that option. Nobody is blocking them from doing so. Nvidia also contributes to OpenCL and probably has the best implementation of it out there, along with some of the best documentation. They are not forcing anybody to use CUDA or PhysX; that is the choice of the developer. If you don't like a developer using PhysX, then complain to the developer.

