AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source

  • AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source

    Phoronix: AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source

    While there have been efforts by AMD over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, it still requires work on the part of developers. The tooling has improved, such as with HIPIFY to help auto-generate HIP code, but it isn't a simple, instant, or guaranteed solution -- especially if striving for optimal performance. Over the past two years, though, AMD has quietly been funding an effort to bring binary compatibility so that many NVIDIA CUDA applications can run atop the AMD ROCm stack at the library level -- a drop-in replacement without the need to adapt source code. In practice, for many real-world workloads, it's a solution for end-users to run CUDA-enabled software without any developer intervention. Here is more information on this "skunkworks" project that is now available as open-source, along with some of my own testing and performance benchmarks of this CUDA implementation built for Radeon GPUs.
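
    To make that distinction concrete, here is a minimal sketch (an illustrative sample, not code from the article or from ZLUDA) of the kind of CUDA source that HIPIFY translates when porting to HIP; the comments mark the calls it would rewrite. The appeal of a library-level drop-in like ZLUDA is that the already-compiled CUDA binary of such a program is instead run against replacement CUDA libraries on top of ROCm, with no source changes or recompilation at all.

    // vector_add.cu -- trivial CUDA sample (illustrative names only).
    // Porting with HIPIFY rewrites the marked lines; a drop-in layer like
    // ZLUDA instead aims to run the compiled CUDA binary as-is on ROCm.
    #include <cuda_runtime.h>                 // HIPIFY: <hip/hip_runtime.h>
    #include <cstdio>

    __global__ void vector_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);         // HIPIFY: hipMallocManaged
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        vector_add<<<(n + 255) / 256, 256>>>(a, b, c, n);  // launch syntax is unchanged in HIP
        cudaDeviceSynchronize();              // HIPIFY: hipDeviceSynchronize
        std::printf("c[0] = %f\n", c[0]);     // expect 3.0

        cudaFree(a); cudaFree(b); cudaFree(c);  // HIPIFY: hipFree
        return 0;
    }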

  • #2
    This:
    "But the real kicker here is that using ZLUDA + CUDA back-end was slightly faster than the native Radeon HIP backend."
    and this:
    "Similarly, Andrzej Janik has found that the ZLUDA code path for CUDA-enabled software like GeekBench often performs better than the generic OpenCL path for allowing existing Radeon GPU compute support. In best cases the ZLUDA path was 128~175% the performance of the OpenCL Geekbench results for a Radeon RX 6800 XT."
    is why CUDA is the de facto GPU computing standard. NVIDIA cares.

    "ZLUDA is an incredible technical feat getting unmodified CUDA-targeted binaries working on AMD GPUs atop the ROCm compute stack."
    Yes, this is a massive and very welcome undertaking and mad kudos to its author. Hopefully someone will continue to fund him.



  • #3
    Fantastic



  • #4
    Not bad. Why would AMD stop that project, especially considering the results?

    The reason ZLUDA is faster than HIP is most likely more optimizations in the CUDA path.



  • #5
    Originally posted by Anux View Post
    Not bad. Why would AMD stop that project, especially considering the results?

    The reason ZLUDA is faster than HIP is most likely more optimizations in the CUDA path.
    My only guess has been that AMD is concerned over legal/trademark issues with a "CUDA" implementation... or that having it become an 'independent' open-source project is their way out to avoid that, and who knows whether they'll decide to re-fund it or use it to gauge customer interest, etc.
    Michael Larabel
    https://www.michaellarabel.com/



  • #6
    Awesome job!
    Now take it and stick it into Mesa alongside rusticl. CUDA on freedreno would be fun.
    (I know it can't be so easy...)



  • #7
    I wonder if AMD abandoned it because Nvidia consistently performed better. The biggest users of CUDA are workstations and servers - AMD has very little chance of convincing such markets to swap to non-CUDA APIs, so the only way they can take such customers is if they offer better performance-per-dollar, better performance-per-watt, and/or better overall performance. Nvidia so far has a more efficient architecture and, as the benchmarks show, better overall performance too. Perhaps (but maybe not actually) AMD has better performance-per-dollar, but even if it does, maybe it's not good enough to be a compelling switch for organizations willing to spend 5 figures on a GPU.

    Of course, this project is early enough that maybe there just wasn't enough time to optimize, but the probability of AMD beating Nvidia at their own self-tailored design is slim. That being said, I think this is very good performance for an incomplete product, and there's definitely more potential here.



  • #8
    Originally posted by avis View Post
    and this
    ...
    is why CUDA is the de facto GPU computing standard. NVIDIA cares.
    I'd think it's the opposite. CUDA was already the market leader when OpenCL was created. The CUDA code path is faster because the devs probably spent more time on it, given it's the market they care about.



  • #9
    Originally posted by Serafean View Post
    Awesome job!
    Now take it and stick it into Mesa alongside rusticl. CUDA on freedreno would be fun.
    (I know it can't be so easy...)
    It's all catered to AMD / ROCm APIs... Not suitable for Mesa short of a complete rewrite, really.
    Michael Larabel
    https://www.michaellarabel.com/



  • #10
    Originally posted by avis View Post
    is why CUDA is the de facto GPU computing standard. NVIDIA cares.
    I don't think it's so much that AMD doesn't care. AMD's current high-end hardware is notably being used in new datacenter & HPC deployments in lieu of Nvidia's in some cases. Notably, the current top supercomputer, Frontier at Oak Ridge NL, uses AMD EPYC & Instinct. Yes, I know 7 of the other top 10 supercomputers use Nvidia, with one using Intel GPUs. AMD cares because that's still some mega bucks and feathers in their cap. I think where AMD is falling down is that they haven't hired the numbers of software talent that incumbent Nvidia has traditionally enjoyed, which created a very well written and engineered CUDA software stack. They're continuously playing catch-up while Nvidia is driving the compute market. Nvidia is dominating HPC because CUDA is best in class, with its hardware generally being performant enough to justify the costs - but mostly because of CUDA. It performs equally well regardless of whether you're using an Intel, AMD, or IBM (POWER ISA) CPU. ROCm & Intel's oneAPI currently can't do that.

    It's also possible that there are some patent and copyright minefields here that favor incumbent Nvidia. Speculatively, it would make sense, as both Intel and AMD quietly dropped support for ZLUDA under mysterious circumstances. It makes me wonder if lawyers at either company, or someone in the Nvidia sphere, quietly had a word with Intel and AMD management, mumbling something about Oracle v. Google and API copyrights + software patents + the nuclear option. AMD certainly can't afford a massively expensive legal attack that, even if they win, may ruin them financially; it's much smaller in revenue than both Intel and Nvidia. Again, that's all pure speculation -- take it with a grain of salt, since neither AMD nor Intel said why they stopped funding ZLUDA development.

    Either way, it makes one hopeful for ZLUDA as a proper drop-in replacement for CUDA programs now that it's open source. In time, and with help from others, the rest of the compatibility picture can be completed. I just hope whatever stopped Intel and AMD doesn't come whispering threats in Andrzej Janik's ear, too.
