AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source


  • #11
    That's definitely some sidestepping of legal issues... AMD paid the guy to work for them, he happened to write/improve ZLUDA, and it happened to get released as open source on the contingency of his contract with AMD ending... that's definitely intentional, but it also means that ZLUDA can't be directly tied back to AMD as a product. On the other hand, AMD didn't accidentally hire this guy to work on just this for a whole year... sounds like it works quite well too.

    Sounds like even though AMD didn't release it officially, they really did the guy a solid.
    Last edited by cb88; 12 February 2024, 11:27 AM.



    • #12
      Originally posted by Anux View Post
      Not bad, why would AMD stop that project, especially considering the results?

      The reason why ZLUDA is faster than HIP is most likely more optimizations in the CUDA path.
      Consider the long-term strategic implications. Translated CUDA is faster today because it benefits from Nvidia's compiler and engineering assistance, but it competes for developer effort with a hypothetical, fully optimized direct-ROCm implementation of the same code. And Nvidia's CUDA will always have a head start on any new features and on hardware-API fit. If the industry settles on CUDA with other vendors supported through translation, AMD will have a permanent disadvantage at the same level of architectural sophistication on the same process nodes.

      It's basically the same as the Wine-vs-native debate for gaming. AMD presumably thinks the translation route has a better chance than native ports did for Linux gaming. Also, ZLUDA doesn't conveniently sidestep the terrible ROCm packaging / hardware-support story the way Wine sidesteps Linux backwards-compatibility problems above the kernel level.



      • #13
        This is not the way; translated code for a vastly different architecture is doomed to perform subpar.



        • #14
          Very interesting, indeed. How can a library wrapper be faster than a native implementation? Is the ROCm implementation poorly optimized, somehow? Or did I misunderstand this altogether?



          • #15
            Originally posted by and.elf View Post
            Very interesting, indeed. How can a library wrapper be faster than a native implementation? Is the ROCm implementation poorly optimized, somehow? Or did I misunderstand this altogether?
            More likely that apps themselves are better optimised for CUDA.
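
            To make the "wrapper" part concrete: as I understand it, ZLUDA exposes the same entry points as Nvidia's libcuda and forwards them to ROCm/HIP, so the application keeps executing its CUDA-tuned code paths unchanged. A minimal sketch of the idea in Rust (not ZLUDA's actual code; cuInit/cuDeviceGet and hipInit/hipDeviceGet are the real API names, the rest is illustrative):

            Code:
            // A toy "drop-in" CUDA driver API, built as a cdylib that an
            // unmodified CUDA binary could load in place of libcuda.so.
            // The real project also translates PTX kernels, which is the
            // hard part and is not shown here.
            use std::os::raw::{c_int, c_uint};

            // HIP FFI declarations (normally generated with bindgen from
            // hip_runtime_api.h; resolved from libamdhip64 at link time).
            #[link(name = "amdhip64")]
            extern "C" {
                fn hipInit(flags: c_uint) -> c_int;
                fn hipDeviceGet(device: *mut c_int, ordinal: c_int) -> c_int;
            }

            // Exported with exactly the names and ABI the application
            // expects from Nvidia's libcuda.so.
            #[no_mangle]
            pub unsafe extern "C" fn cuInit(flags: c_uint) -> c_int {
                // Simplification: a real shim must map hipError_t values
                // to CUresult; only success (0) lines up for free.
                hipInit(flags)
            }

            #[no_mangle]
            pub unsafe extern "C" fn cuDeviceGet(device: *mut c_int, ordinal: c_int) -> c_int {
                hipDeviceGet(device, ordinal)
            }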



            • #16
              Originally posted by Michael View Post

              My only guess has been that AMD is concerned over legal/trademark issues with a "CUDA" implementation... or that having it become an 'independent' open-source project is a good way out to avoid that, and who knows if they decide to re-fund it or use it to gauge customer interest, etc.
              "The developer that coded this, originally coded the software to run on Intel GPU's, while he was an employee of Intel. So unless Nvidia fell asleep at the wheel, Nvidia didn't see any legal issues with the software. AMD certainly knew this" - source.



              • #17
                Who are the good guys now from a Linux point of view? AMD? Intel? Nvidia? (I'm really curious because I've been using ThinkPads for ages now and I don't play games.)



                • #18
                  The article seems to mention that it was tested with llama.cpp; however, I can't find any information about it on the GitHub page.

                  Edit: I tested it on Blender 4.0 using my RX 6800 and it seems to be running really well. Props to the developer, since the ROCm backend always crashed my PC and I had to use CPU rendering for the past year.



                  • #19
                    Originally posted by harakiru View Post
                    The article seems to mention that it was tested with llama.cpp; however, I can't find any information about it on the GitHub page.

                    Edit: I tested it on Blender 4.0 using my RX 6800 and it seems to be running really well. Props to the developer, since the ROCm backend always crashed my PC and I had to use CPU rendering for the past year.
                    Here's a longer list that Janik gave me when I started the testing:

                    • Geekbench
                    • Blender 3.6 (non-OptiX)
                    • 3DF Zephyr
                    • RealityCapture
                    • llama.cpp
                    • Arnold renderer (OptiX, very limited)
                    • LuxCoreRender (non-OptiX)
                    • V-Ray Benchmark (non-OptiX, tricky setup, let me know if you run it)
                    • PyTorch (needs specific compilation flags, very slow cold boot and very limited testing)
                    • BabelStream
                    • SPECFEM-Globe and SPECFEM-Cartesian
                    • QUDA
                    • Chroma
                    • MILC
                    • Kokkos
                    • LAMMPS
                    • OpenFOAM (requires PETSc built for an older CUDA/GPU)
                    • XGBoost (works on CUDA 11.8, does not work yet on CUDA 12.3)
                    • NAMD 2.14
                    On the GitHub page, maybe he is just mentioning the ones that work really well or that he has very recently tested, or whatever.
                    Michael Larabel
                    https://www.michaellarabel.com/



                    • #20
                      Originally posted by Serafean View Post
                      Awesome job!
                      Now take it and stick it into Mesa alongside rusticl; CUDA on freedreno would be fun.
                      (I know it can't be that easy...)
                      Interestingly, ZLUDA is also written in Rust, just like rusticl!
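
                      One reason Rust suits this kind of shim (my speculation, not a statement about ZLUDA's internals): every exported C-ABI function must avoid unwinding a panic into the host application, and Rust lets you make that discipline explicit. A hypothetical guard (cuCtxSynchronize and the 999 error code are from the real CUDA driver API; the guard itself is illustrative):

                      Code:
                      use std::os::raw::c_int;
                      use std::panic::{self, UnwindSafe};

                      // CUDA_ERROR_UNKNOWN in the driver API error enumeration.
                      const CUDA_ERROR_UNKNOWN: c_int = 999;

                      // Convert a panic in the Rust implementation into an error
                      // code instead of unwinding across the C ABI boundary,
                      // which would be undefined behavior.
                      fn guard(body: impl FnOnce() -> c_int + UnwindSafe) -> c_int {
                          panic::catch_unwind(body).unwrap_or(CUDA_ERROR_UNKNOWN)
                      }

                      #[no_mangle]
                      pub extern "C" fn cuCtxSynchronize() -> c_int {
                          guard(|| {
                              // ... forward to the HIP equivalent
                              // (hipDeviceSynchronize) here ...
                              0 // CUDA_SUCCESS
                          })
                      }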

