An End-User Has Made It Easier To Build ROCm & AMD GPU Machine Learning Software


  • #21
    Originally posted by Spacefish View Post
    They just enable the "officially" supported targets in the public binary builds.
    It would be easy for AMD to enable all architectures, and for projects like PyTorch, ONNX, and so on to enable them too.

    Speculation: marketing does not want to enable products that aren't officially supported, for fear of the support requests.
    The engineers would probably happily enable everything in the official builds and simply state that it's untested / not officially supported.
    At least John Bridgman from AMD used to participate in these Phoronix discussions and often gave some good background information. I think it's understandable that not all GPUs are officially supported, as companies need to be careful that what they ship will actually work. Older GPUs with less memory in particular could create maintenance problems. But for end users these consumer GPUs can be very useful, as their production volumes guarantee good prices. That makes them especially useful for getting started and for learning ML and AI, and for that kind of work I think open-source community projects are a good way to step in.
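
    For anyone wondering whether the official binaries actually cover their consumer card, a minimal sketch along these lines will show it. It assumes a ROCm build of PyTorch; get_arch_list() and the gcnArchName property are present in recent releases, so treat it as an illustration rather than an official diagnostic:

        import torch

        # ROCm/HIP version this PyTorch build targets (None on CUDA or CPU-only builds)
        print(torch.version.hip)

        # gfx architectures the wheel was compiled for, e.g. ['gfx906', 'gfx1030', ...]
        print(torch.cuda.get_arch_list())

        # What the runtime actually detects on this machine
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            print(props.name, props.gcnArchName)

    If your card's gfx target is missing from that list, some users work around it by setting the HSA_OVERRIDE_GFX_VERSION environment variable to a nearby supported architecture, which is exactly the kind of untested, unsupported configuration discussed above.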

    Granted, it would be nice to also get some help from AMD, for example on how to generate more Tensile logic tuning JSON files for the grid-based problems for which hipBLASLt only ships tuning data for newer GPUs. There is some documentation related to logic file generation in the Tensile/tuning_docs folder, but I have not had time to try anything beyond their basic example tuning data generation.

    Btw, I just added a couple of new patches to ROCm SDK Builder that integrate OpenCL support, plus a test app to verify basic OpenCL functionality.
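
    That test app aside, for anyone who wants a quick sanity check of a freshly built OpenCL stack, a small pyopencl script along these lines works (assuming the pyopencl and numpy packages are installed; this is only an illustration of the kind of basic check involved, not the test app mentioned above):

        import numpy as np
        import pyopencl as cl

        # List the platforms and devices the installed OpenCL ICDs expose.
        for platform in cl.get_platforms():
            print(platform.name)
            for device in platform.get_devices():
                print("  ", device.name)

        # Run a trivial kernel on whichever device gets picked, as a smoke test.
        ctx = cl.create_some_context(interactive=False)
        queue = cl.CommandQueue(ctx)
        a = np.arange(16, dtype=np.float32)
        buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=a)
        prog = cl.Program(ctx, """
            __kernel void twice(__global float *x) {
                int i = get_global_id(0);
                x[i] *= 2.0f;
            }
        """).build()
        prog.twice(queue, a.shape, None, buf)
        out = np.empty_like(a)
        cl.enqueue_copy(queue, out, buf)
        print("OK" if np.allclose(out, a * 2) else "FAIL")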



    • #22
      Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
      https://news.ycombinator.com/item?id=37663194

      That guy works for AMD. The fact that they aren't directly funding stuff like this is ABSURD. The money required wouldn't even be a rounding error on their P&L. Fix it Lisa Su!
      That guy is me. I can't add anything to the linked thread, but I should note that in response to this request for donations, AMD provided an MI210 server to the Debian AI team to enable testing on CDNA 2 hardware. The Oregon Advanced Computing Institute for Science and Society (OACISS) is hosting the server (which has been named Pinwheel). It was set up just before the Debian import freeze for Ubuntu 24.04, which provided enough time for it to be used in adding ROCm support to a couple of packages found in that release (such as UCX). The OACISS has also generously offered to share access to a future MI300A system to be provided by AMD.

      While the Debian AI Team could still use more funding for their CI systems, AMD has provided tens of thousands of dollars worth of hardware to support this effort (and there are a number of pending donations that are not yet listed too).
      Last edited by cgmb; 27 May 2024, 09:27 PM.



      • #23
        Originally posted by cgmb View Post

        That guy is me. I can't add anything to the linked thread, but I should note that in response to this request for donations, AMD provided an MI210 server to the Debian AI team to enable testing on CDNA 2 hardware. The Oregon Advanced Computing Institute for Science and Society (OACISS) is hosting the server (which has been named Pinwheel). It was set up just before the Debian import freeze for Ubuntu 24.04, which provided enough time for it to be used in adding ROCm support to a couple of packages found in that release (such as UCX). The OACISS has also generously offered to share access to a future MI300A system to be provided by AMD.

        While the Debian AI Team could still use more funding for their CI systems, AMD has provided tens of thousands of dollars worth of hardware to support this effort (and there are a number of pending donations that are not yet listed too).
        If you need a Polaris graphics card, I have a spare Sapphire Pulse Radeon RX 560 4 GB to give away.



        • #24
          Originally posted by agd5f View Post
          AMD has been actively working with distros to include ROCm enabled across a wide range of GPUs via native packages. For example, ROCm for Fedora (https://fedoraproject.org/wiki/Changes/ROCm6Release) and debian (https://apt.rocm.debian.net/).
          Note that the repository described at https://apt.rocm.debian.net mostly contains enhancements to the Debian continuous integration infrastructure for extending it with an understanding of GPU architectures. It's really only needed for users that are setting up CI nodes. The actual ROCm packages are available directly in Debian Unstable / Testing / Stable.
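
           For anyone who wants to confirm that the distro-packaged runtime works without AMD's own installer, a minimal sketch like the one below asks the HIP runtime how many GPUs it can see. It assumes the Debian packages that ship libamdhip64.so are installed; adjust the library name if your system only provides the versioned .so:

               import ctypes
               import ctypes.util

               # Load the HIP runtime shipped by the distro packages.
               libname = ctypes.util.find_library("amdhip64") or "libamdhip64.so"
               hip = ctypes.CDLL(libname)

               # hipGetDeviceCount(int*) returns 0 (hipSuccess) when the runtime is healthy.
               count = ctypes.c_int(0)
               status = hip.hipGetDeviceCount(ctypes.byref(count))
               print("status:", status, "visible GPUs:", count.value)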

          Originally posted by timofonic View Post
          AMD: Please hire him!
           FWIW, there are a couple of recently opened job postings that are at least partly for ROCm packaging and build rules, such as https://careers.amd.com/careers-home/jobs/48814.



          • #25
            Would love it if these developers stopped with the whole NIH syndrome and instead learned about regular distro package management. This would be a much better fit in a third-party deb/rpm/makepkg repository.



            • #26
              Originally posted by cgmb View Post

              That guy is me. I can't add anything to the linked thread, but I should note that in response to this request for donations, AMD provided an MI210 server to the Debian AI team to enable testing on CDNA 2 hardware. The Oregon Advanced Computing Institute for Science and Society (OACISS) is hosting the server (which has been named Pinwheel). It was set up just before the Debian import freeze for Ubuntu 24.04, which provided enough time for it to be used in adding ROCm support to a couple of packages found in that release (such as UCX). The OACISS has also generously offered to share access to a future MI300A system to be provided by AMD.

              While the Debian AI Team could still use more funding for their CI systems, AMD has provided tens of thousands of dollars worth of hardware to support this effort (and there are a number of pending donations that are not yet listed too).
               Thanks for the update. And while it's nice that AMD has donated GPUs, as you already mentioned in that thread, you'll understand that we consumers buying AMD cards still find it mind-boggling that AMD isn't directly funding all the infrastructure needed to make ROCm competitive. They essentially can't overspend on this. There's an AI gold rush going on. I'd rather AMD (and Intel) grab a bigger slice of that pie than settle for the scraps left over when enterprise customers find out they can't get enough GPUs from NVIDIA and have to look elsewhere.



              • #27
                Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post

                 Thanks for the update. And while it's nice that AMD has donated GPUs, as you already mentioned in that thread, you'll understand that we consumers buying AMD cards still find it mind-boggling that AMD isn't directly funding all the infrastructure needed to make ROCm competitive. They essentially can't overspend on this. There's an AI gold rush going on. I'd rather AMD (and Intel) grab a bigger slice of that pie than settle for the scraps left over when enterprise customers find out they can't get enough GPUs from NVIDIA and have to look elsewhere.
                 I agree with some of your points, but sometimes you simply need time.

                 When was CUDA launched, and when was ROCm launched?

                 Also remember that before ROCm, AMD backed an open standard (like the good guys they are): OpenCL, which Nvidia proceeded to support poorly so they could push more of their proprietary crap (CUDA).

                 So AMD started from zero with ROCm.

                 Anyway, all these posts always end in non-stop trashing of AMD.
                Last edited by NeoMorpheus; 28 May 2024, 10:05 AM.



                • #28
                   Yet CUDA simply comes with the drivers; AMD needs to get its act together and make the install as simple as Nvidia's.

                   Fanboys, pipe down: there is an RX 6600 in my desktop.



                  • #29
                    Originally posted by emansom View Post
                    Would love it if these developers just stopped with the whole NIH syndrome, and instead just learned about regular distro package management. This would be a much better fit in a third party deb/rpm/makepkg repository.
                     Make distro packaging less of a giant pain in the ass, and maybe they would. Both deb and rpm are awful to deal with. Flatpak is hardly better, but at least it's distro-agnostic, unlike the first two.



                    • #30
                      Originally posted by finalzone View Post

                      If you need a Polaris graphic card, I have a spare Sapphire Pulse Radeon RX 560 4 GB to give away.
                      Thank you for the offer. While I don't have any Polaris 21 GPUs like the RX 560 card you're offering, I do have plenty of Fiji, Polaris 10, and Polaris 20 cards available. I think Debian would benefit more from users with those pre-Vega GPUs running the tests themselves, filing good bug reports, and helping to debug the issues found.

                      The Debian AI Team maintains a hardware wishlist. At the moment, I think the most useful hardware donations would be high-performance ECC RAM.

