AMD Preparing ROCm 6.1 For Release With New Features


  • #11
    Originally posted by DiamondAngle View Post

    Idk, that's not really true; there's a lot of stuff like https://github.com/ROCm/rocBLAS/blob...handle.cpp#L76 that calls out specific LLVM targets in various ROCm libraries. There are also a lot of custom kernels that are asm and device-specific and exist only for supported chips, sometimes with no generic fallback, like in CK. Sure, sometimes there IS a generic fallback, but not always. There's also the issue that everything has to be compiled for each LLVM target, and unsupported targets often aren't compiled for by AMD or distro packages, leaving those users to compile the entire ROCm stack, which is certainly a journey that is not easy; I speak from experience.
    It's not perfect, but in general the major GFX ISA levels are compatible, so forcing a particular GFX ISA level should work if there is no specific support for your chip's exact GFX ISA version. But to the larger original point, it's a huge amount of work to implement and validate all of these, hence the more limited set of officially "supported" parts.
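
    The override described here is commonly done through the `HSA_OVERRIDE_GFX_VERSION` environment variable; a minimal sketch (the `10.3.0` value is just an example mapping an RDNA2-class chip to the gfx1030 ISA level; adjust for your card):

    ```shell
    # Tell the ROCm runtime to treat the GPU as gfx1030 (RDNA2 major ISA level).
    # This only works when the real chip is ISA-compatible with the forced level.
    export HSA_OVERRIDE_GFX_VERSION=10.3.0

    # With ROCm installed, rocminfo should then report the overridden target:
    # rocminfo | grep -i gfx
    ```

    Note this is an unsupported escape hatch, not an official knob, so behavior can change between releases.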

    Comment


    • #12
      Originally posted by agd5f View Post

      It's not perfect, but in general the major GFX ISA levels are compatible, so forcing a particular GFX ISA level should work if there is no specific support for your chip's exact GFX ISA version. But to the larger original point, it's a huge amount of work to implement and validate all of these, hence the more limited set of officially "supported" parts.
      So how does Nvidia manage it? Maybe that's their competitive advantage!

      Comment


      • #13
        I regularly post about my disdain for both NVIDIA and Intel, rooted not just in their being closed source, but in historic corporate malfeasance dating back, in Intel's case, to the late 1970s. In fact I would never purchase anything from NVIDIA or Intel, and I build my personal systems only with AMD components, having stuck with them even through the disastrous Bulldozer and GPU era.

        But that doesn't mean I'm a fanboy who will say things in AMD's defense that are simply not true.

        And the truth is that AMD is sorely lacking in ROCm performance and support, and has no GPU GUI at all, while NVIDIA excels in both CUDA performance and support and ships a fully functional GPU GUI.

        This is something AMD must confront and remedy, because, despite their claims, it is critical to their success and to any chance of surpassing NVIDIA on all fronts. But as long as they continue to live in denial and make endless excuses for not addressing these issues, they will remain forever behind. I hate to see that, and I know that if Jerry Sanders were still in charge he would not allow such complacency.

        Comment


        • #14
          Originally posted by agd5f View Post

          The support is there in the compiler and most libraries for all of our GPUs. [...] most users get mesa from their distro, so it generally just works [...] We've been working extensively with Debian [...]
          Debian: https://salsa.debian.org/rocm-team, https://lists.debian.org/debian-ai/
          Does that mean that I can just issue `apt-get install rocm` or something easy like that on my Debian Sid root prompt and have ROCm working to some extent even on my Radeon RX 5500 XT 8G?

          agd5f If the answer is "yes, absolutely", then you made my day. If the answer is anything other than "yes, absolutely", then you know what we mean when we ask "when will AMD support all consumer cards, as Nvidia does?". We do not mean "support" as in "AMD officially supports its ROCm releases on the Radeon RX 5500 XT", but as in "distro-packaged ROCm is easy to install on all major distros and runs to some extent even on the Radeon RX 5500 XT, despite AMD not officially supporting that card". But I still hope the answer is the former, even if that would mean I failed several times to find the correct command on Google.

          Comment


          • #15
            Originally posted by lucrus View Post

            Does that mean that I can just issue `apt-get install rocm` or something easy like that on my Debian Sid root prompt and have ROCm working to some extent even on my Radeon RX 5500 XT 8G?
            Yes. That is the idea.

            Comment


            • #16
              Originally posted by agd5f View Post

              Yes. That is the idea.
              And when is this idea going to materialize, more or less?

              Comment


              • #17
                Originally posted by lucrus View Post

                And when is this idea going to materialize, more or less?
                It's been available for a while. E.g., https://apt.rocm.debian.net/
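
                As a sketch of what the Debian route looks like in practice, the rough shape is a normal apt install. The package selection below is an assumption on my part (Debian splits ROCm into many packages and there is no single official `rocm` metapackage I know of); the repository page has the authoritative instructions:

                ```shell
                # Sketch: pull ROCm tooling from Debian's own packaging (Debian sid/testing).
                # Package names below are Debian package names, chosen as examples.
                sudo apt update
                sudo apt install rocminfo hipcc librocblas-dev

                # Sanity check: list the GPU agents the runtime can see.
                rocminfo | grep -iE 'gfx|agent'
                ```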

                Comment


                • #18
                  Almost all RDNA1+ cards just work in ROCm nowadays, if you compile it from source and compile your machine-learning framework against that build as well...

                  I think AMD should just enable all possible GPU architectures in their official builds for ROCm and the Tensile lib in MIOpen; that would make a lot of people happy!
                  I am not saying that AMD should "support" them officially, just enable them in the builds without claiming any official support or feature completeness.
                  If someone in product marketing or a manager blocks this for fear it would lead to support cases, maybe just work around it by offering unsupported preview builds, or CI builds that aren't official releases, with everything enabled.
                  Another option could be to hide it behind an environment variable like "ROCM_ENABLED_ARCHITECTURES=gfxXXXX,gfxYYYY" or something like that.

                  The same goes for most ML frameworks: the AMD packages, and even the semi-official packages, don't have the consumer card architectures enabled by default.

                  People will then use HSA_OVERRIDE and somehow get it working half-baked, and be angry once it breaks with a new release that uses features of the newer card it was built for. IMHO this leads to much more frustration, with strange bugs, segfaults, crashes, and worse performance.
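
                  The build-from-source route described above usually comes down to passing the target-architecture list explicitly at build time; a hedged sketch (gfx1012 stands in for an officially unsupported card, and exact CMake option names can differ between ROCm libraries and versions):

                  ```shell
                  # Build rocBLAS for an ISA AMD's binary releases do not ship (example: gfx1012).
                  git clone https://github.com/ROCm/rocBLAS
                  cd rocBLAS
                  cmake -B build -DAMDGPU_TARGETS=gfx1012 -DCMAKE_BUILD_TYPE=Release
                  cmake --build build -j"$(nproc)"

                  # ML frameworks take a similar list, e.g. PyTorch's ROCm build:
                  # PYTORCH_ROCM_ARCH=gfx1012 python setup.py install
                  ```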

                  Comment


                  • #19
                    This is a reddit post from a person who claims to be an Nvidia employee. While I know nothing about SW development, their post makes sense to me; does anyone from AMD have any thoughts on this?

                    "IMO the biggest problem with ROCm and what puzzles me the most is why AMD is enabling "official" support for ROCm on a per-chip basis.

                    As an example, from https://rocm.docs.amd.com/projects/i...uirements.html, ROCm is "officially" supported only on 7900XTX and 7900XT in RDNA3's dGPU lineup.

                    Now the confusing part for me here is all of RDNA3's SKUs would be considered the same from a SW / driver perspective (from the 7600XT or whatever the lower end RDNA3 SKU is to the 7900XTX). So, all of them would be running the same drivers. But AMD seems to take up the hardship of enabling ROCm on a per SKU basis.

                    Why? Why not just "officially" support ROCm for all of RDNA3's SKUs? They're all literally the same (from a SW perspective) and use the same drivers.

                    And let's not even talk about how architectures like Vega and even ruddy RDNA2 (launched in 2020 mind you) are not even present on the list!!? So GPUs barely 3 years old can NOT run ROCm. Bruh.

                    I think the latter point (Vega/RDNA2 not supporting ROCm) is hinting at AMD seemingly using completely different branches or driver source codes for their architectures.

                    This is a support / maintenance and release nightmare. (See how AMD bifurcates its dGPU driver releases across Polaris/Vega and RDNA by releasing architecture-specific drivers, e.g. the Polaris/Vega-only driver: https://www.amd.com/en/support/kb/re...1-polaris-vega.)

                    Compare this with Nvidia's support for CUDA and their drivers. All of Nvidia's GPUs from the first CUDA capable G8x to the latest Ada RTX 40 series GPUs irrespective of the SKU support CUDA. This enables any and every Nvidia user (from your casual GTX 1050 laptop gamer to your RTX 4090 AI hobbyist) to delve into CUDA. That creates a footprint for the technology and the mindshare.

                    And Nvidia's drivers are not split across architectures. One common driver codebase for all chips that Nvidia supports.

                    And any driver release will work on all supported architectures. No need for customers to hunt for architecture specific drivers.

                    Both AMD and Nvidia's drivers are C/C++ based. There's no reason for AMD to bifurcate drivers across architectures.

                    Just use preprocessor macros in your common driver codebase for arch specific code.

                    Unless AMD is constantly refactoring their driver architecture in a way that disallows such an approach (which AMD is known to do, btw. Read up on their shenanigans with their OpenGL driver, where they rearchitected the entire driver only for it to cause more issues:

                    https://old.reddit.com/r/Amd/comment...vers_20042007/).

                    So yeah, TLDR: Having ROCm support on all relevant SKUs on AMD would be a really great move for AMD to challenge CUDA"



                    Comment


                    • #20
                      I've been interested in ROCm as an alternative to NV+CUDA for a long time. In 2022 I got access to ACP, AMD's internal cloud, to test PyTorch scalability for a project on MI250X, and it worked flawlessly.

                      I have searched for and found AI platform companies starting to adopt ROCm + AMD accelerators, and I'm in contact with them for exactly this reason.

                      I don't want to criticize anyone's personal views, just add my opinion after *years* of reading people complain about the same topic on every ROCm release.

                      AI margins are huge so far, partly because AI hardware is supply constrained. Why would a follower like AMD be willing to compromise its hard-earned margins, allocating resources to validate its AI software stack in order to promote low-margin hardware, i.e. validating and supporting past-generation/consumer GPUs? To push up demand for silicon that would squeeze the tight supply of the far more valuable professional/enterprise segment? That would just play into its competitors' hands; it would be suicidal for any company. Cash in now while you can, so you can invest later in volume-driven markets.

                      Would I prefer being able to run a cheap RX 7800 flawlessly for experiments, and a W7800 Pro in production? Of course! But so far I have only tested on the one AMD GPU cloud I could find, and I ask professional companies about exactly that for development/production.

                      Do I NEED a local GPU for ML/AI tech evaluation? I have an NVIDIA one; it's simply the reasonable choice, with less troubleshooting. If I want AMD hardware I can get it cheap and unsupported, and with the right skills I'm sure it will work, even with some limitations.

                      AMD will be on par on the software side sooner or later, with the right strategy and execution. It has already progressed a lot.

                      Comment
