
AMD Launches The Accelerator Cloud To Try Out EPYC CPUs, Instinct GPUs + ROCm


  • AMD Launches The Accelerator Cloud To Try Out EPYC CPUs, Instinct GPUs + ROCm

    Phoronix: AMD Launches The Accelerator Cloud To Try Out EPYC CPUs, Instinct GPUs + ROCm

    AMD has made public the AMD Accelerator Cloud. No, they aren't getting into the cloud game per se, but rather allowing a place for customers to try out new EPYC processors and AMD Instinct accelerators running with the latest ROCm software components...

    https://www.phoronix.com/scan.php?pa...elerator-Cloud

  • #2
I truly think that properly packaging ROCm in Debian could have orders of magnitude more impact on the success of the platform than this cloud endeavor.
Engineers are productive when they have something at hand that is easy to set up.

    But it's a platform, so all options to promote it are worth it I suppose.



    • #3
      Originally posted by Maxzor View Post
I truly think that properly packaging ROCm in Debian could have orders of magnitude more impact on the success of the platform than this cloud endeavor.
Engineers are productive when they have something at hand that is easy to set up.

      But it's a platform, so all options to promote it are worth it I suppose.
      I agree!
      Having it properly packaged in Debian and Ubuntu with all its derivatives would be much more helpful.
      Hopefully it will get there one day!



      • #4
        Originally posted by Danny3 View Post

        I agree!
        Having it properly packaged in Debian and Ubuntu with all its derivatives would be much more helpful.
        Hopefully it will get there one day!
That seems pretty far away. Even if everything were reorganized and upstreamed tomorrow, we would still have to wait for it to trickle down to a stable Ubuntu/Debian release.



        • #5
          Originally posted by Danny3 View Post

          I agree!
          Having it properly packaged in Debian and Ubuntu with all its derivatives would be much more helpful.
          Hopefully it will get there one day!
I disagree. Outside of the two supercomputer wins, virtually nobody has CDNA based accelerators. This leads to a chicken-and-egg situation when it comes to CDNA: you will not purchase the hardware unless you know it will work well in your workflow, but there is nowhere to test your workflow on CDNA based hardware. Having ROCm easily available on Debian would not fix this problem. The cloud instance, however, does at least fix the evaluation problem.

          For Nvidia, with the exception of Pascal/Volta, their compute cards have used the same architecture as their consumer graphics cards. This has made it relatively easy to evaluate workflows for the Tesla/Quadro/professional GPUs.

          Edit: I would love to see ROCm mainlined into Mesa, but I just don't think that would solve any problems with respect to the CDNA Instinct Cards.
          Last edited by Vlad42; 16 December 2021, 03:37 PM.



          • #6
            Originally posted by Vlad42 View Post

I disagree. Outside of the two supercomputer wins, virtually nobody has CDNA based accelerators. This leads to a chicken-and-egg situation when it comes to CDNA: you will not purchase the hardware unless you know it will work well in your workflow, but there is nowhere to test your workflow on CDNA based hardware. Having ROCm easily available on Debian would not fix this problem. The cloud instance, however, does at least fix the evaluation problem.

            For Nvidia, with the exception of Pascal/Volta, their compute cards have used the same architecture as their consumer graphics cards. This has made it relatively easy to evaluate workflows for the Tesla/Quadro/professional GPUs.

            Edit: I would love to see ROCm mainlined into Mesa, but I just don't think that would solve any problems with respect to the CDNA Instinct Cards.
            Unfortunately, this kinda rings true. We can dream about AMD compute on desktop but AMD itself has decided that it wants to segment that away only to hyperscalers. And the hyperscalers don't care about AMD compute because the software they host inevitably first germinated on someone's desktop, where AMD compute doesn't exist.

Now the ROCm division is spending all its time basically writing custom software for a few once-off supercomputer accounts, instead of democratising compute to where it would matter for the long term. I mean honestly. I'm literally running basically full-blown CUDA 10 on a $99 device. Nvidia just gets this.

            All so ironic because even now, but certainly in years gone by, AMD had the best compute. GCN was a compute beast. There should be a business school study on AMD GPU Compute that demonstrates, with all the market share evidence, how precisely NOT to run a division. AMD's acquisition of ATi Technologies is one of the great fails of computer business history.
            Last edited by vegabook; 16 December 2021, 04:56 PM.



            • #7
So there are some interesting developments, apparently, when it comes to expanded ROCm support.
I'd heard some rumours that it might finally be getting expanded, which would make sense in light of AMD wanting to make their ROCm OpenCL driver the official one. I have a Radeon VII in my dev PC and thought I was stuck with that, BUT I DO have a 6900XT in the gaming PC. So a couple of days ago I went and added "gfx1030" to my list of CMAKE_HIP_ARCHITECTURES, built some toyBrot, and it just ran. That was pretty great, especially considering how powerful that card is.
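For anyone wanting to try the same thing, a minimal sketch of that build tweak. This is an assumption-laden illustration, not anything AMD documents as supported: the source directory name and the exact architecture list are hypothetical, and `gfx1030` only covers Navi21 cards like the 6900XT.

```shell
# Sketch: ask HIP's CMake support to also compile device code for Navi21
# (gfx1030). Assumes ROCm >= 4.5 and a HIP project (here a hypothetical
# "toybrot-src" checkout) that already builds for a supported GPU.
cmake -S toybrot-src -B build \
      -DCMAKE_HIP_ARCHITECTURES="gfx906;gfx1030"   # keep existing targets, add gfx1030
cmake --build build
```

If I recall correctly, the same effect can be had outside CMake by passing `--offload-arch=gfx1030` directly to hipcc.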

Funny thing is, there seems to be no outward acknowledgement of this and, last I checked, the docs for ROCm still said "Vega and some Polaris", which is weird. I still think one of the biggest challenges for ROCm compared to the Nvidia stack is that anyone can grab any recent-ish laptop or old gaming card and fire up some CUDA, super simple. But not only is ROCm still a mess to install a lot of the time (it got so much better with 4.5), there is also still this hardware support thing. Big "huge if true" moment here, imo.

              Receipts attached (and a couple of bananas for scale).

              6900XT
              https://nextcloud.vilelasagna.ddns.n...xDKTi8z5Ro6E8n

              RadeonVII
              https://nextcloud.vilelasagna.ddns.n...obAMX9mXMe35sB

              A100
              https://nextcloud.vilelasagna.ddns.n...fWKxnfeBtJ5Ags



              • #8
                Originally posted by vegabook View Post

                Unfortunately, this kinda rings true. We can dream about AMD compute on desktop but AMD itself has decided that it wants to segment that away only to hyperscalers. And the hyperscalers don't care about AMD compute because the software they host inevitably first germinated on someone's desktop, where AMD compute doesn't exist.

Now the ROCm division is spending all its time basically writing custom software for a few once-off supercomputer accounts, instead of democratising compute to where it would matter for the long term. I mean honestly. I'm literally running basically full-blown CUDA 10 on a $99 device. Nvidia just gets this.

                All so ironic because even now, but certainly in years gone by, AMD had the best compute. GCN was a compute beast. There should be a business school study on AMD GPU Compute that demonstrates, with all the market share evidence, how precisely NOT to run a division. AMD's acquisition of ATi Technologies is one of the great fails of computer business history.
                There are a lot of hyperscaler workloads for GPGPU compute, so it is not necessarily a bad place to start. Those hyperscalers (Google, Facebook, the supercomputer wins, etc.) are more likely to have the resources to help develop the tools and frameworks needed not only for their own internal workloads, but for the non-hyperscalers as well. Until those tools and frameworks are in place, it does not make sense to target desktops/laptops unless we just want another GCN situation - and there were a lot more of them available on desktops and laptops than CDNA and RDNA 1&2 combined.

Having the hyperscalers adopt AMD's compute cards could help give confidence to smaller organizations, similar to the Epyc rollout. Going by the articles here on Phoronix, AMD has been adding OpenCL extensions and performance optimizations, increasing tool/framework compatibility, and adding support for new CDNA based hardware. With the exception of the CDNA hardware support and some of the hyperscaler-focused performance optimizations, those improvements should help consumer GPUs too, if AMD ever adds support and makes it easily available. There is no question that Nvidia has had a head start on these kinds of hyperscaler-focused optimizations and AMD is just catching up.

I do agree, AMD really does need to add support for RDNA 1&2 to ROCm. They need to complement the increasing framework/tool support and hyperscaler adoption of CDNA with desktop and laptop RDNA compute availability. I do understand their priorities and why they have not added RDNA support yet; most compute workloads that need big accelerators are run on servers, after all.

With respect to Nvidia and GCN: Nvidia had superior compute hardware back when GPGPU compute started, from the 8 series through Fermi (the 400/500 series). That was four generations with a significant advantage, during which they locked the ecosystem into CUDA. Unfortunately, CUDA was all developers were interested in, except for Apple with OpenCL. So GCN was screwed by a lack of developer support when it came to compute.



                • #9
                  Originally posted by Vlad42 View Post
I disagree. Outside of the two supercomputer wins, virtually nobody has CDNA based accelerators. This leads to a chicken-and-egg situation when it comes to CDNA: you will not purchase the hardware unless you know it will work well in your workflow, but there is nowhere to test your workflow on CDNA based hardware. Having ROCm easily available on Debian would not fix this problem. The cloud instance, however, does at least fix the evaluation problem.
The packaging initiatives are not just about the CDNA cards - we use a subset of the ROCm stack for compute on consumer GPUs as well. We had to prioritize CDNA support in order to meet some contractual obligations, but we have the lower level code for RDNA1/2 in place now (we use it for OpenCL in the AMDGPU-PRO packages). The upper level code (math libraries etc...) is in place for RDNA2, other than a bunch of testing and fixing, although we still have some work to do in the libraries for RDNA1.

                  We were pretty close to adding RDNA2 to the official support list in the most recent ROCm stack release but decided to hold off until we fixed more of the open issues. The code was published though and people are using it on RDNA2 today, albeit with some tweaking required.

In terms of CDNA hardware availability, I think general availability of MI200 is not quite here yet (I'll check), so it's just MI100 for now, and my impression is that nearly all of those are sold bundled into servers rather than as individual cards. Once MI200 ships to the broader market I plan to take another look and start asking questions internally if it seems too hard to buy them.
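On the "some tweaking required" point from the previous post: the tweak most often reported by the community for RDNA2 cards that are not on the official list is overriding the ISA version the ROCm runtime sees. This is unofficial, the exact value below is an assumption, and it may not apply to every card:

```shell
# Community-reported workaround (not official AMD guidance): make the ROCm
# runtime treat an unlisted RDNA2 GPU as gfx1030 (ISA version 10.3.0).
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# Check which gfx target the runtime now reports for the GPU agent:
rocminfo | grep -i gfx
```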



                  • #10
                    Originally posted by VileLasagna View Post
So a couple of days ago I went and added "gfx1030" to my list of CMAKE_HIP_ARCHITECTURES, built some toyBrot, and it just ran. That was pretty great, especially considering how powerful that card is.

Funny thing is, there seems to be no outward acknowledgement of this and, last I checked, the docs for ROCm still said "Vega and some Polaris", which is weird. I still think one of the biggest challenges for ROCm compared to the Nvidia stack is that anyone can grab any recent-ish laptop or old gaming card and fire up some CUDA, super simple. But not only is ROCm still a mess to install a lot of the time (it got so much better with 4.5), there is also still this hardware support thing. Big "huge if true" moment here, imo.
                    Right - we have not yet added RDNA2 to the list of officially supported hardware, although I believe the latest release notes do mention a specific Navi21 workstation card. We were close to adding official support in the most recent release but decided to hold off and fix some more issues first.

                    We still have some work to do in order to make ROCm/AMD compute solutions as broadly available as NVidia's offerings, but we have been working on it and are getting a lot closer.

