AMD Launches The Accelerator Cloud To Try Out EPYC CPUs, Instinct GPUs + ROCm

  • #11
    Originally posted by vegabook:
    Unfortunately, this kinda rings true. We can dream about AMD compute on desktop but AMD itself has decided that it wants to segment that away only to hyperscalers.
    Nope... we had limited resources and decided to focus on large datacenters first. Now we are catching up on the rest of the market.

    Originally posted by vegabook:
    Now the ROCm division is spending all its time basically writing custom software for a few once-off supercomputer accounts
    That was largely the case a couple of years ago (if you add large data center & hyperscaler customers to supercomputer accounts) but not today.
    Last edited by bridgman; 16 December 2021, 08:09 PM.
    Test signature

    • #12
      Originally posted by Vlad42:

      There are a lot of hyperscaler workloads for GPGPU compute, so it is not necessarily a bad place to start. Those hyperscalers (Google, Facebook, the supercomputer wins, etc.) are more likely to have the resources to help develop the tools and frameworks needed not only for their own internal workloads, but for the non-hyperscalers as well. Until those tools and frameworks are in place, it does not make sense to target desktops/laptops unless we just want another GCN situation - and there were a lot more of them available on desktops and laptops than CDNA and RDNA 1&2 combined.

      Having the hyperscalers adopt AMD's compute cards could help give confidence to the smaller organizations similar to the Epyc rollout. Having read the articles here on Phoronix, AMD has been adding OpenCL extensions, performance optimizations, increasing tool/framework compatibility, and adding support for new CDNA based hardware. With the exception of the CDNA based hardware support and some of the hyperscaler infrastructure focused performance optimizations, the other improvements should help consumer GPUs if AMD ever adds support and makes it easily available. There is no question that Nvidia has had a head start on these kinds of hyperscaler focused optimizations and AMD is just catching up.

      I do agree, AMD really does need to add support for RDNA 1&2 to ROCm. They need to complement the increasing framework/tool support and hyperscaler adoption of CDNA with desktop and laptop RDNA compute availability. I do understand their priorities and why they have not added RDNA support yet; most compute workloads that need big accelerators are run on servers after all.

      With respect to Nvidia and GCN, Nvidia had superior compute hardware back when GPGPU compute started with the 8 series through Fermi, the 400/500 series. That was four generations with a significant advantage, during which they locked the ecosystem into CUDA. Unfortunately, that was all developers were interested in, except for Apple with OpenCL. So, GCN was screwed by a lack of developer support when it came to compute.
      Look, I can see that AMD has made some effort on OpenCL, my scepticism about that technology notwithstanding. Moreover, perhaps WGPU will require AMD to provide browser-based (and therefore, through wgpu-rs, native) GPU compute. And there's Vulkan compute. So there's plenty of hope.

      That said I work constantly for a bunch of finance guys doing statarb in both bonds and crypto all day long here in London. Nobody ever saw an AMD compute card anywhere in that very big market. I can vouch for that myself.

      So maybe the hyperscaler strategy pays off, but I can't help thinking that even TensorFlow/Torch/Jax etc. were once a dream on someone's desktop and they reached out for, you guessed it, what was _available_, namely CUDA. Now sure, these guys don't want lock-in at their scale so the AMD strategy makes some sense, I suppose. But it still needs to get this stuff onto desktop sharpish because the Next Big Thing, the creator market, which defo uses desktop compute, is running away from them, just when everybody is starting to sprinkle the "metaverse" into their business.

      Also Edge. I'd buy a ROCm-capable Jetson-like low-end AMD board tomorrow. Or AMD could support ROCm on the APUs and let third parties do it.
      Last edited by vegabook; 16 December 2021, 07:13 PM.

      • #13
        Good idea, the test drive cloud.

        "If you qualify..." - well I suppose we'll see.

        • #14
          bridgman Hiya, since you seem to be active around the topic could you route https://twitter.com/b_nieuwen/status...68442415325188 ? Would be nice to give a shot at supporting the CDNA cards in RADV without having to deal with unobtainium and noisy servers at home.

          • #15
            Originally posted by bridgman:
            We still have some work to do in order to make ROCm/AMD compute solutions as broadly available as NVidia's offerings, but we have been working on it and are getting a lot closer.
            The effort is definitely showing and giving results. It's a hard task and, pain and whining aside, I, at least, DO appreciate the work everyone involved is putting in. Exciting times ahead, it seems =3

            • #16
              Originally posted by BNieuwenhuizen:
              bridgman Hiya, since you seem to be active around the topic could you route https://twitter.com/b_nieuwen/status...68442415325188 ? Would be nice to give a shot at supporting the CDNA cards in RADV without having to deal with unobtainium and noisy servers at home.
              I have asked a couple of people involved with the project if they can point me to a support contact, and included a link to your tweet. Will let you know when I hear back.

              Did the error happen after hitting [REGISTER] on the New Account Registration form ?

              EDIT - email has been forwarded to a couple of additional people...
              Last edited by bridgman; 16 December 2021, 08:40 PM.

              • #17
                Originally posted by bridgman:

                I have asked a couple of people involved with the project if they can point me to a support contact, and included a link to your tweet. Will let you know when I hear back.

                Did the error happen after hitting [REGISTER] on the New Account Registration form ?

                EDIT - email has been forwarded to a couple of additional people...
                No, before. The flow is: the community link from the article -> "Get Started and Request Access on the AAC Today" -> "Take a test drive now" -> "Register for an account with AMD". This then shows a checkbox with "I agree that I have read and understood the AMD Accelerator Cloud Terms of Use for this program." as part of the form. Before I'd check it (which I kinda assumed was necessary to make the "register" button work) I'd need to actually read the terms, but the link is broken.

                (edit: btw sorry if that came across as more urgent than it is. Maybe I'm the only one who actually tends to read these before agreeing)
                Last edited by BNieuwenhuizen; 16 December 2021, 09:24 PM.

                • #18
                  Got it, thanks. I have updated the ticket and also explained that relatively few of our target users would be willing to click "I have read and understood..." without having actually read the terms of use. I certainly would not do it myself.

                  • #19
                    Originally posted by bridgman:

                    Nope... we had limited resources and decided to focus on large datacenters first. Now we are catching up on the rest of the market.



                    That was largely the case a couple of years ago (if you add large data center & hyperscaler customers to supercomputer accounts) but not today.
                    In the meantime, you have alienated that most precious of resources. Your, and I'll say it, your FANBOIS. Of which I most certainly was one, right back to the 8514 Vantage. That might seem "old", but the principle applies well into the 2020s. That's how bad ROCm is.

                    This is the cost of AMD's disrespect for its retail compute clients. Who had so many hopes. And yes, I'm probably a minority and I'll get over it.

                    And despite people like me attacking AMD in no uncertain terms, you have always been respectful, which is to your credit. But at the same time, one cannot help thinking that you're party to the problem, by PR-ing no matter what.

                    The above is a classic example. You fudged for 2-3 years. Now all of a sudden it's "yeah sure, but we're on it now, don't worry". That was not your line before. It was always "don't worry be happy we're on it". Which you weren't, by your own admission.

                    You did NOT give us the clear-as-daylight picture, which was internally obviously happening, at the time. You were obfuscating. That costs people money, time, and most importantly, motivation, bridgman . This is serious stuff.

                    I suppose it's easy to figure out. You're paid by AMD. There's the problem.
                    Last edited by vegabook; 17 December 2021, 09:31 PM.

                    • #20
                      Originally posted by vegabook:
                      In the meantime, you have alienated that most precious of resources. Your, and I'll say it, your FANBOIS. Of which I most certainly was one, right back to the 8514 Vantage. That might seem "old", but the principle applies well into the 2020s. That's how bad ROCm is.

                      This is the cost of AMD's disrespect for its retail compute clients. Who had so many hopes. And yes, I'm probably a minority and I'll get over it.
                      I don't think you are a minority. The "consumer first or datacenter first" subject was debated hotly inside the company but the decision at the time was to focus entirely on datacenter.

                      I felt that we could maintain a focus on datacenter while still doing enough to keep the consumer/workstation customers supported, but that was very much a minority view at the time. I think there is a better understanding these days about the importance of grassroots developer support (which implies consumer products, not just workstation) but there is still more work to be done.

                      We did make another important internal change a couple of months ago - integrating the release processes for graphics and ROCm stacks to the point that we twin the releases and use the same ROCm code for both. What this should give us for the first time is testing on "GUI apps" (i.e. apps using compute/graphics interop) before releasing a ROCm stack. That is not manifesting as happy AMDGPU-PRO users yet, but we are trying hard to pull our internal testing into alignment with what our customers are doing with the drivers.

                      Originally posted by vegabook:
                      And despite people like me attacking AMD in no uncertain terms, you have always been respectful, which is to your credit. But at the same time, one cannot help thinking that you're party to the problem, by PR-ing no matter what.

                      The above is a classic example. You fudged for 2-3 years. Now all of a sudden it's "yeah sure, but we're on it now, don't worry". That was not your line before. It was always "don't worry be happy we're on it". Which you weren't, by your own admission.

                      You did NOT give us the clear-as-daylight picture, which was clearly happening, at the time. You were obfuscating.
                      This part I don't agree with - I think I have always been pretty clear that we were working on RDNA support but I couldn't say when it would arrive because we had to prioritize other work (the big datacenter parts) higher.

                      I'm not sure, but it seems like you think we did nothing for a couple of years and then suddenly got off our a**es and ported the entire compute stack to RDNA2 in a few weeks. That's the only explanation I can think of for you thinking I was obfuscating - or did I miss something ?

                      As far as I know the only change in what I have been saying is...

                      - we're working on it but I don't know how long it will take
                      - we're working on it and everything in the ROCm stack up to OpenCL is shipping as the compute solution for AMDGPU-PRO on RDNA and RDNA2
                      - we're working on it and RDNA2 is pretty close to official support right up to the ML frameworks (and people are using it today with tweaking)
                      (hopefully soon)
                      - we're working on it, we have official support for RDNA2, <this subset> works for RDNA, and we're fairly close to finishing RDNA

                      To me that's the same message modified only by the passage of time.

                      Originally posted by vegabook:
                      I suppose it's easy to figure out. You're paid by AMD. There's the problem.
                      Don't think so. It does mean that I don't come on here and rant about internal problems but it doesn't mean I say things that are either untrue or misleading. I don't get paid anywhere near enough to do that.
                      Last edited by bridgman; 17 December 2021, 09:44 PM.
