The State Of ROCm For HPC In Early 2021 With CUDA Porting Via HIP, Rewriting With OpenMP


  • #61
    Originally posted by gsedej View Post
    Please. For the 100th time:
    • ROCm is not for you, the open-source fan, who happened to have RX580 to run linux desktop and play some games. (Some) things might work, but not out of the box and on any given version of distro and kernel version.
    • ROCm is also not for you, beginner computer science student with AMD hardware (laptop/desktop), wishing to do some neural network course in Caffe/Tensorflow/Torch. While it might work, you might spend 10x more time preparing a working environment compared to your classmates.

    ROCm is targeted for enterprise like LUMI with dedicated workload and developer who will create/port code to work specifically on given hardware.

    ROCm being opensource is just AMD policy, and it's nice. Do not compare it to project like MESA, where multiple hardware (intel/amd/nouveau) and multiple organizations (e.g. AMD, intel, Valve, Collabora, etc) are contributing.
    Actually it works with a 580. It did up to at least 3.x. Caffe worked up to about 3.5 (but after that you can't even stand up Googlenet on a Radeon VII, FFS). But I agree with the other guy who said to stop being so condescending.



    • #62
      Originally posted by vegabook View Post

      I'm sorry. Your company has been worth > 20 billion dollars on the stock market for more than 2 years now, and 50bn since the end of 2019. Stock market valuation's entire raison-d'etre is to measure your capital raising capacity. If your management decided not to take advantage of this trove of gold by, say, issuing 2% of shares into that valuation to fund software, then that's a strategic mistake for Radeon Technologies group or whatever it's now called, which seems to have been completely forgotten with all the Ryzen excitement. Fact is, money has been there for the taking thanks to your enormously successful stock but management didn't use that to invest where it needed to. This insistence on funding out of cashflow is very difficult to understand when your competitor is racing ahead with ever more speed, and Intel is about to get in too.
      Thanks for clarifying. If you had said "valuable" (and hence with more possibility of raising capital from the stock market than in the past) rather than "rich" I would have agreed. That only really applies in the last year though... I don't think our stockholders would have accepted going out to the market for enough money to make a difference when the valuation was only $20B, and if we went for something smaller the pressure would have been to invest in consumer GPUs or more CPU advance rather than in the ROCm stack.
      Last edited by bridgman; 23 February 2021, 02:42 PM.


      • #63
        Originally posted by bridgman View Post
        ...
        Most of our major customers build from source, by the way.
        ...
        Jesus, that is pretty Goddamned rich, Bridgman.

        I think nothing of building the kernel for custom cases. I thought nothing of building X from source back in the day. I've been working with open source since 1993: I LIKE having the source code.

        But my God, ROCm. It's a stack with many parts - a couple dozen? I can't even find a description of the dependency tree of the packages in bloody ROCm. I've tried to build ROCm from source, and I know how to figure dependencies out, but ROCm defeated me.

        In case you think you hear some anger here, well, guy, you are right. AMD touts the ROCm stack but it cannot be arsed to write up a doc that describes how to build the stack; AMD cannot be arsed to support RDNA/RDNA2 except with OpenCL (like, you know, OpenCL is the future, right?); AMD cannot even be arsed to write installation instructions that do not leave people with the impression they should install rocm-dkms; AMD puts bloody GCN GPUs inside CPUs and calls them APUs but AMD cannot be arsed to support its compute stack there either.

        How the hell does AMD ever expect to compete with nVidia in GPGPU outside national supercomputer walled gardens when the people who might become a real ecosystem for ROCm can't even use it? Radeon VII is gone, and you can't buy an mi60, you can buy an mi50 if you agree to fellate a corp sales rep for a while, and the mi100 requires relations with HPE/Cray, a fate worse than corp sales; AMD should stop talking about ROCm in public, because the only way the public can actually use it now is to somehow find an RX580.



        • #64
          Originally posted by bridgman View Post
          Minor nitpick - I think it's actually Radeon Open Compute platforM.
          Actually it appears that the name might have been changed (to something like Radeon Open eCosysteM) after all due to trademark concerns.


          • #65
            Originally posted by hoohoo View Post
            Jesus, that is pretty Goddamned rich, Bridgman.

            I think nothing of building the kernel for custom cases. I thought nothing of building X from source back in the day. I've been working with open source since 1993: I LIKE having the source code.

            But my God, ROCm. It's a stack with many parts - a couple dozen? I can't even find a description of the dependency tree of the packages in bloody ROCm. I've tried to build ROCm from source, and I know how to figure dependencies out, but ROCm defeated me.

            In case you think you hear some anger here, well, guy, you are right. AMD touts the ROCm stack but it cannot be arsed to write up a doc that describes how to build the stack; AMD cannot be arsed to support RDNA/RDNA2 except with OpenCL (like, you know, OpenCL is the future, right?); AMD cannot even be arsed to write installation instructions that do not leave people with the impression they should install rocm-dkms; AMD puts bloody GCN GPUs inside CPUs and calls them APUs but AMD cannot be arsed to support its compute stack there either.
            Again, my comment was specifically in response to a poster's comments about Google. It was not meant to be dismissive in any way about the challenges new users face, but if you take it out of context it could read that way.

            I have raised the issue about it being almost impossible to find a new product with ROCm support multiple times, and it does seem to have gotten attention. We also have new management in most of the areas where you expressed concerns and I think that will help as well. It still requires a bunch of additional people to do the work and that will take some time to build up, but it looks promising.

            Originally posted by hoohoo View Post
            How the hell does AMD ever expect to compete with nVidia in GPGPU outside national supercomputer walled gardens when the people who might become a real ecosystem for ROCm can't even use it? Radeon VII is gone, and you can't buy an mi60, you can buy an mi50 if you agree to fellate a corp sales rep for a while, and the mi100 requires relations with HPE/Cray, a fate worse than corp sales.
            I know it doesn't help much, but one of the reasons for launching the Radeon VII Pro was to make sure we had a card with fast compute and good ROCm stack support out there. It's not a full solution by any means but it's a start.

            Originally posted by hoohoo View Post
            AMD should stop talking about ROCm in public, because the only way the public can actually use it now is to somehow find an RX580.
            In fairness, we never did talk about it much in public outside of supercomputing and datacenter contexts, which is where both the HW and SW focus were initially concentrated, and we don't plan to until we make it more accessible.
            Last edited by bridgman; 23 February 2021, 02:36 PM.


            • #66
              Originally posted by bridgman View Post

              With respect, we made zero deal about Radeon VII's FP64 capability other than eventually adding an FP64 FLOPs number to the product page alongside FP16 and FP32.

              The press made a big deal about it which is fine, and which eventually resulted in us launching the PRO version of the card (with even faster FP64) but from our perspective Radeon VII was a gaming card.
              And therein lies AMD's failure in this market. Everyone else - which means AMD's customers - looked at the Radeon VII as an entry level compute card.

              It doesn't pass the smell test. If AMD meant the Radeon VII to be a gaming card then why did AMD cap the FP64 at 3.5-odd TF and not gimp it to near zero like the other gaming cards?



              • #67
                Originally posted by bridgman View Post

                Again, my comment was specifically in response to a poster's comments about Google. It was not meant to be dismissive in any way about the challenges new users face.

                I have raised the issue about it being almost impossible to find a new product with ROCm support multiple times, and it does seem to have gotten attention. We also have new management in most of the areas where you expressed concerns and I think that will help as well. It still requires a bunch of additional people to do the work and that will take some time to build up, but it looks promising.
                Feel free to pass my eloquent post, verbatim, to Lisa Su and whoever runs the GPU division now.



                • #68
                  Originally posted by vegabook View Post
                  Second, this going on about "limited resources". AMD is rich now. It's been rich for 2.5 years at least. This excuse is no longer valid at all even if you allow for decent lead time to hire and train people.
                  Didn't you read what bridgman said? The ROCm problem today comes from the lack of money 4 years ago.

                  Right now they are trying to fix it with today's money, which does not bring back the time lost 4 years ago.
                  Phantom circuit Sequence Reducer Dyslexia



                  • #69
                    Originally posted by bridgman View Post

                    It uses an NVidia/Mellanox API (which is what the NICs support) that requires application memory pages to be pinned in place from userspace. I think the recent upstreaming of GPU P2P dmabuf may allow us to re-implement in an upstreamable form, at least that was one of the goals for the work IIRC.
                    Nice to hear that we will have an upstream version of it in the near future by re-implementing it with DMABUF.
                    Phantom circuit Sequence Reducer Dyslexia



                    • #70
                      Originally posted by Qaridarium View Post
                      nice to hear that we ^^^^may have an upstream version of it in the near future by re-implementing it with DMABUF.
                      Fixed that for you
