The State Of ROCm For HPC In Early 2021 With CUDA Porting Via HIP, Rewriting With OpenMP


  • #51
    Originally posted by billyswong View Post
    ROCm's attitude of "we don't care desktop use case" is...
    That's just a plain false statement when the reality is that AMD focuses more of its limited resources on HPC than on the desktop. The ROC project has never had the resources that Nvidia has put into its CUDA implementations, and when you are getting into GPGPU compute, the effort it takes to set up ROC is more or less inconsequential. Most of your time will go into learning the APIs themselves and the sometimes vastly different way code needs to be structured to make effective use of the hardware.

    Sure, it may be a pain in the neck if you're one of those people expecting an Apple-esque "it just works" experience, but those people are not the more serious users ROC is actually aimed at. We're not talking about hobbyists who want to spend an afternoon or two playing around with it, we're talking about professionals who can spend weeks just evaluating something.



    • #52
      Originally posted by L_A_G View Post

      That's just a plain false statement when the reality is that AMD focuses more of its limited resources on HPC than on the desktop. The ROC project has never had the resources that Nvidia has put into its CUDA implementations, and when you are getting into GPGPU compute, the effort it takes to set up ROC is more or less inconsequential. Most of your time will go into learning the APIs themselves and the sometimes vastly different way code needs to be structured to make effective use of the hardware.

      Sure, it may be a pain in the neck if you're one of those people expecting an Apple-esque "it just works" experience, but those people are not the more serious users ROC is actually aimed at. We're not talking about hobbyists who want to spend an afternoon or two playing around with it, we're talking about professionals who can spend weeks just evaluating something.
      There are two problems going on here. Yes ROCm is not for casual hobbyists, but for every 10 or 20 "prosumers" who actually want to try to do something serious, one of them will at some stage become a proper, senior, decision making HPC professional in the future. You're turning these people off if they recall what a mess it was when they were in the learning phase. There are zero HPC people who were not once beginners, and many of these will have begun as hobbyists.

      Second, this going on about "limited resources". AMD is rich now. It's been rich for 2.5 years at least. This excuse is no longer valid at all even if you allow for decent lead time to hire and train people.






      • #53
        Originally posted by Qaridarium View Post
        We can say that there is compute-only hardware in CDNA: the 64-bit floating-point units...
        Games don't use this; games use 32-bit or lower...

        I don't know of any games that use 64-bit FP...
        Sure, except MadeUpName's question was specifically about consumer cards. We don't have fast FP64 in our consumer cards, just 1/16th-rate using the FP32 logic.
        Test signature



        • #54
          Originally posted by extremesquared View Post
          My main concern is that AMD leadership could take a step back, look at the market, and decide that they are comfortable permanently conceding compute to nvidia.
          Yep... we could, we did and we didn't. In that order.
          Test signature



          • #55
            Originally posted by seesturm View Post
            Why is the RDMA part not upstreamable? Or what does RDMA mean in this context?
            It uses an NVidia/Mellanox API (which is what the NICs support) that requires application memory pages to be pinned in place from userspace. I think the recent upstreaming of GPU P2P dmabuf may allow us to re-implement in an upstreamable form, at least that was one of the goals for the work IIRC.
            Test signature



            • #56
              Originally posted by vegabook View Post
              I'll remind you that AMD made a big deal out of Radeon VII FP64 capability so AMD clearly markets the compute capability of retail cards.
              With respect, we made zero deal about Radeon VII's FP64 capability other than eventually adding an FP64 FLOPs number to the product page alongside FP16 and FP32.

              The press made a big deal about it which is fine, and which eventually resulted in us launching the PRO version of the card (with even faster FP64) but from our perspective Radeon VII was a gaming card.
              Test signature



              • #57
                Originally posted by vegabook View Post
                There are two problems going on here. Yes ROCm is not for casual hobbyists, but for every 10 or 20 "prosumers" who actually want to try to do something serious, one of them will at some stage become a proper, senior, decision making HPC professional in the future. You're turning these people off if they recall what a mess it was when they were in the learning phase. There are zero HPC people who were not once beginners, and many of these will have begun as hobbyists.
                Yep, agree completely. I don't think this has been sufficiently well understood at a management level but that is changing fairly quickly.

                Originally posted by vegabook View Post
                Second, this going on about "limited resources". AMD is rich now. It's been rich for 2.5 years at least. This excuse is no longer valid at all even if you allow for decent lead time to hire and train people.
                Sorry, I nearly snorted coffee out my nose when I read that. I wouldn't call us rich yet by a long shot but at least we have been solidly profitable (enough to sustainably fund growth) for a half year or so, and we have already ramped up hiring accordingly. Remember that half of our 2020 "profit" came from cancelling some kind of reserve we had been holding against future tax liabilities (I don't know the details) and didn't actually give us any money.

                We are not "rich" by any means yet - NVidia has three times our annual profits and five times our cash reserves for similar revenues, and it's those cash reserves relative to revenues (essentially how many days of business you have in cash) that allow you to make big investments without putting the company at too much risk. I didn't even try comparing to Intel since we are still bug-sized in comparison.

                We have been increasing R&D investment carefully since the Zen processors launched successfully though, almost doubling since 2016 ($1008M in 2016, $1983M in 2020), but we still have a ways to go.
                Last edited by bridgman; 23 February 2021, 10:45 AM.
                Test signature



                • #58
                  Originally posted by bridgman View Post

                  Yep, agree completely. I don't think this has been sufficiently well understood at a management level but that is changing fairly quickly.



                  Sorry, I nearly snorted coffee out my nose when I read that. I wouldn't call us rich yet by a long shot but at least we have been solidly profitable (enough to sustainably fund growth) for a half year or so, and we have already ramped up hiring accordingly. Remember that half of our 2020 "profit" came from cancelling some kind of reserve we had been holding against future tax liabilities (I don't know the details) and didn't actually give us any money.

                  We are not "rich" by any means yet - NVidia has three times our annual profits and five times our cash reserves for similar revenues, and it's those cash reserves relative to revenues (essentially how many days of business you have in cash) that allow you to make big investments without putting the company at too much risk. I didn't even try comparing to Intel since we are still bug-sized in comparison.

                  We have been increasing R&D investment carefully since the Zen processors launched successfully though, almost doubling since 2016 ($1008M in 2016, $1983M in 2020), but we still have a ways to go.
                  I'm sorry. Your company has been worth > 20 billion dollars on the stock market for more than 2 years now, and 50bn since the end of 2019. A stock market valuation's entire raison d'être is to measure your capital-raising capacity. If your management decided not to take advantage of this trove of gold by, say, issuing 2% of shares into that valuation to fund software, then that's a strategic mistake for Radeon Technologies Group or whatever it's now called, which seems to have been completely forgotten with all the Ryzen excitement. Fact is, money has been there for the taking thanks to your enormously successful stock, but management didn't use that to invest where it needed to. This insistence on funding out of cashflow is very difficult to understand when your competitor is racing ahead with ever more speed, and Intel is about to get in too.

                  Let's recall that Nvidia, at 360bn or so, is worth 3.5x more than AMD, and at least half of that is due to CUDA and the resultant datacentre market share. Surely your management must realise that this enormous premium that Nvidia enjoys is almost entirely due to Nvidia's software superiority. The discrepancy is so large that any failure to invest in software very urgently at AMD must surely be questioned. Entirely understandable in 2016. Much more difficult to understand post 2018.

                  Now it's true that Koduri seems to have been problematic and there may be internal problems that I have no clue about, but AMD most definitely has access to funds. LOTS of funds. Why it doesn't use that to plug this obvious hole in its strategy is beyond me.

                  And please - really nothing personal. I've always appreciated your responses in these forums. I'm also a huge AMD fan since the early 90s of the 8514-ultra. If I sound frustrated it's because I want your firm's success (and to use my Radeon VII !!) and have enormous respect for the engineers and developers within it.
                  Last edited by vegabook; 23 February 2021, 01:35 PM.



                  • #59
                    Keep in mind that compute isn't just for people modelling nuclear bombs or the universe. Nearly every graphics app either wants it (darktable) or needs it (Resolve). Modeling software (Blender), engineering software, hell, even spreadsheets want compute. That is a pretty large body of people you are looking at there.

                    Originally posted by wallcarpet40 View Post

                    clinfo for the 6800XT --> https://pastebin.com/JXjihVCN
                    First let me say I am jealous that you were able to get your hands on a 6800XT card. So the good news is that it is able to talk to your hardware, sees your compiler chain and can run kernels. That may be enough to get F@H working if you don't need image support. OpenCL says it supports it but the driver probably doesn't. An easy way to check is:

                    install darktable
                    darktable -d opencl
                    look at the output
                    uninstall darktable

                    Your next step would be to try installing F@H and see if it has any way to check your opencl stack for what it wants. Good luck. Let us know how it works out.
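
If you would rather not install darktable just for this, a rough alternative is to scan saved clinfo output for the image-support line. This is only a sketch: it assumes the usual clinfo text layout with "Device Name" and "Image support" rows, the function name and the sample text are made up for illustration, and it only tells you what the driver reports, not whether images actually work.

```python
import re


def image_support_by_device(clinfo_text):
    """Map each device name found in clinfo-style text to True/False.

    Assumes rows shaped like 'Device Name   <name>' and
    'Image support   Yes|No' (an assumption about the clinfo layout).
    """
    result = {}
    current = None
    for line in clinfo_text.splitlines():
        name = re.match(r"\s*Device Name\s{2,}(.+?)\s*$", line)
        if name:
            current = name.group(1)
            continue
        support = re.match(r"\s*Image support\s{2,}(\w+)", line)
        if support and current is not None:
            result[current] = support.group(1).lower() == "yes"
    return result


# Made-up sample, not taken from the pastebin linked above.
SAMPLE = """\
  Device Name                                     example-gpu-0
  Device Vendor                                   Example Vendor
  Image support                                   No
  Device Name                                     example-gpu-1
  Image support                                   Yes
"""

if __name__ == "__main__":
    for device, supported in image_support_by_device(SAMPLE).items():
        print(device, "reports image support" if supported else "reports no image support")
```

To use it on a real machine, save the clinfo output to a file and pass the file contents to the function. A "Yes" here still needs the darktable check above to confirm the driver really delivers.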





                    • #60
                      Originally posted by bridgman View Post
                      In case it helps, the "limited to an older kernel" issue only applies if you are using our DKMS kernel driver package.

                      ...
                      Hi Bridgman. I'm replying to you, not the other guys complaining because you might be able to effect some change at AMD.

                      The install doc for ROCm is flat-out confusing. If you just start reading from the top, what you take away is to install rocm-dkms. Later in the doc, in the third sentence of a paragraph that looks like fluff, it says not to install rocm-dkms if you use a post-4.18 kernel. IIRC it does not actually say to install rocm-dev at that point; that information is given elsewhere. There is no real table of contents, and the doc's structure does not reflect the decision tree one should be following: what kernel have I got, and what distro am I using?

                      Reasonable doc would have the decision flow right up front, built into a table of contents.
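
As a rough illustration of the kernel half of that decision flow, something like the sketch below is what the doc could state up front. The package names come from the description above; everything else is an assumption, in particular the function names, the idea that the kernel release string is enough to decide, and whether 4.18 itself counts as "post 4.18" (the sketch treats it as included). The distro branch is left out entirely.

```python
import re

# Assumed cut-off: the post only says "post 4.18"; including 4.18 itself
# is a guess made for this sketch.
UPSTREAM_DRIVER_FROM = (4, 18)


def kernel_version(release):
    """Extract (major, minor) from a release string such as '5.10.0-3-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def suggested_package(release):
    """Return the package this sketch would suggest for a given kernel release."""
    if kernel_version(release) >= UPSTREAM_DRIVER_FROM:
        # Newer kernel: rely on the in-kernel driver, skip the DKMS package.
        return "rocm-dev"
    # Older kernel: the DKMS kernel driver package is the documented route.
    return "rocm-dkms"


if __name__ == "__main__":
    for release in ("4.15.0-135-generic", "5.10.0-3-amd64"):
        print(release, "->", suggested_package(release))
```

On a live system the release string would come from `uname -r`. The point is only that the whole decision fits in a few lines, so it could sit at the top of the install doc.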

                      You seem to be a sort of liaison to the community? Maybe you could impress on the people managing ROCm development that spaghetti documentation is even worse than spaghetti code: it pisses off potential users and pushes them to the Other Company.
                      Last edited by hoohoo; 23 February 2021, 01:48 PM.
