The State Of ROCm For HPC In Early 2021 With CUDA Porting Via HIP, Rewriting With OpenMP

  • phoronix
    Administrator
    • Jan 2007
    • 67091

    Phoronix: The State Of ROCm For HPC In Early 2021 With CUDA Porting Via HIP, Rewriting With OpenMP

    Earlier this month at the virtual FOSDEM 2021 conference there was an interesting presentation on how European developers are preparing for AMD-powered supercomputers and beginning to figure out the best approaches for converting existing NVIDIA CUDA GPU code to run on Radeon GPUs, as well as whether writing new GPU-focused code with OpenMP device offload is worthwhile...

  • oleid
    Senior Member
    • Sep 2007
    • 2469

    #2
    Any idea how this 2% overhead was estimated?

    Was it a hipified codebase compiled for CUDA? If so, I'm a little disappointed, since I expected HIP to be only macro magic that compiles down to CUDA.

    Or did they compare hipified code against code written natively for AMD GPUs? If so, I would be interested in what they used to program those GPUs natively.


    • vegabook
      Senior Member
      • Nov 2015
      • 371

      #3
      All good for HPC guys with all the devops resources they have to get ROCm working, but for a simple Joe like me, the fact that the latest and much-vaunted ROCm 4 doesn't work "out of the box" on kernel 5.8 (Ubuntu 20.04 LTS) is just ridiculous. AMD expects users to downgrade their kernel so that the standard install does not fail miserably, on what is probably the most widely installed Linux distro by far. If it doesn't work yet, why release it? I appreciate that AMD is trying hard here on their compute stack, but this is the kind of frustrating shoot-both-feet-twice mistake that is all too common from AMD, and that just gives CUDA a gale-force tailwind.
      Last edited by vegabook; 21 February 2021, 03:08 PM.


      • gmarkom
        Junior Member
        • Feb 2021
        • 1

        #4
        Originally posted by oleid
        Any idea how this 2% overhead was estimated?

        Was it a hipified codebase compiled for CUDA? If so, I'm a little disappointed, since I expected HIP to be only macro magic that compiles down to CUDA.

        Or did they compare hipified code against code written natively for AMD GPUs? If so, I would be interested in what they used to program those GPUs natively.
        Hi, all the tests ran on NVIDIA V100 GPUs, as we do not yet have access to AMD hardware (this is mentioned in the slides, in case you missed it). The original code is CUDA; we then hipify it, so all the CUDA calls become HIP calls, while any OpenMP offload code stays as it is and we just link it. So both versions run on the same hardware. Of course, we need to test many more applications; so far only a few show this overhead. It does not mean that the AMD hardware itself has overhead, of course.
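
        To make the hipify step described above concrete, here is a minimal sketch of the kind of source-to-source renaming it involves. This is not the real hipify tool: `toy_hipify` and its small rename table are illustrative assumptions covering only a handful of runtime API names, whereas the actual hipify tools handle far more of the API. OpenMP pragmas are left untouched, in line with the description above.

```python
import re

# Hand-picked subset of CUDA -> HIP renames (illustrative only; the real
# hipify tools cover a much larger part of the API).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Try longer names first and require word boundaries, so that
# cudaMemcpyHostToDevice is never rewritten as a partial cudaMemcpy match.
_NAMES = sorted(CUDA_TO_HIP, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP equivalents.

    Anything not in the table (kernel bodies, OpenMP pragmas, ...) is
    passed through unchanged.
    """
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_x, n);\n"
        "cudaMemcpy(d_x, h_x, n, cudaMemcpyHostToDevice);\n"
        "#pragma omp target\n"
        "cudaFree(d_x);\n"
    )
    print(toy_hipify(cuda_src))
```

        Because the conversion is essentially a renaming of API calls, the hipified source can still be built for the same NVIDIA GPU, which is what allows the original and converted versions to be compared on identical hardware.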


        • tildearrow
          Senior Member
          • Nov 2016
          • 7096

          #5
          There is a typo, but there is a YouTube video so I can't report it. I remember it said "as as"...


          • oleid
            Senior Member
            • Sep 2007
            • 2469

            #6
            Originally posted by vegabook
            All good for HPC guys with all the devops resources they have to get ROCm working, but for a simple Joe like me, the fact that the latest and much-vaunted ROCm 4 doesn't work "out of the box" on kernel 5.8 (Ubuntu 20.04 LTS) is just ridiculous.
            You could upgrade as well. Kernel 5.9 should be sufficient. At least, I used that kernel with everything mainline, i.e. no DKMS stuff needed.


            • phred14
              Senior Member
              • Dec 2008
              • 173

              #7
              Does anyone know of a SPICE simulator that's already been converted this way? I've read about a few CUDA SPICE simulators, but not even that many of those.


              • Danny3
                Senior Member
                • Apr 2012
                • 2308

                #8
                Originally posted by vegabook
                All good for HPC guys with all the devops resources they have to get ROCm working, but for a simple Joe like me, the fact that the latest and much-vaunted ROCm 4 doesn't work "out of the box" on kernel 5.8 (Ubuntu 20.04 LTS) is just ridiculous. AMD expects users to downgrade their kernel so that the standard install does not fail miserably, on what is probably the most widely installed Linux distro by far. If it doesn't work yet, why release it? I appreciate that AMD is trying hard here on their compute stack, but this is the kind of frustrating shoot-both-feet-twice mistake that is all too common from AMD, and that just gives CUDA a gale-force tailwind.
                True, this is what annoys me the most!
                Compared to CUDA, I had problems with everything: the distro, the kernel, the driver.
                AMD needs to fix this crap as soon as possible!
                I don't even know why people say this is open source software, because I have never had so many compatibility problems with open source software.


                • vegabook
                  Senior Member
                  • Nov 2015
                  • 371

                  #9
                  Originally posted by Danny3
                  True, this is what annoys me the most!
                  Compared to CUDA, I had problems with everything: the distro, the kernel, the driver.
                  AMD needs to fix this crap as soon as possible!
                  I don't even know why people say this is open source software, because I have never had so many compatibility problems with open source software.
                  I'm going to speculate. AMD has thrown so much jelly at the wall with ROCm, and the stack is so complex and tries to do so much (HIP, OpenCL 2.0, a CUDA translation layer, ML libraries, OpenMP, yada yada yada), that they've set themselves an almost impossible task in keeping this enormous tangled herd of behemoths up to date. This project feels directionless or worse: run by the marketing department and not the programmers. They're constantly in "oh sh_i_t we need this too, ship it now 'cos Sales said so" mode instead of doing a few things really well. How about just doing wgpu and OpenCL perfectly, or maybe Vulkan Compute? Imagine the following conversation between AMD and Google, which I am 100% sure is happening:

                  AMD: "Please support ROCm"
                  Goog: "Sure we'd love to! We don't like Nvidia's monopoly either"
                  AMD: "We've just launched ROCm 4.0!"
                  Goog: "Cool!! [1 week later....] okay I've ported XLA to it but you're asking all my users to downgrade their kernel"
                  AMD: "oh...."
                  Goog: "F-off until you've fixed this"

                  Rinse and repeat until Google gets sick of you.

                  For avoidance of doubt: personally I really want AMD to succeed.
                  Last edited by vegabook; 21 February 2021, 08:01 PM.


                  • bridgman
                    AMD Linux
                    • Oct 2007
                    • 13183

                    #10
                    In case it helps, the "limited to an older kernel" issue only applies if you are using our DKMS kernel driver package.

                    The kernel source code goes upstream quite aggressively (with the exception of a couple of non-upstreamable bits like RDMA) and the binary packages are organized so that you can either install userspace only (with a newer kernel) or userspace and kernel drivers (with an enterprise distro's older kernel).

                    If you have a newer kernel that the DKMS package doesn't support, install only the rocm-dev meta-package, although going to the newest easily obtainable kernel is a good idea in that case.

                    https://rocmdocs.amd.com/en/latest/I...r-AMD-GPU.html

                    This seems to have disappeared from the top level install instructions again - I'll go find out what happened.

                    Another option is to build everything from source, but last time I looked it was still a bit clunky until the build frameworks get cleaned up and harmonized a bit more.

                    Most of our major customers build from source, by the way.

                    Radeon Open eCosystem (ROCm)
                    Minor nitpick - I think it's actually Radeon Open Compute platforM.
                    Last edited by bridgman; 21 February 2021, 08:30 PM.
                    Test signature
