How AMD Is Taking Standard C/C++ Code To Run Directly On GPUs

  • phoronix
    Administrator
    • Jan 2007
    • 67114

    How AMD Is Taking Standard C/C++ Code To Run Directly On GPUs

    Phoronix: How AMD Is Taking Standard C/C++ Code To Run Directly On GPUs

    Back at the 2024 LLVM Developers' Meeting was an interesting presentation by AMD engineer Joseph Huber on how they have been exploring running common, standard C/C++ code directly on GPUs, without it having to be adapted to any GPU-specific language, programming dialect, or other modifications...

  • aerospace
    Phoronix Member
    • Apr 2024
    • 63

    #2
    This is really nice.

    Comment

    • Setif
      Senior Member
      • Feb 2016
      • 301

      #3
      Will the code run on the CPU, with the compiler automatically offloading loops and array operations to the GPU, or will it run entirely on the GPU? If the latter, aren't some operations impossible, or slow, to run on GPUs?

      Comment

      • NeoMorpheus
        Senior Member
        • Aug 2022
        • 590

        #4
        If this helps get rid of the CUDA monopoly and makes ROCm unnecessary, then fine by me.

        Sadly, if this does eliminate the need for ROCm, the local haters will need something else to keep the hate going. Lol

        Comment

        • chuckula
          Senior Member
          • Dec 2011
          • 842

          #5
          I'm in favor of anything that makes GPUs less of a black box hidden behind millions of lines of driver code & convoluted APIs. Of course, optimizing this to target the strengths of the GPU is the next step, but this is very cool.

          Comment

          • habilain
            Junior Member
            • Feb 2016
            • 37

            #6
            Originally posted by Setif View Post
            Will the code run on the CPU, with the compiler automatically offloading loops and array operations to the GPU, or will it run entirely on the GPU? If the latter, aren't some operations impossible, or slow, to run on GPUs?
            GPUs are Turing-complete these days, so "slow" rather than "not possible". I think at the moment it's mainly "this code can run on the GPU" rather than automatic offloading, which, while it would be nice for the lazy programmer, would probably not be possible on a standard GPU architecture. That's mainly because the GPU and CPU do not share the same memory, so offloading a loop from the CPU to the GPU has a high fixed cost, and I really don't see a good way to automatically determine when it's worth doing.

            Originally posted by NeoMorpheus View Post
            If this helps get rid of the CUDA monopoly and makes ROCm unnecessary, then fine by me.

            Sadly, if this does eliminate the need for ROCm, the local haters will need something else to keep the hate going. Lol
            Do a little more research - LLVM is targeting ROCm in this case.

            Comment

            • Nille
              Senior Member
              • Jul 2008
              • 1305

              #7
              Originally posted by Setif View Post
              run entirely on the GPU? If the latter, aren't some operations impossible, or slow, to run on GPUs?
              As far as I understand, everything runs on the GPU, regardless of whether it is fast or slow there.

              Comment

              • DiamondAngle
                Junior Member
                • Oct 2017
                • 44

                #8
                Originally posted by NeoMorpheus View Post
                If this helps get rid of the CUDA monopoly and makes ROCm unnecessary, then fine by me.

                Sadly, if this does eliminate the need for ROCm, the local haters will need something else to keep the hate going. Lol
                I mean, this IS ROCm. At the base level this is the ROCm compiler and the ROCm runtime, just skipping the single-source wrangling and without the syntactic sugar or the build system that supports it (hipcc).

                95% of "ROCm" is a huge amount of libraries that provide domain-specific kernels that perform well, because getting good performance across various GPUs is _hard_, so everyone relies on the vendors to provide a ton of premade kernels (often written in asm) that they then just use.
                Even if you start adding syntactic sugar to this that makes it sane to use, performance-wise you would have to port all of that from ROCm/HIP to this, a huge undertaking with no real upside.
                This is a fun and interesting toy, but it's really hard to see how it could ever be much more than that.
                Last edited by DiamondAngle; 11 December 2024, 09:14 AM.

                Comment

                • andreano
                  Senior Member
                  • Jun 2012
                  • 594

                  #9
                  This appears to be not just about C or C++ or GPUs but Rust and FPGAs as well:

                  I saw a demo of Rust on an AMD Xilinx FPGA. Yes, the Rust code was synthesized to FPGA logic. The guy told me this:
                  Unlike Intel, which is reimplementing C++ in the form of SYCL with oneAPI, targeting its GPU and FPGA products, AMD is doing the same using LLVM IR as input.
                  Last edited by andreano; 11 December 2024, 09:28 AM.

                  Comment

                  • ddriver
                    Senior Member
                    • Jan 2014
                    • 711

                    #10
                    Bad idea: you don't know which features work, or how costly each feature's implementation is.

                    At the risk of sounding old-fashioned, I say compute kernels should be PODs, primitives, and low-level arithmetic only. There's no need for any more abstraction on the compute side of things.

                    Comment
