How AMD Is Taking Standard C/C++ Code To Run Directly On GPUs

  • tocsa
    Phoronix Member
    • Feb 2018
    • 57

    #31
    Originally posted by sophisticles View Post

    The Geforce 3 was released on February 27 2001 and was the first card to be fully programmable.
    Depends on how we interpret "full" programmability. It operated on vertices and could do texture operations. For many years those were the two main features of all GPUs: strictly 3D, mostly for games, plus CAD / simulation visualization.

    "Full" programmability got closer with the GPGPU movement, years later. At least that's my recollection. GPGPU, and CUDA for example, still has many limitations.

    What do you do with warps, or, as others mention, with tasks that are not highly parallelizable? Lots of branch instructions?

    Comment

    • tocsa
      Phoronix Member
      • Feb 2018
      • 57

      #32
      Originally posted by habilain View Post
      However, do have a look at the Vortex GPGPU and the NEOX GA100. People are already using RISC-V cores as GPU shaders (OK, technically slightly extended RISC-V cores...) in a GPU.
      When it comes to using many CPU cores as shaders to make up a GPU, there was Intel's Larrabee / MIC project, later named Xeon Phi. It used modified P54C Pentium cores and connected 60+ of them with a wide ring bus.

      One main advantage was that they were fully x86-compatible cores (with special AVX-512 instructions). It still needed a compiler, but it is much closer to an SMP machine with many cores than CUDA is to SMP.

      I know RISC-V cores can be even smaller and nowadays you could pile up an order of magnitude more of them (?), but I had to mention Larrabee.

      From a mile-high view it will be beneficial if any technology can lower the barrier to pushing computation to GPUs. All high-level languages - which today are still struggling with parallelization - can benefit. Maybe one day this will just be another non-uniform architecture? Almost like big/little coming over from the ARM to the x86 world, but as big/little/worker?

      Note: I did not have time to watch the video yet.
      Last edited by tocsa; 16 December 2024, 03:10 AM.

      Comment

      • pong
        Senior Member
        • Oct 2022
        • 316

        #33
        So, just like OpenACC / OpenMP, but with a Python-flavored DSL as opposed to a C/C++ coding UX.

        OK, I guess. But obviously one already has the "ordinary" concurrency-related support in various languages including Python, Go, Rust, C++, C, whatever, so to some extent one could follow those patterns. Even ignoring the concurrency / accelerator aspects, a lot of "high-performance Python HPC" work is very much about getting ordinary math / data structures back into the domain of C/C++ code for higher performance -- numpy etc. So to some extent, if one wants "concurrency / acceleration just like C/C++" and "arrays and data structures just like C/C++", then just programming in C/C++ with OpenMP / OpenACC is an "already easy" solution to the concurrency / acceleration / offloading / SIMD / fast math & vector problems.
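
        For illustration, a minimal sketch of that "already easy" OpenMP-in-C++ route on the CPU side (the loop and values are just a toy example): one pragma parallelizes and vectorizes the loop, and the same file still builds as plain serial C++ without -fopenmp.

        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 1 << 20;
            std::vector<double> a(n, 1.0), b(n, 2.0), c(n);

            // Split the iterations across threads and ask for SIMD within each thread.
            #pragma omp parallel for simd
            for (int i = 0; i < n; ++i)
                c[i] = a[i] * b[i] + 0.5;

            std::printf("c[0] = %f\n", c[0]);
        }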



        Originally posted by sophisticles View Post
        This is a great idea, here's what I would like to see happen:

        They create a simple, Python style language, that has easy to learn and read syntax and is easily expandable via libraries.

        Then add a function decorator that allows a programmer to specify which parts of the code should be run where, for instance:

        @_CPU
        def function(ONE)

        @_GPU
        def function(TWO)


        @_CPU_128_SIMD
        def function(Three)

        I guess what I am saying is write a compiler for Python that allows a programmer to specify on what hardware he wants that code portion to be executed.

        Comment

        • pong
          Senior Member
          • Oct 2022
          • 316

          #34
          Originally posted by NeoMorpheus View Post
          If this helps getting rid of the cuda monopoly and makes ROCm unnecessary then fine by me.

          Sadly, if this does eliminate the need of ROCm, the local haters will need something else to keep the hate going. Lol
          One may still use whatever vendor's GPU, driver, and low-level SW stack. But from an application-programming perspective it is quite possible to design and develop HPC / compute applications whose source code does not depend on any particular GPU vendor's non-standard software development scheme.

          These will work with C/C++ standard code:

          Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ...




          Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: In many cases, the results of these ports are worth…
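
          As a minimal sketch of that standard-C++ route (toy data; the offloading toolchains named here, e.g. nvc++ -stdpar=gpu or AdaptiveCpp's stdpar mode, are just examples of what could map this to a GPU): a C++17 parallel algorithm with an execution policy, nothing vendor-specific in the source.

          #include <algorithm>
          #include <cstdio>
          #include <execution>
          #include <vector>

          int main() {
              std::vector<float> x(1 << 20, 1.0f);

              // Standard parallel algorithm: no pragmas, no extensions, just an execution policy.
              std::for_each(std::execution::par_unseq, x.begin(), x.end(),
                            [](float &v) { v = v * 2.0f + 1.0f; });

              std::printf("x[0] = %f\n", x[0]);
          }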


          And proposed C++ standard code:

          `std::execution`, the proposed C++ framework for asynchronous and parallel programming. - NVIDIA/stdexec
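
          And a minimal sketch of the sender/receiver style that std::execution proposes, assuming the NVIDIA/stdexec reference implementation for the headers and names; the chained work here is a made-up toy.

          #include <exec/static_thread_pool.hpp>
          #include <stdexec/execution.hpp>
          #include <cstdio>

          int main() {
              exec::static_thread_pool pool(4);   // a CPU scheduler; accelerator schedulers plug in the same way
              auto sched = pool.get_scheduler();

              // Describe the work as a composable "sender" pipeline, then run it.
              auto work = stdexec::schedule(sched)
                        | stdexec::then([] { return 41; })
                        | stdexec::then([](int x) { return x + 1; });

              auto [result] = stdexec::sync_wait(std::move(work)).value();
              std::printf("result = %d\n", result);
          }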


          Whereas these work with e.g. C/C++/Fortran that is annotated (preprocessor-style markup only) so that it can optionally be vectorized / accelerated / parallelized by a toolchain / target that offers such capabilities; the toolchain does not have to be able (or enabled) to parallelize it, and it is still plain C/C++ code.
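
          A minimal sketch of that annotated style with an OpenMP target region (toy data; the exact offload build flags depend on the toolchain): with an offload-capable compiler the loop runs on the GPU, and without OpenMP it still compiles and runs as ordinary serial C++.

          #include <cstdio>
          #include <vector>

          int main() {
              const int n = 1 << 20;
              std::vector<float> x(n, 1.0f), y(n, 2.0f);
              float *xp = x.data(), *yp = y.data();

              // Map the arrays to the device, run the loop there, copy y back.
              #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
              for (int i = 0; i < n; ++i)
                  yp[i] = 2.0f * xp[i] + yp[i];

              std::printf("y[0] = %f\n", yp[0]);
          }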

          And then there are GPU / accelerator-independent, vendor-neutral options like SYCL, OpenCL, and Vulkan.
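
          And, for completeness, a minimal single-source SYCL 2020 sketch of that vendor-neutral route (toy kernel; device selection is left to the runtime's default).

          #include <sycl/sycl.hpp>
          #include <cstdio>
          #include <vector>

          int main() {
              const size_t n = 1 << 20;
              std::vector<float> data(n, 1.0f);

              sycl::queue q;   // runtime picks a default device (CPU or any vendor's GPU)
              {
                  sycl::buffer<float, 1> buf(data.data(), sycl::range<1>(n));
                  q.submit([&](sycl::handler &h) {
                      sycl::accessor a(buf, h, sycl::read_write);
                      h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                          a[i] = a[i] * 2.0f + 1.0f;
                      });
                  });
              }   // buffer goes out of scope: results are copied back into data

              std::printf("data[0] = %f\n", data[0]);
          }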



          Comment

          • zakhrov
            Phoronix Member
            • Aug 2015
            • 86

            #35
            If this can help my RX 5600M GPU run Stable Diffusion then I'm all for it. Until then, a hacked-together PyTorch on ROCm is the only thing that does it.

            Comment

            • boboviz
              Phoronix Member
              • May 2017
              • 117

              #36
              Originally posted by NeoMorpheus View Post
              If this helps getting rid of the cuda monopoly and makes ROCm unnecessary then fine by me.

              Sadly, if this does eliminate the need of ROCm, the local haters will need something else to keep the hate going. Lol
              THIS is a BIG step: programming directly for the GPU (if it is possible for most of the code) requires, literally, rethinking the "way in which developers think".
              So CUDA and ROCm will remain for a long time.

              Comment

              • boboviz
                Phoronix Member
                • May 2017
                • 117

                #37
                Originally posted by pong View Post
                These will work with C/C++ standard code:
                And proposed C++ standard code
                The same ISO C++ committee has been working, since C++17, on facilitating the development of code for heterogeneous devices (like GPUs).
                C++26 is the next step:
                While memory safety takes a back seat, the new C++ standard focuses on AI and parallel processing features to help developers better handle GPU acceleration and machine learning tasks.


                Comment

                • pong
                  Senior Member
                  • Oct 2022
                  • 316

                  #38
                  Originally posted by boboviz View Post

                  THIS is a BIG step: programming directly for the GPU (if it is possible for most of the code) requires, literally, rethinking the "way in which developers think".
                  So CUDA and ROCm will remain for a long time.
                  Well what you say is true in a very perverse bizarre-world sort of way (which is today's reality).

                  "Hmm computers are too slow because of bad CPU parallelism & bad RAM bandwidth performance!"
                  ...
                  "Hey, here's an idea, let's build a thing called 'GPU' which does fast parallel SIMD and has really fast wide RAM attached to it!"
                  ...
                  "Hey these GPU things are great, let's make them faster!"
                  ...
                  "Hey these new faster GPUs are REALLY REALLY way faster than the CPU and RAM using their thousands of CPUs and 10x faster RAM!"
                  ...
                  "Hey these new faster GPUs are kind of awkward to program, let's make that easier! Gosh, I sure wish we could program the parallel threads just like parallel threads on a real CPU! And I wish we could just use GPU RAM as easily in software as the RAM in the main computer!"
                  ...
                  "Hey these fast GPUs are great for almost everything we want to compute, not just graphics, anything that needs fast memory or fast processing or parallel processing!"
                  ...
                  "Hey, what do we use these CPU and RAM things for again? Oh, right, we use them to run Linux so we can load our GPU drivers onto the GPUs, run GPU monitoring utilities, and run file systems so we can pipe data back and forth to our GPU!"

                  ...or, you know, on the other hand we could just make CPUs that have a factor of N better vector / SIMD processing and
                  a factor of N better RAM bandwidth, and then we could just, you know, program those in C, C++, Rust, Go, ... and then our GPUs would be small and inexpensive again, because they'd only be needed if you care about video codec HW, display interfaces, ray tracing, and uh... nothing much else.

                  The main "thing" you need whether for programming a GPU or CPU that is parallel is simply a framework to define the sharing of data or its communication between independent / distributed / asynchronous processors, and having the ability to designate / control what should be vectorized / SIMDed / distributed to other threads / processors.
                  Other than that you've got serial computing and as much concurrency as you can launch.
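
                  As a toy sketch of those two ingredients in plain C++ (the partitioning scheme and values are invented for the example): each worker gets its own slice of the data and its own result slot, which is the sharing / communication part, and a SIMD pragma designates the hot loop for vectorization.

                  #include <algorithm>
                  #include <cstdio>
                  #include <numeric>
                  #include <thread>
                  #include <vector>

                  int main() {
                      const std::size_t n = 1 << 22;
                      std::vector<float> data(n, 1.5f);
                      const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
                      std::vector<double> partial(nthreads, 0.0);   // one result slot per worker: no races, no locks

                      std::vector<std::thread> workers;
                      for (unsigned t = 0; t < nthreads; ++t) {
                          workers.emplace_back([&, t] {
                              const std::size_t lo = n * t / nthreads;
                              const std::size_t hi = n * (t + 1) / nthreads;
                              double sum = 0.0;
                              #pragma omp simd reduction(+ : sum)   // designate: vectorize this loop
                              for (std::size_t i = lo; i < hi; ++i)
                                  sum += data[i];
                              partial[t] = sum;                     // communicate: write only this worker's slot
                          });
                      }
                      for (auto &w : workers) w.join();

                      std::printf("sum = %f\n", std::accumulate(partial.begin(), partial.end(), 0.0));
                  }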

                  In the classic CS sense I think there's an article begging to be written: "GPUs considered harmful" (at least when they are abused while the actual "main system" CPU/RAM capabilities are neglected and not scaled similarly, "Moore's law"-style, wrt. RAM bandwidth, parallelism, and threading).

                  Someone could just stick an ARM / RISC-V SoC in / alongside a GPU and call it "a computer".

                  But strangely we're plugging these tiny slow motherboards onto the huge fast GPUs and calling the GPUs "peripherals".

                  Comment

                  • pong
                    Senior Member
                    • Oct 2022
                    • 316

                    #39
                    Originally posted by boboviz View Post

                    The same ISO C++ committee has been working, since C++17, on facilitating the development of code for heterogeneous devices (like GPUs).
                    C++26 is the next step:
                    While memory safety takes a back seat, the new C++ standard focuses on AI and parallel processing features to help developers better handle GPU acceleration and machine learning tasks.

                    Thanks, nice article. I look forward to the evolution and hope they land much of that ASAP.

                    Comment

                    • boboviz
                      Phoronix Member
                      • May 2017
                      • 117

                      #40
                      Originally posted by pong View Post
                      Well what you say is true in a very perverse bizarre-world sort of way (which is today's reality).
                      In the classic CS sense I think there's an article begging to be written: "GPUs considered harmful" (at least when they are abused while the actual "main system" CPU/RAM capabilities are neglected and not scaled similarly, "Moore's law"-style, wrt. RAM bandwidth, parallelism, and threading).

                      Someone could just stick an ARM / RISC-V SoC in / alongside a GPU and call it "a computer".
                      But strangely we're plugging these tiny slow motherboards onto the huge fast GPUs and calling the GPUs "peripherals".
                      You made me smile :-)
                      It's true, many people think that the GPU is the "magic wand" that solves all performance problems.

                      If we are speaking about the HPC / science / datacenter world, GPUs are one of the solutions.
                      But in day-to-day life a lot of programs are not even able to use multi-core CPUs correctly, let alone a GPU.

                      Comment
