How AMD Is Taking Standard C/C++ Code To Run Directly On GPUs

  • DiamondAngle
    Junior Member
    • Oct 2017
    • 46

    #21
    Originally posted by habilain View Post

    Probably not possible? ROCm doesn't officially support APUs, so I'm pretty sure ROCm doesn't support zero copy. Or if it does, that's weird.

    Even if it is possible, I'm not sure there'd be that much benefit. APUs aren't that powerful - and it would likely still need special code to get decent performance. For example, double-precision floating point tends to be quite slow on a GPU.
    MI300A says hello
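
    For reference, "zero copy" here just means the GPU working directly on pinned host memory instead of staging a copy first. A minimal HIP sketch of the idea, assuming a ROCm setup that supports mapped host allocations; the kernel name and sizes are invented for illustration:

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Trivial kernel that works directly on host-resident (mapped) memory.
    __global__ void scale(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int n = 1024;
        float* host = nullptr;
        // Pinned, device-mappable host allocation: no hipMemcpy anywhere.
        hipHostMalloc((void**)&host, n * sizeof(float), hipHostMallocMapped);
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        float* dev = nullptr;
        hipHostGetDevicePointer((void**)&dev, host, 0);  // device view of the same memory

        scale<<<(n + 255) / 256, 256>>>(dev, n);
        hipDeviceSynchronize();

        printf("host[0] = %f\n", host[0]);  // CPU sees the result without a copy back
        hipHostFree(host);
        return 0;
    }

    On a discrete card every access still crosses PCIe, which is why this only really pays off on APUs (or the MI300A), where CPU and GPU sit on the same memory.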

    Comment

    • habilain
      Junior Member
      • Feb 2016
      • 39

      #22
      Originally posted by DiamondAngle View Post

      MI300A says hello
      Gosh darn it, I knew I was forgetting something! In that case, maybe it does make sense?

      I still think that knowing when to switch automatically is a very non-trivial problem, but now I could see switching based on, say, a list of functions that get delegated to the GPU.
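
      Purely as an illustration of that idea (none of this is ROCm API; Dispatcher, offload_list and the two lambdas are invented here), the "list of functions" approach is basically a lookup table consulted at call time:

      #include <cstdio>
      #include <functional>
      #include <string>
      #include <unordered_map>
      #include <unordered_set>
      #include <utility>
      #include <vector>

      using Impl = std::function<void(std::vector<float>&)>;

      // Hypothetical registry: each function name maps to a CPU and a GPU implementation,
      // and a separate list says which functions get delegated to the GPU.
      struct Dispatcher {
          std::unordered_set<std::string> offload_list;
          std::unordered_map<std::string, std::pair<Impl, Impl>> impls;  // {cpu, gpu}

          void run(const std::string& name, std::vector<float>& data) {
              auto& [cpu, gpu] = impls.at(name);
              (offload_list.count(name) ? gpu : cpu)(data);
          }
      };

      int main() {
          Dispatcher d;
          d.impls["scale"] = {
              [](std::vector<float>& v) { for (auto& x : v) x *= 2.0f; std::puts("ran on CPU"); },
              [](std::vector<float>& v) { for (auto& x : v) x *= 2.0f; std::puts("would run on GPU"); },
          };
          d.offload_list.insert("scale");   // this function is delegated to the GPU path

          std::vector<float> v(8, 1.0f);
          d.run("scale", v);
          return 0;
      }

      The hard part, as said above, is deciding what goes on that list automatically rather than by hand.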

      Comment

      • sophisticles
        Senior Member
        • Dec 2015
        • 2619

        #23
        Originally posted by ojsl1 View Post
        Twenty years ago AMD wasn't there pushing for open source, that's why. The stereotypical corpo thinks keeping their hardware inaccessible makes it look more magical than it actually is.
        Allow me to crack an egg of knowledge on your head:

        On January 22 1999 NVIDIA had its IPO and closed at a split-adjusted $0.041 a share.

        The Geforce 256 was the first card to feature hardware T&L and MPEG-2 motion compensation and was released on Oct 11 1999.

        The Geforce 2 was released in mid May 2000 and by the end of the year the entire Geforce 2 lineup had been released.

        On January 22 2001 NVIDIA's stock was at a split-adjusted $0.19 a share.

        The Geforce 3 was released on February 27 2001 and was the first card to be fully programmable.

        On January 22 2002 NVIDIA's stock was at a split-adjusted $0.48 a share.

        CUDA was initially introduced on February 15 2007; on that day NVIDIA's stock closed at a split-adjusted $0.52 a share.

        Today, NVIDIA's stock closed at $139.31.

        If you had put $1000 into NVIDIA at the close of its IPO day, today you would have:

        $1000 / $0.041 ≈ 24,390 shares.

        24,390 shares × $139.31 = $3,397,770.90

        Let's see AMD:

        On December 13 2004 AMD's stock closed at $22.11.

        By February 2006 AMD's stock hovered around the $40 range.

        On October 25 2006 the ATI acquisition was completed and AMD's stock closed at $20.81.

        It's difficult to say when AMD first embraced open source, but they announced GPUOpen on December 15 2015.

        On that day AMD's stock closed at $2.36, but it had been declining for years.

        It dipped to under $2 a share, then slowly climbed over the years, settling in between the $50 and $100 price points until November of 2023, when it went on a run, topped out at $211.38 on March 7 2024, and then declined steadily before closing at $130.15.

        The moral of the story is that open source does nothing to help a company's bottom line and in fact hurts it in many cases.

        Comment

        • Soul_keeper
          Senior Member
          • Aug 2011
          • 265

          #24
          Their stock prices have very little to do with open source.
          Also, you left out AMD's acquisition of ATI, which nearly bankrupted them.

          Comment

          • toves
            Senior Member
            • Sep 2021
            • 133

            #25
            Originally posted by schmidtbag View Post
            Kinda makes me wonder why this wasn't done nearly 20 years ago? Even though there are many good reasons for the CPU to do the work, having the option for the GPU to do "raw" C code would still make sense and perhaps would have seen CUDA with a lot less adoption.
            I think some very early GPU cards (definitely pre-CUDA) came with a modified gcc compiler that could generate code for the GPU, which could then be loaded onto the hardware, and there was some method for code running on the CPU to talk to code running on the GPU.

            I imagine the process was very manual and clunky, which presumably led to the development of the likes of CUDA.

            I can imagine GPUs gaining more CPU-type features in response to their increasingly non-graphics workloads. Eventually GPUs might become fairly general-purpose (RISC?) MIMD processors, with the old amd64/x86-64 processors relegated to secondary roles as peripheral I/O co-processors.

            Comment

            • habilain
              Junior Member
              • Feb 2016
              • 39

              #26
              Originally posted by toves View Post
              I can imagine GPUs gaining more CPU-type features in response to their increasingly non-graphics workloads. Eventually GPUs might become fairly general-purpose (RISC?) MIMD processors, with the old amd64/x86-64 processors relegated to secondary roles as peripheral I/O co-processors.
              That seems unlikely - most workloads do not scale to the thousands of shader units that a GPU has. Worse, sometimes that sort of parallelization can hurt performance.

              However, do have a look at the Vortex GPGPU and the NEOX GA100. People are already using RISC-V cores as GPU shaders (OK, technically slightly extended RISC-V cores...) in a GPU.

              Comment

              • bridgman
                AMD Linux
                • Oct 2007
                • 13188

                #27
                Originally posted by schmidtbag View Post
                Kinda makes me wonder why this wasn't done nearly 20 years ago? Even though there are many good reasons for the CPU to do the work, having the option for the GPU to do "raw" C code would still make sense and perhaps would have seen CUDA with a lot less adoption.
                Let's split it into two questions:

                #1 - why didn't we have general-purpose C code running on GPUs 20 years ago?

                Before OpenCL existed we had CTM (2006, I think), which was rebranded as the Stream SDK. It supported Brook, which was C with enhancements to support data-parallel computing:



                Once OpenCL started to catch on we switched our compute focus to OpenCL, also C-based. In hindsight we probably would have been better off if we had stayed with the Stream SDK and something like Brook as the primary focus, which is basically what NVidia did with CUDA, rather than losing ~6 years focusing on OpenCL and going through two major transitions (Stream/Brook -> OpenCL -> ROCm).

                Whether we used C or OpenCL, one of the challenges at the time was that we moved to unified shaders (R600) and scalar cores (GCN) in two steps while NVidia did both in a single step with G80. Between 2006 and 2011 that meant we had to map scalar compute code onto VLIW hardware, which was hard to do efficiently (Cayman was VLIW4 rather than VLIW5, which helped a bit).

                The other potential issue was hardware support for "persistent threads", i.e. shader programs which ran for an unbounded time and co-existed with, communicated with, and launched other shorter-lived threads. I don't think there was a specific point where this support appeared; rather, it was incrementally improved over a few generations of hardware, maybe 12-15 years ago. I believe this applies to both AMD and NVidia.
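
                As a rough sketch of that pattern (not any specific AMD or NVidia mechanism; the worker kernel, the flags layout and the trivial "work" here are all invented, and it assumes mapped, host-visible pinned memory), a persistent thread is just a kernel that spins on flags both sides can see:

                #include <hip/hip_runtime.h>
                #include <cstdio>

                // A "persistent" kernel: runs until the host raises a stop flag,
                // processing work items the host posts into mapped host memory.
                // flags[0] = stop, flags[1] = work item, flags[2] = result.
                __global__ void worker(int* flags) {
                    volatile int* f = flags;
                    while (!f[0]) {
                        if (f[1] != 0) {               // host posted a work item
                            f[2] = f[1] * 2;           // "process" it
                            __threadfence_system();    // make the result visible to the host
                            f[1] = 0;
                        }
                    }
                }

                int main() {
                    int* flags = nullptr;
                    hipHostMalloc((void**)&flags, 3 * sizeof(int), hipHostMallocMapped);
                    flags[0] = flags[1] = flags[2] = 0;

                    int* dflags = nullptr;
                    hipHostGetDevicePointer((void**)&dflags, flags, 0);
                    worker<<<1, 1>>>(dflags);          // long-lived thread on the GPU

                    volatile int* vf = flags;          // volatile view for host-side polling
                    vf[1] = 21;                        // hand the GPU a work item
                    while (vf[2] == 0) { }             // wait for it to be processed
                    printf("result = %d\n", vf[2]);

                    vf[0] = 1;                         // tell the persistent kernel to exit
                    hipDeviceSynchronize();
                    hipHostFree(flags);
                    return 0;
                }

                Getting that visibility and ordering right across the CPU and GPU is exactly the part that needed the incremental hardware improvements mentioned above.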

                #2 - why wasn't LLVM used for compiling general-purpose code to GPUs 20 years ago?

                We have actually been using LLVM in compute for almost that long - initially with the AMDIL back end for OpenCL, then Tom started publicly working on the R600 back end for direct code generation in 2011, so at least 13 years ago.

                We couldn't have started much earlier since Clang hadn't been released yet (1.0 in 2009) and gcc didn't seem like a good candidate for a shader compiler. That's how we ended up with a proprietary shader compiler in the first place. We started using LLVM for compute via a proprietary C++ front end that we licensed in, but that was not our code and could not be open sourced.

                Tom's work led to using Clang and generating code directly, which allowed upstreaming for the first time. His experimental work was on the R600 family hardware but GCN was the first hardware that was a good fit with a general purpose compiler. Our open source driver support for GCN and up (~2013) used Clang/LLVM/native/upstream from the start. It also grew into the compiler we use for ROCm today.
                Last edited by bridgman; 12 December 2024, 10:22 PM.

                Comment

                • geekinasuit
                  Junior Member
                  • Jan 2024
                  • 3

                  #28
                  Originally posted by habilain View Post
                  That's mainly because the GPU and CPU do not share the same memory
                  On the MI300A the CPU and GPU share memory. IMO, the GPU concept is a dead end; hybrid CPU+GPU makes a lot more sense, but the software ecosystems out there have been built using old-school methods, where there are either only CPUs or only GPUs, and when there are both, they are fully independent of each other.
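
                  To make the "shares memory" point concrete, here is a minimal HIP sketch of the single-allocation model (hipMallocManaged; on an MI300A-style APU this is plain coherent memory, while on a discrete card the runtime migrates pages behind the scenes, so treat it only as an illustration):

                  #include <hip/hip_runtime.h>
                  #include <cstdio>

                  __global__ void increment(int* data, int n) {
                      int i = blockIdx.x * blockDim.x + threadIdx.x;
                      if (i < n) data[i] += 1;
                  }

                  int main() {
                      const int n = 256;
                      int* data = nullptr;
                      hipMallocManaged((void**)&data, n * sizeof(int));  // one pointer for CPU and GPU

                      for (int i = 0; i < n; ++i) data[i] = i;  // CPU writes...
                      increment<<<1, n>>>(data, n);             // ...GPU updates in place...
                      hipDeviceSynchronize();
                      printf("data[0] = %d\n", data[0]);        // ...CPU reads, no explicit copies

                      hipFree(data);
                      return 0;
                  }
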
                  Last edited by geekinasuit; 13 December 2024, 03:20 AM.

                  Comment

                  • RonanKeryell
                    Junior Member
                    • Dec 2014
                    • 9

                    #29
                    Originally posted by andreano View Post
                    This appears to be not just about C or C++ or GPUs but Rust and FPGAs as well:

                    I saw a demo of Rust on an AMD Xilinx FPGA. Yes, the Rust code was synthesized to FPGA logic. The guy told me this:
                    Unlike Intel, which is reimplementing C++ in the form of SYCL with oneAPI, targeting their GPU and FPGA products, AMD is doing the same using LLVM IR as input.
                    This sounds like rewriting history to me.
                    SYCL is not a reimplementation of C++ but a standard from the Khronos Group based on pure C++.
                    The first SYCL demo was done in 2014 at the SuperComputing conference, on the AMD booth, with an AMD FirePro S9150 GPU, more than 10 years ago!
                    There are some pictures :-) https://www.linkedin.com/posts/ronan...168736768-qEt9
                    At Xilinx we had a small open-source prototype targeting Xilinx FPGAs with SYCL at the end of 2017, long before Intel adopted SYCL.
                    That is actually why I hacked this LLVM IR input for Xilinx (now AMD) FPGAs. ;-)
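
                    For anyone who has not seen it, a complete SYCL program really is just standard C++ plus a library. A minimal SYCL 2020 sketch (vendor-neutral; the array size and the trivial kernel are arbitrary):

                    #include <sycl/sycl.hpp>
                    #include <cstdio>

                    int main() {
                        sycl::queue q;                                // picks a default device (GPU, FPGA, CPU, ...)
                        const size_t n = 1024;
                        float* a = sycl::malloc_shared<float>(n, q);  // USM: visible to host and device
                        for (size_t i = 0; i < n; ++i) a[i] = 1.0f;

                        // A plain C++ lambda submitted as a data-parallel kernel.
                        q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { a[i] *= 2.0f; }).wait();

                        printf("a[0] = %f\n", a[0]);
                        sycl::free(a, q);
                        return 0;
                    }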

                    Comment

                    • JPFSanders
                      Senior Member
                      • May 2016
                      • 426

                      #30
                      Originally posted by sophisticles View Post


                      Bla bla bla bla, factually correct but meaningless.
                      According to your logic, AMD was conquering the market, fighting hand to hand with Nvidia, back when it was producing its closed-source Catalyst drivers.

                      One should never confuse success with being clever or doing the right thing.

                      The Nvidia software stack has always been superior to ATI/AMD's offering, sometimes by abusing tricks, other times, as with CUDA, because they just did it better.

                      The AMD compute stack is competitive only on some cards with some software combinations, and that is really hurting AMD, which is fixated on charging as much as Nvidia for data-centre parts while being perceived (justifiably) as the worse proposition. The simple fact that CUDA works, and works well out of the box, with pretty much any of Nvidia's cards seems to elude AMD year after year for some reason.

                      Having said that, I personally will not buy an Nvidia card for the foreseeable future; for me it is not worth it. I might evaluate them again once the open-source driver is competitive for raster.

                      My point is that if Nvidia had an open-source "kernel driver" (the part that deals with the hardware) but had followed the same policies, keeping the rest of their software stack closed, they would have had the same success, because AMD would still have been equally short-sighted and would have made the same mistakes with compute.

                      Comment
