AMD Job Posting Confirms More Details Around Their AI GPU Compute Stack Plans


  • AMD Job Posting Confirms More Details Around Their AI GPU Compute Stack Plans

    Phoronix: AMD Job Posting Confirms More Details Around Their AI GPU Compute Stack Plans

    A Friday evening job posting has confirmed and reinforced details around their future AI GPU compute stack, presumably what's been referred to as the Unified AI Software Stack...


  • #2
    we aim to provide broad and performant GPU coverage
    When it comes to GPU compute, "broad coverage" is the exact opposite of what AMD has done so far.



    • #3
      Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post

      When it comes to GPU compute, "broad coverage" is the exact opposite of what AMD has done so far.
      Nah, they just didn't enable the consumer cards by default in ROCm. They were still supported and worked perfectly fine; GCN / Polaris / Vega was supported anyway, as that was still the time when the accelerators sold for ML / compute were the same architecture as the consumer gaming cards. The Vega 64 consumer card even had HBM2 memory.

      And RDNA1 was/is supported as well. You can run PyTorch on a 5700 XT today! ML (MIOpen) support for RDNA took a little while, but it has worked fine for some years now.
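
      For anyone who wants to try that on their own card, a minimal sketch (assuming a ROCm build of PyTorch is installed; the HSA_OVERRIDE_GFX_VERSION override is the commonly cited workaround for consumer cards like the gfx1010 5700 XT and may not be needed on every ROCm release):

      Code:
      # Smoke test: can a ROCm build of PyTorch see and use an RDNA1 card?
      import os

      # Treat the gfx1010 (5700 XT) like a supported target; set this before
      # the ROCm runtime initializes (i.e. before importing torch).
      os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

      import torch

      # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
      print("HIP build:", torch.version.hip)
      print("GPU available:", torch.cuda.is_available())

      if torch.cuda.is_available():
          x = torch.randn(1024, 1024, device="cuda")
          y = x @ x  # run a matmul on the card
          print("Result device:", y.device)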



      • #4
        Is the GPU support going to continue to only cover a handful of GPUs, while leaving everyone else to wonder which others are buggy, broken, and a total waste of time?

        Because I'm all out of patience. Even if SOME of AMD's GPUs work, it's not worth my time and risk to try to find which ones those are, especially when the next update could break them. AMD makes good hardware, but their software stack completely negates any positivity with a steaming pile of uncertainty, labour, and risk.



        • #5
          Oh, and another thing: the sheer fact that AMD is going to LLVM for this tells you they've LEARNED ABSOLUTELY NOTHING.

          There are broadly two ways to build a GPU driver: get your shader compiler from LLVM, or use NIR. AMD chose the former while everyone else chose the latter. Yes, it means that AMD got OpenCL for free. But, LLVM is both a terrible compiler for GPUs and an absolute boat-anchor when it comes to release schedules and packaging.

          With rusticl shipping OpenCL support there's even less reason to choose LLVM than there ever was, and yet AMD are doubling down on this stupid decision.
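
          For what it's worth, checking whether rusticl actually exposes a card is easy enough. A minimal sketch (assuming a Mesa build with rusticl and pyopencl installed; RUSTICL_ENABLE=radeonsi is how rusticl is switched on for AMD hardware in current Mesa releases, though newer ones may enable it by default):

          Code:
          # List OpenCL platforms/devices to see whether rusticl exposes the GPU.
          import os

          # rusticl is opt-in per Gallium driver; radeonsi covers recent AMD GPUs.
          os.environ.setdefault("RUSTICL_ENABLE", "radeonsi")

          import pyopencl as cl

          for platform in cl.get_platforms():
              print(platform.name, "-", platform.version)
              for device in platform.get_devices():
                  print("  ", device.name)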



          • #6
            Originally posted by Developer12 View Post

            There are broadly two ways to build a GPU driver: get your shader compiler from LLVM, or use NIR. AMD chose the former while everyone else chose the latter. Yes, it means that AMD got OpenCL for free. But, LLVM is both a terrible compiler for GPUs and an absolute boat-anchor when it comes to release schedules and packaging.
            That was discussed here before, yet I couldn't find it anymore. My takeaway was: NIR is better suited for graphics driver needs; however, llvm is better for compute.

            Edit: also, there is now MLIR, and apparently the projects mentioned in the article make heavy use of it. That didn't exist back when NIR was started.

            Edit2:
            I think the main problem with using llvm (the project) for GPUs is llvm (the intermediate representation). MLIR is the solution here, one of the reasons being that it is higher level than llvm IR.
            Last edited by oleid; 14 October 2024, 01:37 AM.



            • #7
              Originally posted by Spacefish View Post

              Nah, they just didn't enable the consumer cards by default in ROCm. They were still supported and worked perfectly fine; GCN / Polaris / Vega was supported anyway, as that was still the time when the accelerators sold for ML / compute were the same architecture as the consumer gaming cards. The Vega 64 consumer card even had HBM2 memory.

              And RDNA1 was/is supported as well. You can run PyTorch on a 5700 XT today! ML (MIOpen) support for RDNA took a little while, but it has worked fine for some years now.
              Yes, it's improving. I've been complaining about this for years, but I do see some improvement. Which is great.



              • #8
                *yawn* Let me know when I can feel confident that Easy Diffusion, automatic1111, Kohya_ss, and any other Stable Diffusion-related container-esque distro that comes to my attention can be trusted to Just Work with their "clone our repo and run this shell script" install flow.

                The whole reason I upgraded from an Athlon II X2 270 (a 2011 CPU) with a brand new Cyber Monday'd RTX 3060 12GiB to a Ryzen 5 7600 was that it required too much expertise to swap out the bundled copy of TensorFlow for one built without a requirement for AVX. I don't want to risk that again, but with "one built with support for not just AMD, but my AMD card" substituted in that sentence.
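
                If it helps, the kind of pre-flight check I'd want those install scripts to run is simple enough. A rough, hypothetical sketch (using PyTorch as the example framework and a Linux-only /proc/cpuinfo read for the AVX check; none of this is part of any of those repos):

                Code:
                # Hypothetical pre-flight check before a "clone our repo and run
                # this shell script" installer pulls in a build it can't use here.

                def cpu_has_avx() -> bool:
                    # Linux-only: look for the plain "avx" flag in /proc/cpuinfo.
                    try:
                        with open("/proc/cpuinfo") as f:
                            flags = f.read().replace("\n", " ")
                        return " avx " in flags
                    except OSError:
                        return False

                print("CPU has AVX:", cpu_has_avx())

                try:
                    import torch
                    # torch.version.hip is a version string on ROCm builds, None otherwise.
                    print("ROCm (HIP) build of torch:", torch.version.hip is not None)
                    print("GPU usable:", torch.cuda.is_available())
                except ImportError:
                    print("torch is not installed yet")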



                • #9
                  Originally posted by oleid View Post

                  That was discussed here before, yet I couldn't find it anymore. My takeaway was: NIR is better suited for graphics driver needs; however, llvm is better for compute.

                  Edit: also, there is now MLIR, and apparently the projects mentioned in the article make heavy use of it. That didn't exist back when NIR was started.

                  Edit2:
                  I think the main problem with using llvm (the project) for GPUs is llvm (the intermediate representation). MLIR is the solution here, one of the reasons being that it is higher level than llvm IR.
                  LLVM has been "good for compute" only because you historically got OpenCL for free (originally for targeting CPUs). With rusticl this advantage evaporates.

                  MLIR might alleviate some of the issues, but the fundamental truth is that LLVM just was never intended or architected to compile code for GPUs. MESA with NIR will *always* outperform it.

                  MLIR also does *nothing* to alleviate the massive, massive headache that is packaging LLVM to work with MESA. They operate on totally different release schedules, LLVM doesn't play nice when loading multiple versions of itself, there's gobs and gobs of workaround code involved, and it takes FOREVER for users to actually get access to new features and fixes, let alone drop old buggy versions of LLVM. And then there's the fact that LLVM is goddamn impossible to package cleanly, so the burden makes distros hate it.



                  • #10
                    Originally posted by Developer12 View Post

                    MLIR might alleviate some of the issues, but the fundamental truth is that LLVM just was never intended or architected to compile code for GPUs. MESA with NIR will *always* outperform it.
                    I'm not so sure there. MLIR can quite successfully be used for Nvidia hardware.

                    [embedded YouTube video]


                    And people are busy implementing the remaining bits of PTX in MLIR's GPU dialect.
                    All GPU manufacturers do their compiler development in llvm. They even have their own dialects in MLIR to support specific features. It would be stupid if AMD did not support MLIR, and with it llvm.

                    But I agree, it is a lot more work for distributions to package and use llvm. At least Gentoo supports multiple parallel llvm installations.
