AMD Job Posting Confirms More Details Around Their AI GPU Compute Stack Plans


  • #11
    Originally posted by oleid

    I'm not so sure there. MLIR can quite successfully be used for Nvidia hardware.

    [embedded YouTube video]


    And people are busy implementing the remaining bits of PTX in MLIR's GPU dialect.
    Compiler development for all the GPU manufacturers happens in LLVM. They even have their own dialects in MLIR to support vendor-specific features (see the C++ sketch after this quote). It would be stupid if AMD did not support MLIR, and by extension LLVM.

    But I agree, it is a lot more work for distributions to package and use LLVM. At least Gentoo supports multiple parallel LLVM installations.
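
    For context, here is a minimal C++ sketch of the pipeline described above. It is a sketch only, assuming recent upstream MLIR (exact header paths and pass names shift between releases): it parses a module written against the target-neutral gpu dialect and outlines the kernel, which is the step that precedes lowering to a vendor dialect such as nvvm (Nvidia/PTX) or rocdl (AMDGPU).

        // Sketch against recent upstream MLIR; header paths and pass names
        // vary between releases, so treat this as illustrative, not exact.
        #include "mlir/Dialect/Arith/IR/Arith.h"
        #include "mlir/Dialect/Func/IR/FuncOps.h"
        #include "mlir/Dialect/GPU/IR/GPUDialect.h"
        #include "mlir/Dialect/GPU/Transforms/Passes.h"
        #include "mlir/IR/MLIRContext.h"
        #include "mlir/Parser/Parser.h"
        #include "mlir/Pass/PassManager.h"
        #include "llvm/Support/raw_ostream.h"

        // A trivial kernel launch written in the target-neutral gpu dialect.
        static const char *kModule = R"mlir(
          func.func @noop() {
            %c1 = arith.constant 1 : index
            gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
                       threads(%tx, %ty, %tz) in (%sx = %c1, %sy = %c1, %sz = %c1) {
              gpu.terminator
            }
            return
          }
        )mlir";

        int main() {
          mlir::MLIRContext context;
          context.loadDialect<mlir::arith::ArithDialect, mlir::func::FuncDialect,
                              mlir::gpu::GPUDialect>();

          mlir::ParserConfig config(&context);
          mlir::OwningOpRef<mlir::ModuleOp> module =
              mlir::parseSourceString<mlir::ModuleOp>(kModule, config);
          if (!module) return 1;

          // Outline the gpu.launch body into a gpu.func kernel inside a
          // gpu.module. From here, a conversion pass (gpu-to-nvvm or
          // gpu-to-rocdl) picks the vendor target.
          mlir::PassManager pm(&context);
          pm.addPass(mlir::createGpuKernelOutliningPass());
          if (mlir::failed(pm.run(*module))) return 1;

          module->print(llvm::outs());
          return 0;
        }
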
    It happens in LLVM, but not for good reasons. Most of them are historical "we need OpenCL and we don't want to build it" reasons or "we would like to somehow use the same driver for Windows and Linux, and we can't ship the Linux driver on Windows" reasons.

    All those extra dialects are BAD, BAD news. It's already really bad that they all have their own forks of LLVM with rando modifications to try to make it work better with GPU code. That shit is impossible to upstream and it makes it really difficult (if not impossible) for them to move to new versions of LLVM. Further fragmentation with MLIR is not going to help the situation.

    AMD is not a shining example of good driver development. Everyone on Windows knows that already. AMD's driver and its dumb locking issues are the reason there's no sane way to tear down a DRM sched. AMD could pull their head out of their ass, drop their LLVM-based driver that is buggy garbage anyway, and move people to the parallel community drivers for their cards that actually work and perform well, but no, they're letting Valve do all the work. It was because AMD's LLVM-based compiler is so terrible that Valve funded the development of the ACO compiler for AMD cards.

    And then there's the absolute clusterfuck that is their "maybe it works, maybe it doesn't" pretend-support for most of their cards, which is the main reason AI workloads are staying the fuck away from wasting time with AMD. Look on forums and you'll see an endless river of "I spent three days trying to get ROCm to work on my last-gen AMD card and it keeps crashing in a weird way when I make small changes" or "I used to be able to run my workload on card X with ROCm Y, but a small update to ROCm Y.1 broke everything, so I guess I'll just use the old version forever."
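
    To make the support-matrix complaint concrete: ROCm gates support on the gfx target string baked into each card, so the first debugging step is usually checking which target your GPU actually reports. Below is a minimal sketch using the public HIP runtime API (which gfx targets are officially supported varies by ROCm release):

        // Enumerate HIP devices and print their gfx architecture strings.
        // ROCm's support matrix is keyed on these targets, which is why cards
        // one generation apart can behave so differently.
        #include <hip/hip_runtime.h>
        #include <cstdio>

        int main() {
          int count = 0;
          if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
            std::fprintf(stderr, "no HIP-capable devices found\n");
            return 1;
          }
          for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            if (hipGetDeviceProperties(&prop, i) != hipSuccess)
              continue;
            // gcnArchName is the gfx target (e.g. "gfx1030"); compare it
            // against the supported-GPU list for your ROCm version before
            // blaming your own code.
            std::printf("device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
          }
          return 0;
        }
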

    AMD is not the adult in the room when it comes to GPU drivers. At this point the only way their cards will ever get traction is if AMD gets the hell out of the way and just leaves driver development to other people. That's how it played out with gaming on Linux, and that's how it's going to play out with AI.



    • #12
      Originally posted by Developer12

      [...] AMD is not the adult in the room when it comes to GPU drivers. At this point the only way their cards will ever get traction is if AMD gets the hell out of the way and just leaves driver development to other people.

      Agreed. AMD is so far behind in that field. Their focus was on gaming, and only recently have they invested any money or time in AI/compute. My next GPU will be Nvidia; I have no choice. Practically everyone I've chatted with has said to "buy Nvidia" for Blender and Stable Diffusion (video editing too; the recommendations are for an Nvidia GPU). So....
