Originally posted by oleid
All those extra dialects are BAD, BAD news. It's already really bad that they all have their own forks of LLVM with random modifications to try to make LLVM work better with GPU code. That stuff is impossible to upstream, and it makes it really difficult (if not impossible) for them to move to new versions of LLVM. Further fragmentation with MLIR is not going to help the situation.
AMD is not a shining example of good driver development; everyone on Windows already knows that. AMD's driver and its dumb locking issues are the reason there's no sane way to tear down a DRM-sched. AMD could pull their head out of their ass, drop their LLVM-based driver, which is buggy garbage anyway, and move people to the parallel community drivers for their cards that actually work and perform well, but no, they're letting Valve do all the work. It was because AMD's LLVM-based compiler is so terrible that Valve funded the development of the ACO compiler for AMD cards. And then there's the absolute clusterfuck that is their "maybe it works, maybe it doesn't" pretend-support for most of their cards, which is the main reason AI workloads are staying the fuck away from wasting time on AMD. Look on any forum and you'll see an endless river of "I spent three days trying to get ROCm to work on my last-gen AMD card and it keeps crashing in weird ways whenever I make small changes" or "I used to be able to run my workload on card X with ROCm Y, but a small update to ROCm Y.1 broke everything, so I guess I'll just use the old version forever."
AMD is not the adult in the room when it comes to GPU drivers. At this point, the only way their cards will ever get traction is if AMD gets the hell out of the way and just leaves driver development to other people. That's how it played out with gaming on Linux, and that's how it's going to play out with AI.