AMD XDNA Linux Driver Updated As It Nears The Upstream Kernel


  • zamroni111
    replied
    Originally posted by scottishduck View Post
    It’s almost like the Ryzen AI thing was pure marketing buzz and no one actually goes to AMD for AI.
Instead of the terrible XDNA, AMD should just use CDNA cores, minus the FP64 and FP32 circuitry, as the NPU.



  • chithanh
    replied
    Originally posted by cgmb View Post
I'm not sure what you're referring to. The drivers for AMD's enterprise GPUs are open source and available as both an out-of-tree kernel module and upstreamed into Linux itself.
I am referring to AMD enterprise hardware. One prime example is the Google Stadia Workstation Development Node, which Lenovo produced for a while before Stadia shut down. It is loaded front to back with proprietary code from AMD. If you manage to snag one off eBay, be sure to preserve all of the software and drivers on it, because if you wipe it you will never get them back.

MxGPU is also a sore point. You can download GIM drivers that support up to the FirePro S7150, but anything newer, like the Radeon V340 or V620, is supported only by non-public drivers. Questions about that on the AMD community forums go unanswered.

    Originally posted by cgmb View Post
My understanding is that the place to report kernel bugs is https://gitlab.freedesktop.org/drm/amd/-/issues. Although, it is true that it may still be difficult to get attention on your issue.
There are multiple places to report Linux driver bugs. One is on freedesktop.org; others are against the ROCm projects on GitHub, for example. But when your software doesn't work, you need a precise bug report or it will be ignored until your hardware ages out of ROCm support, at which point the bug gets closed. And the AMD community forums are no help in isolating the issue, because nobody there is both able and willing.



  • cgmb
    replied
    Originally posted by Developer12 View Post
    It's a total gamble if any given AMD card will run ROCm. So far AMD seem to be intentionally ambiguous to hide the fact that the actual, guaranteed-to-work pool of cards is vanishingly small. (while a slightly larger number of cards are unmaintained and "might" work or might crash or might start crashing with the next update) The AMD driver support is so unmaintained that GCC compiler devs are dropping card support because the drivers are too broken to test if the compilers work.
FWIW, the Debian ROCm Team has focused on compatibility and stability. The system packages provided by Ubuntu 24.04 and the upcoming Debian 13 support a much wider set of GPUs than AMD's official packages. Just about every discrete GPU since Vega should work (and even Polaris GPUs work for some things). Just take a look at their continuous integration system. It's a bit of a work in progress, but you can see it working well for libraries like rocrand: https://ci.rocm.debian.net/packages/r/rocrand/
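
As a rough illustration of that compatibility claim, here is a small, hypothetical Python helper. The gfx-target table and the `debian_rocm_outlook` function are my own sketch of the rule of thumb above ("every discrete GPU since Vega, Polaris for some things"), not an official support list from AMD or Debian:

```python
# Illustrative sketch only: classify an AMDGPU gfx target (as reported by
# rocminfo) against the rough rule of thumb in the post above. The table
# is a hand-picked sample of well-known targets, not an official list.

GENERATIONS = {
    "gfx803": "Polaris",   # e.g. RX 480/580
    "gfx900": "Vega",      # e.g. Vega 64, Instinct MI25
    "gfx906": "Vega",      # e.g. Radeon VII, MI50/MI60
    "gfx908": "CDNA",      # MI100
    "gfx90a": "CDNA",      # MI210/MI250
    "gfx1030": "RDNA2",    # e.g. RX 6800/6900 XT
    "gfx1100": "RDNA3",    # e.g. RX 7900 XTX
}

def debian_rocm_outlook(gfx_target: str) -> str:
    """Rough expectation for Debian/Ubuntu system ROCm packages."""
    generation = GENERATIONS.get(gfx_target)
    if generation is None:
        return "unknown"
    if generation == "Polaris":
        # Per the post above, Polaris works for some things only.
        return "partial"
    return "likely supported"
```

For example, `debian_rocm_outlook("gfx906")` (a Vega 20 part like the MI50) returns "likely supported", while an unrecognized target falls through to "unknown".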

    Originally posted by chithanh View Post
    You can buy used NVidia enterprise hardware, download drivers from NVidia, and run it fine in your lab. Driver support goes back many generations.
You cannot do the same with used AMD enterprise hardware and expect it to work, as their drivers for enterprise hardware are often proprietary and locked behind service contracts. And for those parts that are supported by ROCm, be happy if you can use them for a year before AMD drops support.
I'm not sure what you're referring to. The drivers for AMD's enterprise GPUs are open source and available as both an out-of-tree kernel module and upstreamed into Linux itself. I've purchased plenty of used AMD Instinct GPUs off eBay and they've all worked fine on Debian Bookworm with the backports kernel and firmware (although I do need to use Debian Unstable in a Docker container for the user space, as there are not many ROCm packages in Bookworm). With that said, I've only ever tried this with MI6, MI8, MI25, MI50, MI60, MI100, and MI210 cards. And a couple of the used MI25s I bought had a weird VBIOS loaded on them that I had to flash back to normal, but I imagine that's a risk with any used hardware.

    Originally posted by chithanh View Post
3. It is not possible for the developer community to reach anyone inside AMD who has the ability to reproduce and fix issues in the drivers. You can complain on the AMD community forums (a horribly slow and bloated web service, btw) all day long, but the only thing the moderators can do there is placate users. Nobody there from AMD has the technical capacity (or the hardware) to help users isolate issues and raise them internally at AMD.
    My understanding is that the place to report kernel bugs is https://gitlab.freedesktop.org/drm/amd/-/issues. Although, it is true that it may still be difficult to get attention on your issue.



  • chithanh
    replied
    Originally posted by Adarion View Post
So many things could benefit from it: some LibO stuff, darktable, Blender, all the real compute stuff that was mostly written with GPUs as processors in mind, all that so-called A"I" stuff these days. But it needs compute support, and not just in 4 to 5 totally unaffordable CDNA models and 3 ultra-high-end ones of the normal "gaming" series.
    I think you are close but not on point.

The main reason AMD is behind NVidia on Blender etc. support is that the developer community is not dogfooding on AMD hardware. This is due to multiple reasons:

1. There is practically no reason to choose AMD hardware over NVidia (even ignoring the driver situation), since AMD started emulating NVidia in fusing off enterprise features in their consumer hardware. Level1Techs and others have been campaigning for years to get at least some reduced form of SR-IOV (MxGPU) on consumer hardware, but to no avail. SR-IOV was even removed from professional/workstation products and nowadays exists only in datacenter parts.

2. AMD treats home labs with absolute disdain, presumably because their corporate bean counters do not see any revenue from them. They totally ignore that mindshare among the developer community depends to a large extent on home labs.
    You can buy used NVidia enterprise hardware, download drivers from NVidia, and run it fine in your lab. Driver support goes back many generations.
You cannot do the same with used AMD enterprise hardware and expect it to work, as their drivers for enterprise hardware are often proprietary and locked behind service contracts. And for those parts that are supported by ROCm, be happy if you can use them for a year before AMD drops support.
And then, on the enterprise CPU side, you see AMD introducing PSB motherboard vendor locks, with no way to reset the crypto keys and no other recourse, severely impacting the market for second-hand EPYC CPUs. Again, no consideration at all is given to home labs.

3. It is not possible for the developer community to reach anyone inside AMD who has the ability to reproduce and fix issues in the drivers. You can complain on the AMD community forums (a horribly slow and bloated web service, btw) all day long, but the only thing the moderators can do there is placate users. Nobody there from AMD has the technical capacity (or the hardware) to help users isolate issues and raise them internally at AMD.

    Until these things change, AMD will always remain an also-ran in software support.



  • Adarion
    replied
But Nvidia being ahead should still not influence your driver quality and HW support. Let nv be wherever they want; AMD should focus on their own HW and make sure it just works. There are various regressions in amdgpu that I noticed recently, and compute support is a sheer pain. They COULD have been so much better. It even reminds me of VIA: fancy ideas, interesting concepts, but failing utterly to deliver due to bad driver support.
For the record: no, I am not saying AMD's driver support is bad in itself. It has come a long way, and we do have a free-as-in-freedom driver stack, which is great and 2 bazillion percent more than VIA ever had. I followed it from the earliest days (radeonHD, when libv was still involved) and it has grown and matured. However, it is starting to show regressions, and compute is a painful field. And here it would be so good; the potential for great acceleration for some tasks has been there since the E-350. So many things could benefit from it: some LibO stuff, darktable, Blender, all the real compute stuff that was mostly written with GPUs as processors in mind, all that so-called A"I" stuff these days. But it needs compute support, and not just in 4 to 5 totally unaffordable CDNA models and 3 ultra-high-end ones of the normal "gaming" series. Nah, this has to be there with every APU, dGPU and whatnot. (Yeah, one won't reach the performance of a dedicated CDNA compute card, but this is for everyday people with everyday tasks, as mentioned above; people who do dedicated compute on a large scale will then likely also afford the big monsters.)



  • ahrs
    replied
    Originally posted by Developer12 View Post
    AMD *could* have good AI products. They routinely have decent hardware at a decent price, but their software and driver stack is absolute crap.

    It's a total gamble if any given AMD card will run ROCm. So far AMD seem to be intentionally ambiguous to hide the fact that the actual, guaranteed-to-work pool of cards is vanishingly small. (while a slightly larger number of cards are unmaintained and "might" work or might crash or might start crashing with the next update) The AMD driver support is so unmaintained that GCC compiler devs are dropping card support because the drivers are too broken to test if the compilers work.

    When nvidia CUDA will absolutely, trust-your-life-with-it work on nearly every card they make (consumer AND enterprise) back to a well-defined starting point, it's a no brainer why nobody bothers with AMD.
Well, the problem for AMD is that Nvidia is ahead of them. They are trying to get to that level of support, and ROCm does work on a lot of devices now, but in a "your life depends on it" scenario you should still probably build on top of the market leader, unless using AMD gives you some sort of advantage (e.g. cost savings, a support contract, etc.).



  • Developer12
    replied
    AMD *could* have good AI products. They routinely have decent hardware at a decent price, but their software and driver stack is absolute crap.

    It's a total gamble if any given AMD card will run ROCm. So far AMD seem to be intentionally ambiguous to hide the fact that the actual, guaranteed-to-work pool of cards is vanishingly small. (while a slightly larger number of cards are unmaintained and "might" work or might crash or might start crashing with the next update) The AMD driver support is so unmaintained that GCC compiler devs are dropping card support because the drivers are too broken to test if the compilers work.

    When nvidia CUDA will absolutely, trust-your-life-with-it work on nearly every card they make (consumer AND enterprise) back to a well-defined starting point, it's a no brainer why nobody bothers with AMD.



  • ahrs
    replied
    Originally posted by scottishduck View Post
    It’s almost like the Ryzen AI thing was pure marketing buzz and no one actually goes to AMD for AI.
I don't blame them; Microsoft is doing the exact same thing with "A new generation of AI Laptops!", so AMD had to have something for the uninformed consumer to compare against.



  • royce
    replied
    Does this cover earlier AMD APUs like the Ryzen 9 7940HS?



  • Mitch
    replied
I'll be one of the dissenters on AI hardware. While I think it's overblown and a bit early to celebrate, I'm excited for a future where AI acceleration can take this work over from my CPU and GPU.

If you download GPT4All (it has an official Flathub package, for those interested) you'll quickly be able to see what it can do with CPUs, and with Vulkan if you set up the latter. No expertise or fiddling required: pure GUI, click around and go. The models I've seen vary from hundreds of megabytes to tens of gigabytes.

I've had even the smaller, lighter models write me start-up scripts for my laptop to save on power and battery. They weren't always perfect, but that's incredible for something that is totally offline and private and requires no Internet. If I could offload the work to an AI accelerator, it would not only use less power, but I'd also be able to run far more intelligent and advanced models in less time.
    Last edited by Mitch; 13 October 2024, 10:29 AM.

