AMD Releases ROCm 6.0.2 With Improved Stability For Instinct MI300 Series

  • #21
    Originally posted by Panix View Post
    No support for most cards, and only a few operating systems are supported, which you omitted. I don't know about Gentoo and I don't care; it's not one of the supported operating systems. I included a number of links in which people had trouble configuring ROCm, let alone using it with their graphics cards.

    Face it, AMD is a failure on Linux in everything except gaming. Even there, there are a number of limitations due to their open-source stature, and while that's a good principle to live by, it brings some limitations and problems.

    But outside gaming, AMD GPUs are just unusable, and when you can use them, progress is slow and features are missing or not working; in other words, there are crashes or problems. That's why people pick Nvidia or switch to Nvidia, even with its bad reputation and closed software.
    I don't think AMD GPUs, the drivers, or the support libraries are inherently unstable, but AMD is attempting to deploy on an unstable base and doesn't provide test coverage for every GPU it has written code for. The ROCm/HIP support in various external projects also doesn't get the same amount of testing as NVIDIA/CUDA does.

    Well, my point about Gentoo isn't that AMD supports it, but that it's easy to build a version of ROCm that supports *your* GPU; otherwise a generic build of ROCm is huge and potentially includes untested, broken code for certain GPUs. A build that only supports the "officially supported" GPUs is still pretty big, which is probably one of the issues for distributions. AMD not supporting many operating systems isn't something I really care about, and is in my opinion something of a mistake in the other direction: they should make it easier for downstream distributors to package and support the platform, which is mostly a matter of policy. Instead of supporting binary releases for specific distributions, they could put the resources into helping downstream distributions with bug triaging and packaging issues. Hopefully they'll move this way; I think there's plenty of untapped potential. For what it's worth, my interactions with their GitHub issue tracker have been fairly positive.
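
    As a sketch of what that looks like in practice on Gentoo (the variable name follows the in-tree ROCm ebuilds; the exact gfx target string is an assumption you'd replace with your own card's):
    ```sh
    # /etc/portage/make.conf -- sketch, assuming Gentoo's ROCm ebuilds
    # Build rocBLAS, rocFFT, and friends only for the GPU actually installed,
    # instead of every gfx target AMD ships code for.
    AMDGPU_TARGETS="gfx1100"   # hypothetical example: RDNA3 (RX 7900 XTX)
    ```
    Restricting the target list is what keeps the build time and install size manageable, which is the point being made above about generic builds.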

    It's worth bearing in mind that a given CUDA release only supports certain GPUs; if you want to use an older model, you'll have to use a legacy version. It's just that Nvidia keeps legacy versions working for quite a while, so it's an accepted approach for users and third-party developers. Old versions of ROCm can be used, but that just means old bugs or incompatibilities with third-party projects. AMD is potentially better positioned to provide an integrated compute framework on Linux because, as I suggested, ROCm can be built and packaged along with the rest of the system, while CUDA is proprietary and can only be deployed on top.

    Comment


    • #22
      Originally posted by s_j_newbury View Post

      Yes, but that's the reality.



      I recently upgraded to a 7900 XTX GPU. The upgrade itself went quite smoothly from both a hardware and software perspective. Games worked great out of the box with no driver or other configuration needed, as plug-and-play as it could possibly get. However, I wanted to try out some machine learning on it. I'd been using TensorFlow.js to train models on my GPU entirely in the browser, but that approach is limited compared to what's possible when running natively.


      The AMD GPU looks promising at times, but at other times, as you can see in the benchmarks, it's way behind Nvidia, which has more support via CUDA and a longer history of support.

      Read the comments:
      Hi, is there any official announcement available on if or when the 7900 XTX will be supported? Or a timeline? Any information is appreciated! Artur


      Comment


      • #23
        I want AMD to succeed, to make some headway and at least challenge Nvidia, simply because there needs to be some competition. It's not that I'm pro-AMD; they're both corporations. It's that there needs to be some option other than CUDA.

        Comment


        • #24
          Is there any truth to the reports that AMD is intentionally acting to prevent ROCm being usable on consumer GPUs (especially the current latest AMD consumer GPUs) so people who need it have to buy expensive workstation cards instead?

          Comment


          • #25
            Originally posted by jonwil View Post
            Is there any truth to the reports that AMD is intentionally acting to prevent ROCm being usable on consumer GPUs (especially the current latest AMD consumer GPUs) so people who need it have to buy expensive workstation cards instead?
            Nope - in general if the code works on a workstation SKU it will also work on the corresponding consumer SKUs.

            We did definitely prioritize datacenter compute GPUs over (workstation+consumer) GPUs but that's not the same thing.
            Last edited by bridgman; 15 February 2024, 02:05 PM.

            Comment


            • #26
              Originally posted by darkbasic View Post

               Why don't you contribute to the official ROCm ebuilds instead? I've done so myself to try to get ROCm working on ppc64le (unsuccessfully), and they merge PRs fairly quickly.
              Do the Debian packages work? They're all built for ppc64le.

              Comment


              • #27
                Originally posted by cgmb View Post

                Do the Debian packages work? They're all built for ppc64le.
                 I managed to fix the build on Gentoo as well, but it crashes at runtime. Anyway, AMD developers just contacted me to say that they should have fixed the issue, so I will test it soon.

                Comment


                • #28
                  @s_j_newbury regarding https://www.phoronix.com/forums/foru...73#post1441773 , can you please share how you got the gfx902 working? Any details would be really appreciated.

                  PS: I have spent about two weeks trying to get PyTorch on ROCm working, trying everything (misc Linux distros, recommended combinations of ROCm and PyTorch versions, etc.) on my Ryzen 5 2500U, with no luck. I always end up with this error (or worse) when running this simple test:
                  ```sh
                  HSA_OVERRIDE_GFX_VERSION=9.0.0 python -c "import torch; cuda0 = torch.device('cuda:0'); print(torch.ones([2, 4], dtype=torch.float64, device=cuda0)); print('done')"
                  ```
                  ```
                  Traceback (most recent call last):
                    File "<string>", line 1, in <module>
                  RuntimeError: HIP error: shared object initialization failed
                  HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
                  For debugging consider passing HIP_LAUNCH_BLOCKING=1.
                  Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
                  ```
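
                  For what it's worth, a minimal probe like the following (a sketch, assuming a ROCm build of PyTorch; the override value is the same unsupported gfx902 workaround as in the command above) can at least tell whether the HIP runtime initializes before any real workload is attempted:
                  ```python
                  # Sketch: probe whether PyTorch's ROCm/HIP backend is usable.
                  # Assumes a ROCm build of PyTorch; degrades gracefully otherwise.
                  import os

                  # Spoofing the gfx target is a common unsupported workaround for
                  # APUs like gfx902 that are not on the official support list.
                  os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

                  def probe() -> str:
                      try:
                          import torch
                      except ImportError:
                          return "torch not installed"
                      # ROCm builds of PyTorch expose HIP through the torch.cuda API.
                      if not torch.cuda.is_available():
                          return "no usable HIP device"
                      return torch.cuda.get_device_name(0)

                  if __name__ == "__main__":
                      print(probe())
                  ```
                  If this reports a device name but the one-liner above still crashes, the failure is in kernel initialization for your gfx target rather than in device discovery.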

                  With this not working, and AMD's officially supported cards list being minuscule, it's hard not to get ideas about going to the competition after 25+ years of AMD/ATI only.

                  Comment
