FYI, the most recent Manjaro update broke ollama (0.4.1): it can't find its ROCm libs. Fiddling with LD_LIBRARY_PATH might fix it, but I found that the official install works just fine. I dumped it into a tmp dir (instead of the suggested /usr) and run it off a symlink. Works well.
The new qwen2.5-coder model is miraculous (for Python work at least), even the 14B variant. That one runs at 33 tok/sec on my GPU (quite usable). The 32B one won't fit in VRAM, so it runs mostly on the CPU (with some GPU offload) at 5.5 tok/sec (not really fun to use).
Replace AMD with NVIDIA for LLMs, Manjaro Linux
-
TBH, I see zero issues with ROCm atm. It installs fine from the repos; you install PyTorch and tell it to use it, and that's it. It just goes, same as the NVIDIA stuff I've used on AWS.
-
Originally posted by lichtenstein:
Sweet but Rocm won't build (pulling from AUR)...
Not sure how it is on Arch, but back then AMDGPU-PRO's OpenCL libs also worked as an option; that might be easier than dealing with ROCm.
-
Well, with a stable ROCm, my prev-gen AMD GPU works rather well and is plenty fast. It compares fine against prev-gen NVIDIA: a 3080 is slightly faster, but the 6900 XT has more VRAM (16 GB vs 12 GB), which lets me run (slightly) bigger models, e.g. mistral-nemo. Sure, a current top NVIDIA card would be twice as fast, but come on, I got this GPU "for free" since it's already in the box. When/if I get more serious I can still buy a more eggs-pensive NVIDIA card. Any GPU acceleration is orders of magnitude faster than a CPU, and it gets me to do what I need. I won't be doing heavy training. Inference, Milvus (vector DB), local LLMs, embeddings: that's what I need, and for that it's plenty fast.
Regarding stability: as long as PyTorch works well with ROCm, "everything" works. TensorFlow also has a ROCm path. I'm quite happy with the result.
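For anyone wanting the same sanity check: ROCm builds of PyTorch expose the AMD GPU through the regular CUDA device API, so no ROCm-specific calls are needed. A minimal sketch (runs on CPU-only boxes too, it just reports no GPU):

```python
import torch

# On a ROCm build of PyTorch, AMD GPUs show up via the CUDA device API.
available = torch.cuda.is_available()
print("GPU available:", available)

if available:
    # e.g. an "AMD Radeon RX 6900 XT" string on a ROCm build
    print("Device:", torch.cuda.get_device_name(0))
    # A tiny matmul on the device confirms compute actually works.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).shape)
```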
Again, many thanks to Lycanthropist for setting me on the right path.
Last edited by lichtenstein; 06 November 2024, 01:05 PM.
-
Originally posted by Panix:
At least you got what you wanted to work, but AMD GPUs suck at that kind of work.
-
Originally posted by lichtenstein:
You, sir, are my hero! (and you saved me a lot of money)...
-
You, sir, are my hero! (and you saved me a lot of money)
ollama installed fine and it recognizes the GPU! Llama 3.1 8B flies at 65 tokens per second, all nicely GPU accelerated while using about 8 GB of VRAM!
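For anyone wanting to reproduce the tokens-per-second number: ollama's REST API (default port 11434) reports eval_count and eval_duration for each request, and throughput follows directly from those two fields. A minimal sketch, assuming a local ollama server; the model name and prompt are just examples:

```python
import json
import urllib.request


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Throughput from ollama's response fields (eval_duration is in nanoseconds)."""
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str = "llama3.1:8b", prompt: str = "Why is the sky blue?") -> float:
    # Assumes an ollama server listening on the default localhost port.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    resp = json.load(urllib.request.urlopen(req))
    return tokens_per_second(resp["eval_count"], resp["eval_duration"])


if __name__ == "__main__":
    # Example math: 650 tokens generated in 10 s of eval time -> 65 tok/s
    print(f"{tokens_per_second(650, 10_000_000_000):.1f} tok/s")
```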
(removed the text here because I'm an idiot)
UPDATE: python-pytorch-opt-rocm is available in extra too. I don't know why I was trying to get it from the AUR; probably because ages ago that was the only place it was available. It installed just fine as well.
UPDATE2: The following did the trick for PyTorch/HuggingFace:
Code:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
On ResNet50, batch size 32, I got a throughput of 1420 images/second. My CPU (Ryzen 5950X) does 43 img/sec, about 33x slower (so, a huge win for me!). I'll be running mostly LLM code, but this benchmark is what ChatGPT spat out first.
For comparison, an RTX 2080 Ti is at around 1100-1300, an RTX 3090 at about 2000-2200, and a 4090 at 3500-3800.
I'll be testing some more in the next few days but man, this looks good! I won't be needing that NVIDIA GPU after all.
Last edited by lichtenstein; 03 November 2024, 06:43 PM.
-
You don't need to compile anything from the AUR. ollama-rocm and all its dependencies are in the official repositories.
-
Sweet, but ROCm won't build (pulling from the AUR), and I've been checking it once or twice a year over the last two years (private/just-for-fun projects). AFAIK, NVIDIA's CUDA stuff just works. If I could make ROCm work just as reliably (there is a PyTorch ROCm package too), then I wouldn't need another card.
-
LLM inference works fine on AMD using the ollama-rocm package. No need for ZLUDA.
Last edited by Lycanthropist; 03 November 2024, 11:33 AM.