Replace AMD with NVIDIA for LLMs, Manjaro Linux


  • lichtenstein
    replied
    FYI, the most recent Manjaro update broke ollama (0.4.1): it can't find its ROCm libs. Maybe fiddling with LD_LIBRARY_PATH would fix it, but I found that the official install works just fine. I dumped it into a tmp dir (instead of the suggested /usr) and run it off a symlink. Works well.

    The new qwen2.5-coder model is miraculous (for Python work at least), even the 14B variant. That one runs at 33 tok/sec on my GPU (quite usable). The 32B one won't fit in VRAM, so it runs mostly on the CPU (with some GPU load) at 5.5 tok/sec (not really fun to use).
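
    For reference, the manual-install workaround described above looks roughly like this. The tarball names follow the Ollama Linux install docs at the time of writing, and ~/tmp/ollama plus ~/.local/bin are just example locations, not the exact paths from the post:
    Code:
    # grab the official release tarball plus the separate ROCm add-on tarball
    curl -LO https://ollama.com/download/ollama-linux-amd64.tgz
    curl -LO https://ollama.com/download/ollama-linux-amd64-rocm.tgz
    # unpack both into a private dir instead of the suggested /usr
    mkdir -p ~/tmp/ollama
    tar -C ~/tmp/ollama -xzf ollama-linux-amd64.tgz
    tar -C ~/tmp/ollama -xzf ollama-linux-amd64-rocm.tgz
    # run it off a symlink, as in the post
    ln -s ~/tmp/ollama/bin/ollama ~/.local/bin/ollama
    ollama serve &
    ollama run qwen2.5-coder:14b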



  • lichtenstein
    replied
    TBH, I see zero issues with ROCm atm. It installs fine from the repos; you install PyTorch and tell it to use it, and that's it. It just works, the same as the NVIDIA stuff I've used on AWS.
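
    A minimal sanity check of that setup, assuming the ROCm PyTorch build from the repos (package name as mentioned elsewhere in the thread):
    Code:
    sudo pacman -S python-pytorch-opt-rocm   # ROCm build of PyTorch from the extra repo
    python -c "import torch; print(torch.cuda.is_available(), torch.version.hip, torch.cuda.get_device_name(0))"
    # ROCm builds expose the GPU through the usual torch.cuda API, so this should print
    # something like: True 6.x <your Radeon card>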



  • Espionage724
    replied
    Originally posted by lichtenstein View Post
    Sweet, but ROCm won't build (pulling from the AUR), and I've been checking it once or twice a year over the last two years (private/just-for-fun projects). AFAIK, NVIDIA's CUDA stuff just works. If I could make ROCm work just as reliably (there is a PyTorch ROCm package too), then I wouldn't need another card.
    I can confirm this from a couple of years ago with an RX 580, an RX 6600 XT and an RTX 3060: getting OpenCL working on both AMD GPUs with AMDGPU and ROCm on Fedora was a total PITA (this was back with the Mesa shim/Copr). NVIDIA CUDA was easy with the general driver install from RPM Fusion.

    Not sure how it is on Arch, but back then using AMDGPU-PRO's OpenCL libs also worked as an option, which might be easier than dealing with ROCm.
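
    If anyone is retracing this, a quick way to see which OpenCL stack actually gets picked up (clinfo is in the usual repos):
    Code:
    clinfo -l   # short list of OpenCL platforms/devices:
                # "AMD Accelerated Parallel Processing" = ROCm/PRO OpenCL, "Clover"/"rusticl" = Mesa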



  • lichtenstein
    replied
    Well, with a stable ROCm, my previous-gen AMD GPU works rather well and is plenty fast. It compares fine against previous-gen NVIDIA: while, say, a 3080 is slightly faster, the 6900 XT has more VRAM (16 GB vs 12 GB), which lets me run (slightly) bigger models, e.g. mistral-nemo. Sure, a current top NVIDIA card would be twice as fast, but come on, I got this GPU "for free" since it's already in the box. When/if I get more serious I can still buy a more eggs-pensive NVIDIA card. Any GPU acceleration is well over an order of magnitude faster than the CPU and gets me where I need to be. I won't be doing heavy training. Inference, Milvus (vector DB), (local) LLMs, embeddings: that's what I need, and for that it's plenty fast.

    Regarding stability, as long as PyTorch works well with ROCm, "everything" works. TensorFlow also has a ROCm path. I'm quite happy with the result.

    Again, many thanks to Lycanthropist for setting me on the right path.
    Last edited by lichtenstein; 06 November 2024, 01:05 PM.
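
    For the TensorFlow side mentioned above, a quick check along the same lines (tensorflow-rocm is AMD's build on PyPI; how current it is varies by release):
    Code:
    pip3 install tensorflow-rocm
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    # should list the Radeon card as a GPU device once the ROCm runtime is found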



  • Lycanthropist
    replied
    Originally posted by Panix View Post
    At least you got what you wanted to work, but AMD GPUs suck at that kind of work.
    They may be a little slower, but they certainly don't "suck". They even have a pretty good VRAM/price ratio compared to NVIDIA.



  • Panix
    replied
    Originally posted by lichtenstein View Post
    You, sir, are my hero! (and you saved me a lot of money) [...]
    At least you got what you wanted to work, but AMD GPUs suck at that kind of work.



  • lichtenstein
    replied
    You, sir, are my hero! (and you saved me a lot of money)

    ollama installed fine and it recognizes the GPU! Llama 3.1 8B flies at 65 tokens per second, all nicely GPU accelerated while using about 8 GB of VRAM!

    (removed the text here because I'm an idiot)

    UPDATE: python-pytorch-opt-rocm is available in extra too. I dunno why I was trying to get it out of AUR. Probably because ages ago that was the only place it was available. It installed just fine as well.

    UPDATE2: The following did the trick for PyTorch/Hugging Face:
    Code:
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
    UPDATE3: Benchmarks!
    On ResNet-50 at batch size 32 I got a throughput of 1420 images/second. My CPU (Ryzen 9 5950X) does 43 img/sec, roughly 33x slower (so, a huge win for me!). I'll be running mostly LLM code, but this bench is what ChatGPT spit out first.

    For comparison, an RTX 2080 Ti is at around 1100-1300, an RTX 3090 about 2000-2200, and a 4090 at 3500-3800.

    I'll be testing some more over the next few days but man, this looks good! I won't be needing that NVIDIA GPU after all.
    Last edited by lichtenstein; 03 November 2024, 06:43 PM.
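
    For reference, the tokens/sec and VRAM numbers above can be checked roughly like this (flags as in recent ollama and rocm-smi releases):
    Code:
    ollama run llama3.1:8b --verbose   # prints eval rate (tokens/s) after each response
    ollama ps                          # shows how much of the loaded model sits on the GPU
    rocm-smi --showmeminfo vram        # VRAM use on the Radeon card while the model is loaded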



  • Lycanthropist
    replied
    You don't need to compile anything from AUR. ollama-rocm and all its dependencies are in the official repository.
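
    Concretely, on Manjaro/Arch that amounts to something like this (the package ships a systemd unit; the model tag is just an example):
    Code:
    sudo pacman -S ollama-rocm
    sudo systemctl enable --now ollama
    ollama run llama3.1:8b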



  • lichtenstein
    replied
    Sweet, but ROCm won't build (pulling from the AUR), and I've been checking it once or twice a year over the last two years (private/just-for-fun projects). AFAIK, NVIDIA's CUDA stuff just works. If I could make ROCm work just as reliably (there is a PyTorch ROCm package too), then I wouldn't need another card.



  • Lycanthropist
    replied
    LLM inference works fine on AMD using the ollama-rocm package. No need for ZLUDA.
    Last edited by Lycanthropist; 03 November 2024, 11:33 AM.

