  • Replace AMD with NVIDIA for LLMs, Manjaro Linux

    Hey y'all,
    Atm, my Manjaro Linux rig has a Radeon RX 6900 XT. It's a fine card and works well for desktop use and gaming.

    What doesn't work well (or at all) is GPU acceleration for deep learning (I want to run smaller LLMs locally, etc.). For that I'd need something like ROCm actually working (it won't build from the AUR, the damn thing) plus ZLUDA. I'm tired of trying to make all of this work, so I'm thinking of switching to an RTX 4080 Super or similar.

    With the AMD open-source drivers I've never had any issues on the Linux desktop. What's the current state of the NVIDIA ones? Can I just put in the NVIDIA card, install their drivers, and have stuff just work, e.g. Wayland on KDE?

    What problems can I expect? I use two monitors.
    Last edited by lichtenstein; 02 November 2024, 01:25 PM. Reason: Added the two monitors bit.

  • #2
    As an alternative, I could keep the Radeon card and install a second one, e.g. a 16 GB RTX 4060 Ti, to be used for the deep learning stuff, e.g. Hugging Face inference. What are the chances of this working out better?

    • #3
      LLM inference works fine on AMD using the ollama-rocm package. No need for ZLUDA.
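
      If you'd rather drive it from Python than the CLI, the official ollama client works too (pip install ollama; this sketch assumes the ollama daemon is already running, and the model tag is just an example):
      Code:
      import ollama  # official Python client for the local ollama daemon

      # ask a locally served model for a completion; any pulled model tag works
      response = ollama.chat(
          model="llama3.1:8b",
          messages=[{"role": "user", "content": "Say hello in one sentence."}],
      )
      print(response["message"]["content"])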
      Last edited by Lycanthropist; 03 November 2024, 11:33 AM.

      • #4
        Sweet, but ROCm won't build when pulled from the AUR, and I've been checking it once or twice a year over the last two years (private/just-for-fun projects). AFAIK, NVIDIA's CUDA stuff just works. If I could make ROCm work just as reliably (there is a PyTorch ROCm package too), then I wouldn't need another card.

        • #5
          You don't need to compile anything from the AUR. ollama-rocm and all its dependencies are in the official repositories.

          • #6
            You, sir, are my hero! (and you saved me a lot of money)

            ollama installed fine and it recognizes the GPU! Llama 3.1 8B flies at 65 tokens per second, all nicely GPU-accelerated while using about 8 GB of VRAM!
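
            (If you want to reproduce the tokens/sec figure: the ollama Python client returns eval counts and timings in the response metadata; a rough sketch, assuming the client is installed:)
            Code:
            import ollama  # pip install ollama; the daemon must be running

            # generate once and derive tokens/sec from the returned metadata
            r = ollama.generate(model="llama3.1:8b", prompt="Tell me a short story.")
            tok_per_s = r["eval_count"] / r["eval_duration"] * 1e9  # eval_duration is in nanoseconds
            print(f"{tok_per_s:.1f} tokens/sec")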

            (removed the text here because I'm an idiot)

            UPDATE: python-pytorch-opt-rocm is available in extra too. I dunno why I was trying to get it out of the AUR. Probably because ages ago that was the only place it was available. It installed just fine as well.

            UPDATE2: The following did the trick for PyTorch/Hugging Face:
            Code:
            pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
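
            A quick sanity check that either install path (the extra package or the pip wheel) actually sees the card; ROCm builds of PyTorch report through the CUDA API:
            Code:
            import torch

            print(torch.cuda.is_available())      # True on a working ROCm build
            print(torch.cuda.get_device_name(0))  # should name the Radeon card
            print(torch.version.hip)              # ROCm/HIP version string; None on CUDA builds
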
            UPDATE3: Benchmarks!
            On ResNet50 at batch size 32 I got a throughput of 1420 images/second. My CPU (Ryzen 9 5950X) does 43 img/sec, about 33x slower (so, a huge win for me!). I'll be running mostly LLM code, but this bench is what ChatGPT spit out first.
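
            For reference, the benchmark was roughly this shape (a sketch only; the real script came out of ChatGPT, and this assumes torchvision is installed):
            Code:
            import time
            import torch
            from torchvision.models import resnet50

            # ROCm builds of PyTorch expose the GPU under the "cuda" device name
            model = resnet50().eval().to("cuda")
            batch = torch.randn(32, 3, 224, 224, device="cuda")

            with torch.no_grad():
                for _ in range(10):  # warm-up iterations
                    model(batch)
                torch.cuda.synchronize()
                start = time.time()
                iters = 50
                for _ in range(iters):
                    model(batch)
                torch.cuda.synchronize()

            print(f"{32 * iters / (time.time() - start):.0f} images/sec")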

            For comparison, an RTX 2080 Ti is at around 1100-1300, an RTX 3090 at about 2000-2200, and a 4090 at 3500-3800.

            I'll be testing some more in the coming days but man, this looks good! I won't be needing that NVIDIA GPU after all.
            Last edited by lichtenstein; 03 November 2024, 06:43 PM.

            • #7
              Originally posted by lichtenstein View Post
              I'll be testing some more in the coming days but man, this looks good! I won't be needing that NVIDIA GPU after all.
              At least you got what you wanted to work, but AMD GPUs suck at that kind of work.

              • #8
                Originally posted by Panix View Post
                At least you got what you wanted to work, but AMD GPUs suck at that kind of work.
                They may be a little slower, but they certainly don't "suck". They even have a pretty good VRAM-to-price ratio compared to NVIDIA.

                • #9
                  Well, with a stable ROCm, my previous-gen AMD GPU works rather well and is plenty fast. It compares fine against previous-gen NVIDIA: while, say, a 3080 is slightly faster, the 6900 XT has more VRAM (16 GB vs 12 GB), letting me run (slightly) bigger models, e.g. mistral-nemo. Sure, a current top NVIDIA card would be twice as fast, but come on, I got this GPU "for free" since it's already in there. When/if I get more serious I can still buy a more eggs-pensive NVIDIA card. Any GPU acceleration is orders of magnitude faster than the CPU, and it gets me what I need. I won't be doing heavy training; inference, Milvus (vector DB), (local) LLMs, embeddings, that's what I need, and for that it's plenty fast.

                  Regarding stability, as long as PyTorch works well with ROCm, "everything" works. TensorFlow also has a ROCm path. I'm quite happy with the result.
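
                  For example, the embeddings side is just PyTorch underneath; a minimal sketch with sentence-transformers (assumed installed; the model name is only an example):
                  Code:
                  from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

                  # ROCm PyTorch presents the Radeon as "cuda", so device selection is unchanged
                  model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
                  vectors = model.encode(["local LLMs on ROCm", "vector search with Milvus"])
                  print(vectors.shape)  # (2, 384) for this model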

                  Again, many thanks to Lycanthropist for setting me on the right path.
                  Last edited by lichtenstein; 06 November 2024, 01:05 PM.

                  • #10
                    Originally posted by lichtenstein View Post
                    Sweet, but ROCm won't build when pulled from the AUR, and I've been checking it once or twice a year over the last two years (private/just-for-fun projects). AFAIK, NVIDIA's CUDA stuff just works. If I could make ROCm work just as reliably (there is a PyTorch ROCm package too), then I wouldn't need another card.
                    I can confirm this from a couple of years ago with an RX 580, an RX 6600 XT and an RTX 3060: OpenCL on both AMD GPUs, with AMDGPU and ROCm on Fedora, was a total PITA (this was back with the Mesa shim/Copr). NVIDIA CUDA was easy with the general driver install from RPM Fusion.

                    Not sure how it is on Arch, but using AMDGPU-PRO's OpenCL libs also worked as an option back then; that might be easier than dealing with ROCm.
