NVIDIA GeForce GTX 1060 To RTX 4060 GPU Compute & Renderer Performance On Linux


  • #11
    Originally posted by Paradigm Shifter View Post
    I won't be gaming, and from previous experience bus width isn't the performance destroyer for my use case that it is for other things (e.g. gaming), so the 16GB 4060 Ti looks rather appealing if the price isn't insane. The 4060 is "cheap" here (not cheap, just less insane than the more powerful options), but I learned long ago that prices in the US bear little to no resemblance to prices in Japan, so I'll wait and see. A "cheap" 16GB card would be nice (that extra 4GB over the 4070 Ti helps a lot), but if it's too expensive then the 4080 becomes a more attractive option. And if I go there, I might as well go 4090 and have done with it.

    I hope Intel do something with more VRAM. The end of the 16GB A770 is rather daft. A 24 or 32GB card would be nice.
    The 4060 Ti 16GB price will be insane, lol.

    A used 3090 is the next step up, and it's almost reasonable these days.

    The rumored 256-bit, M2 Pro-esque Arrow Lake silicon from Intel (and an AMD equivalent) is very promising to me, but it's too far away... and probably not even coming to desktops anyway.




    • #12
      Originally posted by brucethemoose View Post

      The crazy thing is that eager-mode PyTorch isn't really meant for production deployment... It's for training, research, and experimentation, while various other frameworks can import PyTorch models for performant inference. But alas, that's where we are today. Researchers make all the projects people use and race to the next paper, and no one is around to port and optimize their projects.


      Anyway, for ML, you should skip the 4060. VRAM/bus width is everything, so save up for a 4060 Ti 16GB or grab a 3060 instead.

      And keep an eye out for 24GB+ cards/big APUs from Intel.
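
      (The "import PyTorch models for performant inference" path the quote mentions usually means exporting the model out of eager mode. A minimal sketch, using a hypothetical tiny model, of what that export could look like via ONNX:)

      # Export a (hypothetical) tiny PyTorch model to ONNX so an inference
      # runtime such as ONNX Runtime or TensorRT can run it instead of
      # eager-mode PyTorch.
      import torch
      import torch.nn as nn

      class TinyNet(nn.Module):  # stand-in for a real research model
          def __init__(self):
              super().__init__()
              self.fc = nn.Linear(16, 4)

          def forward(self, x):
              return torch.relu(self.fc(x))

      model = TinyNet().eval()
      dummy = torch.randn(1, 16)                       # example input for tracing
      torch.onnx.export(model, dummy, "tinynet.onnx",  # traced graph, not eager ops
                        input_names=["x"], output_names=["y"])
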
      Arguably, if you're doing LLM model creation you probably shouldn't be using anything consumer grade anyway. Long-lived processes will run into data corruption issues relatively quickly thanks to the lack of ECC RAM on the cards, non-ECC system RAM, and missing data integrity checks in other parts of the pipeline. That's not catastrophic in games - you just get weird geometry and texture glitches; kill the game, resume where you left off, and if it persists, check the game's file integrity. But with LLMs you have no way of knowing whether your resulting model has been subtly corrupted by bit flips, because the process output is probably going to be non-deterministic. That is, you can't repeat the results for confirmation.
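
      (For reference, PyTorch does expose knobs that reduce run-to-run variation - a sketch below - though even with them, bitwise reproducibility is only expected on the same hardware, driver, and library versions, which is the point being made here:)

      # Ask PyTorch for repeatable runs so two training passes can be compared.
      import os
      os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some CUDA ops in deterministic mode

      import torch
      torch.manual_seed(0)
      torch.use_deterministic_algorithms(True)  # error out on nondeterministic kernels
      torch.backends.cudnn.benchmark = False    # disable autotuning, which varies between runs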

      The client side can get away with the quirks of consumer-grade hardware, because (hopefully*) the models are built elsewhere on systems with data integrity checks in their hardware. The client is only concerned with getting a reasonably accurate result out of the already-computed model.

      *I wouldn't necessarily bet that's always true. This is the next bubble-style gold rush, and I'm sure there are more than a few players trying to build models on the cheap with inadequate hardware while proclaiming loudly how great their (snake-oil) AI-powered software services are.

      Edit to add: OP is right on AMD's GP compute SDK though. It sucks. The reason CUDA is so popular is that it works. It's extremely easy to use and deploy. You can either spend hours getting AMD's SDK to work, or spend those hours writing your program for CUDA - or running your program, if you're a user instead of a developer.
      Last edited by stormcrow; 10 July 2023, 11:55 PM.

      • #13
        Originally posted by panikal View Post
        AMD, are you listening? ...
        I am a happy owner of a 3060 Ti, but the benchmarks I have seen so far, including these here, just show how important bandwidth still is next to core count. The 3060 Ti here is 42% faster than the 3060, and the main differences between the two are the Ti's 256-bit bus and 4864 cores versus the 3060's 192-bit bus and 3584 cores. That is all it takes to get 42% more speed - a roughly 33% wider bus and about 36% more cores. As long as AMD can match this, they will not have a problem keeping up.
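
        (A quick sanity check of those ratios, using the published reference specs for the two desktop cards; the 42% figure is the uplift cited above:)

        # Published reference specs for the desktop RTX 3060 and RTX 3060 Ti.
        specs = {
            "RTX 3060":    {"cores": 3584, "bus_bits": 192, "bandwidth_gbs": 360},
            "RTX 3060 Ti": {"cores": 4864, "bus_bits": 256, "bandwidth_gbs": 448},
        }
        base, ti = specs["RTX 3060"], specs["RTX 3060 Ti"]
        print(f"cores:     +{ti['cores'] / base['cores'] - 1:.0%}")                  # ~ +36%
        print(f"bus width: +{ti['bus_bits'] / base['bus_bits'] - 1:.0%}")            # ~ +33%
        print(f"bandwidth: +{ti['bandwidth_gbs'] / base['bandwidth_gbs'] - 1:.0%}")  # ~ +24%
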
        Last edited by sdack; 11 July 2023, 04:58 AM.

        • #14
          Originally posted by panikal View Post

          AMD, are you listening? I'm a casual gamer whose first graphics card was a CGA thing. I've had S3, 3dfx, Nvidia, Radeon - I didn't care, I went with Bang For The Buck and how much fun I could get out of it.

          For the first time as a *user* I have a genuinely interesting reason to use my graphics accelerator for things other than games (blockchain doesn't count). CUDA is king, and ROCm is still not really for the consumer side, nor really supported at all out-of-the-box by most ML projects. I mean, I have to set crazy env variables to even attempt to get ROCm support in torch or whatever, and I'm not a Python/ML dev, just a user on this one.

          Please, please give your power users more ROCm love, or hack out a way to get CUDA working with Radeons (don't say it can't be done, someone did it for Intel). I don't even care if ML apps aren't quite as fast as on Nvidia, I just want them *to work* without hours or days of hacking and possibly recompiling ROCm 5 times to find a version that works with the software and with my adapter... I'm really, really jealous of Nvidia users' ability to Just Have ML Work (tm).

          I'm seriously considering a 4060 (hopefully a Ti or Super Plus Good Edition if out by then) for my next purchase... don't make me do it, please?
          There are Vulkan compute and OpenCL for running on a Radeon system; maybe you can consider them instead of suffering with ROCm.
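
          (If you want to see whether that path is even available, here is a minimal sketch that lists OpenCL devices; it assumes the pyopencl package plus an installed OpenCL driver such as Mesa's rusticl or AMD's own ICD:)

          # List every OpenCL platform/device the system exposes.
          import pyopencl as cl

          for platform in cl.get_platforms():
              for device in platform.get_devices():
                  print(f"{platform.name}: {device.name} "
                        f"({device.global_mem_size // 2**20} MiB)")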

          • #15
            Originally posted by antonyshen View Post

            There are Vulkan compute and OpenCL for running on a Radeon system; maybe you can consider them instead of suffering with ROCm.
            As a curious user of ML projects who is a dev, but not a C/Python dev, what you suggest is reasonable but far outside the time I'm able to commit. I have a day job and a family and am just curious, wanting to run the Latest Cool ML Model from GitHub. Sometimes OpenCL is supported, but that is incredibly rare.

            CUDA is the gorilla in the room; ROCm may be superior and OpenCL more free, but 90% (a subjective feeling, in my experience) of research code and new projects are CUDA-only, excluding AMD-manufactured adapters. If ROCm is actually supported, it's only through a container, because the ROCm userland is hard to maintain.
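
            (For reference, the "crazy env variables" mentioned earlier usually boil down to something like the sketch below. It assumes a ROCm build of PyTorch and a consumer RDNA2 card; the exact override value depends on the GPU, and this is a workaround, not an officially supported path:)

            # The usual ROCm-on-consumer-Radeon workaround: HSA_OVERRIDE_GFX_VERSION
            # makes ROCm treat an otherwise-unsupported RDNA2 card as a gfx1030 part.
            # It must be set before torch initializes the GPU runtime.
            import os
            os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

            import torch
            print("ROCm build:", torch.version.hip is not None)  # HIP version string on ROCm wheels
            print("GPU visible:", torch.cuda.is_available())     # ROCm reuses the torch.cuda API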

            • #16
              Originally posted by carewolf View Post

              So is your image quality and your delays. Fuck fake frames.
              For me it's good.

              Without RT, lighting and shadows are fake too - and so what? They work fine for me too.

              • #17
                Originally posted by stormcrow View Post

                Arguably, if you're doing LLM model creation you probably shouldn't be using anything consumer grade anyway. Long-lived processes will run into data corruption issues relatively quickly thanks to the lack of ECC RAM on the cards, non-ECC system RAM, and missing data integrity checks in other parts of the pipeline. That's not catastrophic in games - you just get weird geometry and texture glitches; kill the game, resume where you left off, and if it persists, check the game's file integrity. But with LLMs you have no way of knowing whether your resulting model has been subtly corrupted by bit flips, because the process output is probably going to be non-deterministic. That is, you can't repeat the results for confirmation.
                A couple of things:

                - No one is training LLMs from scratch on consumer hardware, just finetuning them... And just for fun or experimentation. You would have to be crazy to run a local LLM (or other generative AI) in any kind of role that can't tolerate errors or mistrained models.

                - Enterprise GPUs/TPUs are the only option for larger-scale training anyway because of VRAM limitations and performance... But not because of ECC. The datasets are full of errors, and training fails for all sorts of reasons and needs to be restarted. Bit flips are the least of anyone's issues, I think.

                • #18
                  Originally posted by panikal View Post
                  I'm seriously considering a 4060 (hopefully a Ti or Super Plus Good Edition if out by then) for my next purchase... don't make me do it, please?
                  I know this was 8 months ago, but the context is still relevant.

                  To play with ML:
                  With some basic project or some simple memory sorting, literally any Nvidia GPU with 4+ GB of VRAM is enough.
                  For context, an Nvidia GTX 1080 will do better in GPU compute than any of AMD's RDNA2 GPUs.
                  So getting literally any cheap Nvidia GPU will be enough.

                  And when you need 16 GB of VRAM, just use Google Colab, which provides a free 16 GB Nvidia GPU for ML.

                  To play with Stable Diffusion, audio noise filtering, speech-to-text/AI TTS, and other popular AI software - get a 4060; 8 GB will be enough for "most cases".
                  For serious professional AI-tools usage you need a 16 GB GPU, which is around $1000.

                  To train ML/AI models you need an RTX 4090, and even on an RTX 4090 training will take... weeks to months.
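
                  (A rough way to gauge those VRAM tiers - a sketch only, since real usage adds activations, KV cache, optimizer state and framework overhead on top of the weights, and the parameter counts are ballpark figures:)

                  # Weights alone take roughly parameters x bytes-per-parameter.
                  def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
                      return params_billion * 1e9 * bytes_per_param / 2**30

                  print(f"7B model, fp16 weights:   {weight_vram_gb(7, 2):.1f} GB")   # ~13 GB -> wants a 16 GB card
                  print(f"7B model, 4-bit weights:  {weight_vram_gb(7, 0.5):.1f} GB") # ~3.3 GB -> fits in 8 GB
                  print(f"SDXL UNet (~2.6B), fp16:  {weight_vram_gb(2.6, 2):.1f} GB") # ~4.8 GB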
