
AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source


  • Originally posted by tenchrio

    LMAO, here you go again. No, unlike you, I know my use case. I know it would benefit from AMD over Nvidia (at least from a value perspective; again, get a 4090 if you are so worried). I tend to gravitate towards NPR with lots of particles and simulations (take a mesh, make it explode with the Cell Fracture add-on and a force emitter, fun stuff). When I do use Cycles I screw around with render passes and low sample rates to get intentionally unrealistic results; there are some incredibly stunning results you can get with it (check out Paul O Caggegi's Inkspot, that is what inspired me), but some already took 2 seconds to render on CUDA back on my GTX 1080 Ti, so I doubt it would benefit from OptiX as much as from the performance boosts GPUs already got naturally.

    Sometimes I make game assets or VR avatars for friends and don't need to render at all (only the material viewport to check if my UVs are fine; material setup happens in Unity anyway). I own a 3D printer (Artillery Sidewinder, nice and big) and my best friend is into tabletop; I never texture what he asks me to make, so all that would matter is viewport performance. AMD would have been fine here. I even did it at a LAN party once and experienced zero issues (I believe they had a 6900XT; I only brought my laptop, and it was for some quick models for a game they were making).

    The difference is that I already possess a vast amount of knowledge and experience on the subject. You are literally looking for negatives to convince yourself you shouldn't go AMD, finding them, and then pretending that means AMD is just not an option at all. Problems exist with OptiX too; it is possible you may not experience them (OSL isn't used by everyone, for example), or maybe you will. Or maybe what you create is closer to the Whitelands render in the Techgage benchmarks, and OptiX will improve the speed by only about 5% over CUDA (something you keep ignoring as well; Scanlands sees the greatest increase, and I might check tomorrow after work why that is, but in the other two benchmarks the OptiX impact is considerably less, especially for the high-end cards, and as stated before the output is a tad different; examples exist where this is even noticeable by the human eye). Obviously I am not going to name every issue with OptiX, but it is clear you are looking for HIP issues on purpose, and of course you will find many, "lol".

    Did you ever open Blender? Did you ever render?
    Do you even know if you would render with Cycles or Eevee? Did you ever even touch that setting to begin with (because it defaults to Eevee)?
    Hell, if truly all you care about is speed, then Eevee is king: what takes a minute in Cycles takes a second in Eevee (but the output will look quite different). Eevee also doesn't have separate AMD and Nvidia implementations; it uses OpenGL for both, and one day Vulkan with Eevee-Next (kind of weird how this one is used by all three GPU makers, yet you don't blame AMD, or the Blender Foundation, for it not being finished or optimized, despite the fact it would bring RT acceleration to Eevee and has been late for two Blender versions now; or could it be that developing this stuff is hard and takes time, and no amount of whining will speed it up?).

    What are your VRAM requirements going to be? Are you going to use 4K textures or procedural materials? Will you be looking to texture paint? Will you bake normals, color, AO etc. to cut render times (at the cost of VRAM)? What is the output size? All these things matter. An RTX 4080 only has 16GB and, for some god-forsaken reason, a 256-bit bus, meaning its theoretical bandwidth is 716.8 GB/s; that is lower than the much cheaper RX 7900XT with its 20GB, 320-bit bus and 800 GB/s of bandwidth. In theory this can negatively affect render time under a shared-memory workload (where data has to swap from system RAM), even more so for the RTX 4080, which would also have to rely on that swapping sooner. I linked it before: the render time more than doubled for the 3070 versus the 3060 12GB (which, note, is the weaker card by default, so from the 3070's perspective you lost more than double the render time). So there goes even the optimistic performance advantage of Scanlands (which renders on an RX6500XT with 4GB and still outspeeds an Arc A380 without RT, so chances are nothing there is baked and the textures are low-res and reused), and you are more likely to OOM and not be able to render at all.
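    The bandwidth figures above fall straight out of bus width times effective memory data rate. A quick sketch (the 22.4 Gbps and 20 Gbps rates are the commonly published GDDR6X/GDDR6 speeds for these cards; treat them as assumptions):

```python
# Theoretical peak memory bandwidth = (bus width in bits / 8 bytes) * effective data rate.
# Data rates below are the commonly cited figures for each card, not measured values.

def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return theoretical peak memory bandwidth in GB/s."""
    return (bus_width_bits / 8) * data_rate_gbps

# RTX 4080: 256-bit bus, 22.4 Gbps GDDR6X
print(bandwidth_gbs(256, 22.4))  # 716.8
# RX 7900XT: 320-bit bus, 20 Gbps GDDR6
print(bandwidth_gbs(320, 20.0))  # 800.0
```

    Which is why the wider, slower-clocked 7900XT bus still comes out ahead on paper.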

    People tend to go with Nvidia over AMD overall, and the RTX 30 line is a prime example of how that wasn't necessarily the right choice for some users. Most people bought it for video games, and now there are constant examples of why buying an RX6800 over an RTX 3070 would have been better for that use case, again relating to the VRAM.
    Some people use Windows for AI, despite the fact that WDDM 2 still only allows CUDA applications to allocate about 81% of VRAM, as it has since its introduction with Windows 10 (the most popular desktop OS, by the way). That was the reason I went fully over to Linux (yes, Blender was affected). Popular doesn't mean better.
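    To put that budget in perspective, here is what the ~81% cap would leave usable on a couple of cards (the 0.81 factor is the figure cited above, and the card list is just illustrative):

```python
# Rough usable-VRAM estimate under the Windows WDDM per-process budget described above.
# The 0.81 factor is the figure cited in the comment; treat it as an assumption.

WDDM_BUDGET = 0.81

def usable_vram_gb(total_gb: float, budget: float = WDDM_BUDGET) -> float:
    """Estimated VRAM a CUDA application can actually allocate under WDDM."""
    return total_gb * budget

for card, vram in [("RTX 4080 (16GB)", 16), ("RTX 3090 (24GB)", 24)]:
    print(f"{card}: ~{usable_vram_gb(vram):.1f} GB usable")
# RTX 4080 (16GB): ~13.0 GB usable
# RTX 3090 (24GB): ~19.4 GB usable
```

    So on Windows a 24GB card effectively behaves closer to a 19GB one for CUDA workloads, if that 81% figure holds.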

    If you want to combine AI with Blender, for instance with the AI Render / Stable Diffusion plug-in, I can guarantee you that the second you load in LoRAs and textures your VRAM will go quickly. I once tested the combination of an SDXL checkpoint with about two LoRAs in a relatively simple scene on a 4090 and managed to make the system unusable, as the PC only had 32GB of normal RAM (when I tested a second time, I killed Stable Diffusion the instant I saw the swap filling up like crazy).

    Again, go for a 4090: best thing, just expensive. A 3090 is great if you can get a deal (do watch out for ex-cryptomining cards). Some complain about the installation of ROCm being complex; I laugh, as I read the install instructions and wonder how people fumble copy-paste. There are some cases (Eevee and the viewport) where the RX 7900XTX outspeeds even the 4090. But you're just going to have to live with it: whichever choice you make, the consequences only matter if/when they apply. Maybe by the time HIP-RT matures you never used OptiX and were experimenting with AI, Eevee or perhaps even LuxCore. Maybe the viewport performance never takes a noticeable hit.

    Maybe you never hit 16GB of VRAM and the RTX 4080 would have been just fine, even faster than the 3090. Maybe everything you do can be done on a 4060 Ti and the performance difference wouldn't even have mattered. There are people who, for instance, enjoy making low-poly renders in orthographic perspective, and I swear I have seen some make nothing else; the integrated graphics on a laptop would suffice for them. The problem is you don't know your actual use case: you spot one negative, make outlandish claims based on zero experience, and want every one of us to accept it as gospel. The 7900XTX has its set of upsides in Blender; it just depends on the user (which you admit to not being, yet somehow pretend your opinion is based on).
    I liked your post even though you have a bit of a jerk attitude. I think, somehow, in some twisted way, you're trying to help, and you're right: AMD might have RDNA 3, or at least the 7900 series, as an 'alternative'. I just think it should be better, and my 'anti-AMD' posts are ALWAYS a reaction to how Nvidia is a monopoly (or near-monopoly; I think that's semantics at this point in time). WE desperately need another option, preferably open source, and Intel just isn't there yet in any sphere, although their oneAPI is at least *something*, considering they haven't exactly been in the business of dedicated GPUs for very long. At least they're open source?!? :-/

    So I decided to 'like' your post, because you did have some interesting info and points. Cheers.


    • No problem.
      Nvidia closed the possibility


      • Originally posted by boboviz
        No problem.
        Nvidia closed the possibility

        1.2 Limitations - taken from #8? The exact wording isn't a copy, but it's close enough?

        Do note: "Last updated: October 8, 2021."

        Surely Janik and AMD's lawyers looked at everything and decided everything was 'okay' until they released the 'results' publicly? The timing seems that way? Then Janik is unofficially no longer an employee, and it's been open-sourced to the community? I dunno, maybe I'm mistaken but on the right track? Good luck to AMD and the community, anyway.


        • Does this ZLUDA even work on real, complex projects? Like ML stuff?
          Last time I checked ROCm, about two weeks ago, there were still some fundamental problems that crash the GPU/OS under complex, real-world usage, not when launching hello-worlds.
          If ZLUDA can launch complex CUDA projects, it will run into the same behavior/bugs that ROCm has... so does it even work?