CUDA is quite nice, all things considered, particularly given its age and its ubiquity across NVIDIA's recent generations of product lines.
OTOH the articles here lately suggest there's movement to get support for OpenMP offload, SYCL, et al. integrated into LLVM.
NVIDIA's PTX ISA is roughly stable, gaining new features each generation, and is easy enough to target from an IR / code generator.
I believe the same is, or could be, true of Intel's DG2 ISA and AMD's RDNAx ISA.
And even NVIDIA's own application notes suggest that HPC people with codes to accelerate DON'T (have to) start with CUDA, but
rather start by looking at standardized C++ / Fortran parallel constructs, the vendor-neutral OpenMP specification's decorations,
OpenACC, and the canned performance primitives and libraries for BLAS, FFT, and whatever else (a sketch of the first option follows below).
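To make the "start with standard constructs" advice concrete, here's a minimal sketch: SAXPY written as a standard C++17 parallel algorithm, with no vendor API in the source at all. The compiler flags in the comments are my assumption of the usual ones (nvc++'s -stdpar=gpu, g++ with TBB); check your toolchain's docs.

    #include <algorithm>
    #include <execution>
    #include <vector>

    // SAXPY via a standard parallel algorithm: no vendor API in the code.
    // nvc++ -stdpar=gpu can offload this to an NVIDIA GPU; g++ ... -ltbb
    // runs it multithreaded on the CPU. Same source either way.
    int main() {
        const float a = 2.0f;
        std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);

        std::transform(std::execution::par_unseq,
                       x.begin(), x.end(), y.begin(), y.begin(),
                       [a](float xi, float yi) { return a * xi + yi; });
        return 0;
    }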
One can do similarly with other GPUs: use standard parallel language constructs that can target CPU or GPU multithreading / SIMD first,
then look at, say, OpenMP or OpenACC, and only then consider in which small "hot" areas it's even relevant to optimize further
down to more significantly platform-specific things (CUDA / ROCm / SYCL / oneAPI / GPU ISA); see the OpenMP sketch after this paragraph.
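The middle step of that progression, the OpenMP decorations, amounts to one pragma on the same loop. A sketch assuming an offload-capable compiler (e.g. clang with -fopenmp plus an -fopenmp-targets=... triple); with no device present the loop simply runs on the host:

    #include <cstdio>

    int main() {
        const int n = 1 << 20;
        const float a = 2.0f;
        static float x[1 << 20], y[1 << 20];
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        // Vendor-neutral OpenMP target offload: the pragma maps the arrays
        // to the device (if any) and spreads the loop over its teams/threads.
        #pragma omp target teams distribute parallel for map(to: x) map(tofrom: y)
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];

        std::printf("y[0] = %f\n", y[0]);
        return 0;
    }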
So hopefully we're entering a better, more vendor-neutral time where we can increasingly just specify some kind of vectorization /
parallelism / threading generically in code and have LLVM or whatever target it to RISC / x64 CPUs, AMD/NVIDIA/Intel GPUs, and
different OpenCL implementations, whether layered indirectly on Vulkan / DX or on GPU ISAs (SYCL, sketched below, is one face of that).
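For the single-source, pick-a-backend-at-runtime flavor of that, here is a minimal SYCL 2020 sketch of the same loop; the default queue just grabs whatever device the installed backend exposes (GPU if present, else CPU), which is exactly the kind of vendor-neutral dispatch being hoped for:

    #include <sycl/sycl.hpp>
    #include <vector>

    int main() {
        const size_t n = 1 << 20;
        const float a = 2.0f;
        std::vector<float> x(n, 1.0f), y(n, 2.0f);

        sycl::queue q;  // default device: whatever backend/GPU is available
        {
            sycl::buffer<float> bx(x), by(y);
            q.submit([&](sycl::handler& h) {
                sycl::accessor ax(bx, h, sycl::read_only);
                sycl::accessor ay(by, h, sycl::read_write);
                h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                    ay[i] = a * ax[i] + ay[i];
                });
            });
        }  // buffers go out of scope here, copying results back into y
        return 0;
    }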
I've entirely lost track of which stack elements can work with which others, so I don't know whether PoCL or Rusticl or whatnot
could actually work on nouveau these days, but I imagine it'll get there.
I'd likely have already bought an AMD GPU were it not for horror stories about generationally delayed / skipped ROCm support and so on;
for now I use NVIDIA + Arc, and we'll see if / when another AMD card joins the party after my 9800s from ~2008.