NVIDIA CUDA 11.6 Brings Convenient "-arch=native", Defaults To New "GSP" Driver Mode
CUDA 11.6 has numerous changes for advancing the NVIDIA compute stack including the convenient "-arch=native" compiler option (similar to "-march=native" with classic system code compilers), beginning to make use of the GPU System Processor driver code path by default on capable hardware, various performance optimizations, and other updates.
Some of the NVIDIA CUDA 11.6 highlights include:
- Turing and Ampere GPUs now default to the "GSP" driver architecture. GSP is short for GPU System Processor and allows offloading some GPU initialization and management tasks from the CPU to the GPU. The GSP relies on binary-only firmware, which on Linux ships with NVIDIA's recent proprietary driver releases. Using the GPU System Processor should help performance and free up CPU resources. (The NVIDIA GSP on Turing/Ampere appears to be NVIDIA's next-gen Falcon micro-controller built on RISC-V.) NVIDIA's Linux driver stack is still building up its use of the GSP, and for now it is in the best shape on their accelerator cards.
- CUDA 11.6 has full support for the 128-bit integer data type.
- The CUDA compiler adds "-arch=native" for easily targeting the installed GPU(s) during compilation. This is much easier than manually spelling out the desired architecture target through "-gencode arch=compute_XX,code=sm_XX" compute/code values.
- NVIDIA's nvlink linker can now create PTX files.
- CentOS Linux 8 support has been deprecated given its upstream EOL and support will be removed entirely in a future CUDA release.
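To give a feel for what the 128-bit integer support mentioned above enables, here is a minimal sketch of a full 64x64 to 128-bit multiply. It assumes a host compiler that provides the `__int128` extension (GCC or Clang). The `HD` macro and the helper names `mul_wide`, `high64`, `low64`, and `to_string_u128` are made up for this illustration and are not part of CUDA. The arithmetic helpers are marked so they should also be usable from device code when the file is built with nvcc, though that path is an assumption here and was not taken from the release notes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical convenience macro: expands to CUDA qualifiers only when
// compiled by nvcc, so the same source also builds as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// __int128 is a compiler extension (GCC/Clang), not standard C++.
using u128 = unsigned __int128;

// Full-width product of two 64-bit values; cannot overflow in 128 bits.
HD inline u128 mul_wide(std::uint64_t a, std::uint64_t b) {
    return static_cast<u128>(a) * b;
}

// Upper and lower 64-bit halves of a 128-bit value.
HD inline std::uint64_t high64(u128 v) {
    return static_cast<std::uint64_t>(v >> 64);
}
HD inline std::uint64_t low64(u128 v) {
    return static_cast<std::uint64_t>(v);
}

// Host-only decimal formatting, since iostreams have no operator<< for __int128.
inline std::string to_string_u128(u128 v) {
    if (v == 0) return "0";
    std::string digits;
    while (v > 0) {
        digits.insert(digits.begin(),
                      static_cast<char>('0' + static_cast<int>(v % 10)));
        v /= 10;
    }
    return digits;
}
```

Calling `mul_wide(UINT64_MAX, UINT64_MAX)` and splitting the result with `high64` and `low64` yields both halves of the product, which would otherwise need hand-written carry handling. Saved as a .cu file, a sketch like this is also the sort of code that could be built with "nvcc -arch=native" on CUDA 11.6 instead of naming the GPU architecture by hand.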
While proprietary, NVIDIA's Linux driver stack is at least very well maintained, aggressively adds support for new features and capabilities, including for CUDA, and is well supported across consumer and enterprise Linux distributions.
CUDA 11.6 can be downloaded at developer.nvidia.com.