NVIDIA CUDA 12.0 Released With Official JIT LTO, C++20 Dialect Support
NVIDIA has released CUDA 12.0 as the latest major feature update to their proprietary compute API.
CUDA 12.0 brings many changes, including new capabilities for the latest Hopper and Ada Lovelace GPUs, updated C++ dialect support, official JIT LTO support, new and improved APIs, and an assortment of other features.
- CUDA 12.0 exposes programmable functionality for many features of NVIDIA's Hopper and Ada Lovelace architectures. Among the new CUDA 12.0 features for use with Hopper and Ada are many tensor operations now supported with the public PTX intermediate representation, C intrinsics for cooperative grid array (CGA) relaxed barrier support, programmatic L2 cache to SM multi-cast, genomics/DPX instructions, and other additions.
- Support for using virtual memory management APIs with GPUs masked by CUDA_VISIBLE_DEVICES.
- Application and library developers can programmatically update the priority of CUDA streams.
- Revamped CUDA Dynamic Parallelism APIs with "substantial" performance improvements over the prior APIs.
- Just-In-Time Link-Time Optimization (JIT LTO) is now officially supported through the nvJitLink library.
- GCC 12.1 host compiler support.
- NVCC and NVRTC support for the C++20 dialect.
- NVRTC updated its default C++ dialect from C++14 to C++17.
More details on all of the CUDA 12.0 changes are available in the release notes. CUDA 12.0 can be downloaded for all major platforms from developer.nvidia.com.