This morning NVIDIA announced the release of the CUDA 4.0 Toolkit, which continues to be fully supported on Linux. Version 4.0 of NVIDIA's Compute Unified Device Architecture focuses on GPUDirect 2.0 technology, Unified Virtual Addressing, and the Thrust C++ Template Performance Primitives Libraries.
GPUDirect 2.0 provides peer-to-peer communication between multiple GPUs in a single server or workstation. Unified Virtual Addressing provides a single memory address space spanning both system memory and GPU memory. The Thrust C++ template libraries boost GPGPU computing performance through an open-source C++ library whose parallel sorting routines are, according to NVIDIA, 5 to 100 times faster than the C++ Standard Template Library or Intel's Threading Building Blocks.
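As a rough illustration of the Thrust programming model, the minimal sketch below (not taken from NVIDIA's announcement) fills a vector with random integers on the host, sorts it on the GPU with `thrust::sort`, and checks the result. It assumes a CUDA-capable GPU and a toolkit that bundles Thrust, compiled with `nvcc`; the helper name `sort_random_on_gpu` is hypothetical.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: fills n random ints on the host, sorts them on the
// GPU, and returns true if the copied-back result is in ascending order.
bool sort_random_on_gpu(int n) {
    thrust::host_vector<int> h(n);
    std::srand(42);                               // fixed seed for repeatability
    for (int i = 0; i < n; ++i) h[i] = std::rand();

    thrust::device_vector<int> d = h;             // host -> device copy
    thrust::sort(d.begin(), d.end());             // parallel sort runs on the GPU
    thrust::copy(d.begin(), d.end(), h.begin());  // device -> host copy

    return thrust::is_sorted(h.begin(), h.end()); // verify on the host
}

int main() {
    bool ok = sort_random_on_gpu(1 << 20);        // about one million elements
    std::printf("sorted: %s\n", ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

The containers handle the host/device transfers, so the sort itself is a single library call; actual speedups over STL or TBB will depend on the hardware and data size.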
Other CUDA 4.0 changes include MPI integration with CUDA applications, sharing of a GPU across multiple host threads, the ability for a single CPU thread to drive multiple GPUs, new libraries, a GPU binary disassembler, and automated performance analysis in the Visual Profiler.
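The ability of a single CPU thread to drive several GPUs can be sketched as below: one host thread loops over every device with `cudaSetDevice` and queries its properties and peer-access capability, which is the prerequisite for GPUDirect peer-to-peer transfers. This is a minimal sketch assuming at least one CUDA-capable device; the helper name `visit_all_devices` is hypothetical, and the code only queries capabilities rather than moving data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: visits every CUDA device from the calling host thread.
// Returns the number of devices visited, or -1 if the runtime reports an error.
int visit_all_devices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);  // subsequent runtime calls in this thread target `dev`

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // unifiedAddressing is nonzero when the device shares the UVA space with the host
        std::printf("device %d: %s (unifiedAddressing=%d)\n",
                    dev, prop.name, prop.unifiedAddressing);

        // Report whether `dev` can directly access memory on each other GPU
        for (int peer = 0; peer < count; ++peer) {
            if (peer == dev) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, dev, peer);
            std::printf("  peer access %d -> %d: %s\n", dev, peer, can ? "yes" : "no");
        }
    }
    return count;
}

int main() {
    int n = visit_all_devices();
    if (n < 0) {
        std::printf("no CUDA runtime or device available\n");
        return 1;
    }
    std::printf("visited %d device(s) from one host thread\n", n);
    return 0;
}
```

Actually enabling peer access would additionally require `cudaDeviceEnablePeerAccess` on each device pair that reports support.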
More details are available from the NVIDIA Press Room.