The bulk of Jason Henline's message to LLVM developers yesterday describes Google's intentions:
[...] CPUs, GPUs and other platforms. One place where we're investing a lot are parallel libraries, especially those closely tied to compiler technology like runtime and math libraries. We would like to develop these in the open, and the natural place seems to be as a subproject in LLVM if others in the community are interested.
Initially, we'd like to open source our StreamExecutor runtime library, which is used for simplifying the management of data-parallel workflows on accelerator devices and can also be extended to support other hardware platforms. We'd like to teach Clang to use StreamExecutor when targeting CUDA and work on other integrations, but that makes much more sense if it is part of the LLVM project.
However, we think the LLVM subproject should be organized as a set of several libraries with StreamExecutor as just the first instance. As just one example of how creating a unified parallelism subproject could help with code sharing, the StreamExecutor library contains some nice wrappers around the CUDA driver API and OpenCL API that create a unified API for managing all kinds of GPU devices. This unified GPU wrapper would be broadly applicable for libraries that need to communicate with GPU devices.
Of course, there is already an LLVM subproject for a parallel runtime library: OpenMP! So there is a question of how it would fit into this picture. Eventually, it might make sense to pull in the OpenMP project as a library in this proposed new subproject. In particular, there is a good chance that OpenMP and StreamExecutor could share code for offloading to GPUs and managing workloads on those devices. This is discussed at the end of the StreamExecutor documentation below. However, if it turns out that the needs of OpenMP are too specialized to fit well in a generic parallelism project, then it may make sense to leave OpenMP as a separate LLVM subproject so it can focus on serving the particular needs of OpenMP.
Google's StreamExecutor is then described as a "unified wrapper around the CUDA and OpenCL host-side programming models (runtimes). It lets host code target either CUDA or OpenCL devices with identically-functioning data-parallel kernels. StreamExecutor manages the execution of concurrent work targeting the accelerator similarly to how an Executor from the Google APIs client library manages the execution of concurrent work on the host."
The StreamExecutor feature list notes:
* abstracts the underlying accelerator platform (avoids locking you into a single vendor, and lets you write code without thinking about which platform you'll be running on).
* provides an open-source alternative to the CUDA runtime library.
* gives users a stream management model whose terminology matches that of the CUDA programming model.
* makes use of modern C++ to create a safe, efficient, easy-to-use programming interface.
StreamExecutor makes it easy to:
* move data between host and accelerator (and also between peer accelerators).
* execute data-parallel kernels written in the OpenCL or CUDA kernel languages.
* inspect the capabilities of a GPU-like device at runtime.
* manage multiple devices.
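To make the idea of a platform-abstracting wrapper more concrete, here is a minimal C++ sketch of what such a layer could look like. This is not the StreamExecutor API, which had not been published at the time of writing. The names `Device`, `HostDevice`, and `saxpy` are hypothetical, and the only "backend" is a plain CPU loop standing in for a CUDA or OpenCL device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device interface (an assumption for illustration only,
// NOT the real StreamExecutor API). A CUDA or OpenCL backend would
// implement the same virtual functions.
class Device {
public:
    virtual ~Device() = default;
    // Runtime inspection of the device.
    virtual std::string name() const = 0;
    // Memory management and host <-> device transfers.
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* ptr) = 0;
    virtual void copy_to_device(void* dst, const void* src, std::size_t bytes) = 0;
    virtual void copy_to_host(void* dst, const void* src, std::size_t bytes) = 0;
    // Run a data-parallel kernel: the body is invoked once per index in [0, n).
    virtual void launch(std::size_t n,
                        const std::function<void(std::size_t)>& kernel) = 0;
};

// Stand-in backend that "runs" kernels as a sequential loop on the host.
class HostDevice final : public Device {
public:
    explicit HostDevice(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
    void* allocate(std::size_t bytes) override { return ::operator new(bytes); }
    void deallocate(void* ptr) override { ::operator delete(ptr); }
    void copy_to_device(void* dst, const void* src, std::size_t bytes) override {
        if (bytes != 0) std::memcpy(dst, src, bytes);
    }
    void copy_to_host(void* dst, const void* src, std::size_t bytes) override {
        if (bytes != 0) std::memcpy(dst, src, bytes);
    }
    void launch(std::size_t n,
                const std::function<void(std::size_t)>& kernel) override {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
private:
    std::string name_;
};

// Caller code is written once against Device and never names a vendor API.
// Computes a * x[i] + y[i] for every element.
std::vector<float> saxpy(Device& dev, float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const std::size_t bytes = n * sizeof(float);

    float* dx = static_cast<float*>(dev.allocate(bytes));
    float* dy = static_cast<float*>(dev.allocate(bytes));
    dev.copy_to_device(dx, x.data(), bytes);
    dev.copy_to_device(dy, y.data(), bytes);

    dev.launch(n, [=](std::size_t i) { dy[i] = a * dx[i] + dy[i]; });

    std::vector<float> result(n);
    dev.copy_to_host(result.data(), dy, bytes);
    dev.deallocate(dx);
    dev.deallocate(dy);
    return result;
}
```

Because `saxpy` only sees the abstract `Device`, swapping in a different backend or looping over several `Device` instances would not require changes to the calling code, which is the kind of vendor independence the feature list describes.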
StreamExecutor looks quite interesting, and hopefully Google will open the code soon. It will also be interesting to see what comes of the potential new LLVM subproject. More details on all of it are available in Henline's mailing list message.