Microsoft conceived C++ Accelerated Massive Parallelism (AMP) as a library atop DirectX 11 that offers data parallelism directly in C++, making easy use of GPUs while retaining CPU fall-back support. Because C++ AMP is similar to OpenCL, Intel engineers decided to implement the Microsoft specification on top of OpenCL using LLVM/Clang so that it can be used cross-platform.
Microsoft considers C++ Accelerated Massive Parallelism to be one of its open specifications (it's under the company's "Community Promise" license), but because it is implemented atop DirectX 11 and compiler support is built only into Microsoft Visual Studio 2012, it isn't widely available outside of Microsoft's ecosystem.
Engineers at Intel ended up developing "Shevlin Park", a prototype implementation of C++ AMP built using OpenCL with LLVM/Clang. The LLVM/Clang compiler stack was modified to handle C++ AMP programming constructs and to express C++ AMP computations as OpenCL compute kernels.
The C++ AMP run-time library was also implemented on top of an OpenCL run-time. Implemented this way, C++ AMP can now be used outside of Microsoft Windows environments. This Intel implementation with LLVM works on both the GPU and CPU.
The Shevlin Park project was presented earlier this month at the LLVM Developers' Meeting in San Jose, California. Intel's slides covering Shevlin Park are available online.
As for why someone would want to try C++ AMP rather than just using OpenCL or other GPGPU models, Intel's Dillon Sharlet describes the Microsoft interface as "elegant, minimal C++ extensions and template libraries for data parallel programming." C++ AMP keeps the host and device code in the same programming language while concealing any driver APIs. Meanwhile, the data-parallel programming model is very close to that of OpenCL and is similar to that of the NVIDIA CUDA run-time API.
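To give a rough feel for that single-source style, here is a minimal sketch in standard C++. It is not C++ AMP and not Shevlin Park code: `for_each_index` and `vector_add` are hypothetical names made up for illustration, and the "kernel" simply runs serially on the CPU. In actual C++ AMP the dispatch would go through `concurrency::parallel_for_each` with a `restrict(amp)` lambda operating on `array_view` objects, but the shape is similar: the per-element work is an ordinary lambda sitting next to the host code, with no driver API in sight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data-parallel dispatch call. A real run-time
// would launch the kernel on an accelerator; this sketch just invokes it
// once per index, serially, on the CPU so it builds with any C++11 compiler.
template <typename Kernel>
void for_each_index(std::size_t extent, Kernel kernel) {
    for (std::size_t i = 0; i < extent; ++i) {
        kernel(i);
    }
}

// Element-wise vector addition written in the single-source style:
// the "device" work is the lambda, the rest is plain host code.
std::vector<int> vector_add(const std::vector<int>& a,
                            const std::vector<int>& b) {
    assert(a.size() == b.size());  // assumption: inputs have equal length

    std::vector<int> sum(a.size());
    const int* pa = a.data();
    const int* pb = b.data();
    int* ps = sum.data();

    // The lambda captures raw pointers by value, loosely mirroring how a
    // kernel receives views of its buffers rather than the containers.
    for_each_index(sum.size(), [=](std::size_t i) {
        ps[i] = pa[i] + pb[i];
    });
    return sum;
}
```

Calling `vector_add` on `{1, 2, 3, 4}` and `{10, 20, 30, 40}` yields `{11, 22, 33, 44}`. The equivalent raw OpenCL program would typically need a separate kernel source plus explicit buffer and queue setup on the host side, which is the boilerplate this style of interface hides.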
Benchmarks by Intel show that the Shevlin Park implementation of C++ AMP can outperform the C++ AMP support within Visual Studio 2012. In some cases, raw OpenCL is faster than both C++ AMP implementations.
The Shevlin Park work is still considered experimental but the slides are definitely worth checking out for anyone interested in low-level code/compiler work.