I would love to see the performance difference between an OpenCL implementation of an algorithm and a CPU implementation. Too bad OpenCL...