Michael only needs a headline to grab the ad revenue.
http://llvm.org/devmtg/2012-04-12/Sl...Karrenberg.pdf
See: Evaluation II: WFVOpenCL vs. Intel/AMD (milliseconds)
The meat of it: they say their WFVOpenCL implementation beats the tested Intel OpenCL SDK v1.1 and AMD APP SDK v2.5 on the code they benchmarked by an average of 2.5x (Intel) and 40x (AMD),
lower times being better, of course.
And then there's the conclusion:
"OpenCL benefits from both multi-threading and WFV on CPUs"
WFV being "whole function vectorisation", or if you prefer, whole-function SIMD optimization. Of course, they could have known that if they had just asked the x264 devs and looked at its assembly and C functions: Intel SIMD has beaten AMD clock for clock for a very long time now.
In fact, the WFVOpenCL guys would probably get a lot more speed out of their CPU optimizations if they just took the x264 code examples and the general framework, modified them to their needs, and paid special attention to the supplied checkasm, as that tool can be used to perform function-level benchmarks.
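
To show what I mean by a function-level benchmark, here is a rough sketch of the idea only. It is not checkasm's code or interface: the names sad_reference, sad_candidate and check_and_bench are made up for illustration, and the sum-of-absolute-differences function is just a stand-in workload. The pattern is: first check the optimized function against a plain reference on the same inputs, and only then time both.

```python
import timeit


def sad_reference(a, b):
    """Plain reference: sum of absolute differences, one element at a time."""
    total = 0
    for x, y in zip(a, b):
        total += abs(x - y)
    return total


def sad_candidate(a, b):
    """Hypothetical 'optimized' version; it must match the reference exactly."""
    return sum(abs(x - y) for x, y in zip(a, b))


def check_and_bench(reference, candidate, inputs, repeats=5, number=200):
    """Verify candidate against reference, then time both on the same inputs.

    Returns (reference_seconds, candidate_seconds, speedup) per pass over inputs.
    """
    # Correctness first: a fast function that gives wrong answers is useless.
    for args in inputs:
        if reference(*args) != candidate(*args):
            raise AssertionError(f"mismatch on input {args!r}")

    def best_time(fn):
        # Best of several runs, to reduce noise from the rest of the system.
        runs = timeit.repeat(
            lambda: [fn(*args) for args in inputs],
            repeat=repeats,
            number=number,
        )
        return min(runs) / number

    ref_t = best_time(reference)
    cand_t = best_time(candidate)
    speedup = ref_t / cand_t if cand_t > 0 else float("inf")
    return ref_t, cand_t, speedup


if __name__ == "__main__":
    # Made-up test data: two 256-element blocks.
    block_a = list(range(256))
    block_b = list(range(255, -1, -1))
    ref_t, cand_t, speedup = check_and_bench(
        sad_reference, sad_candidate, [(block_a, block_b)]
    )
    print(f"reference: {ref_t:.3e} s  candidate: {cand_t:.3e} s  speedup: {speedup:.2f}x")
```

The point of timing one function at a time is that you see exactly which routine got faster and by how much, instead of guessing from whole-program numbers.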