I've posted this over on reddit:

https://www.reddit.com/r/OpenCL/comm...ig_allocation/

I pretty much found out that allocating one huge buffer and working within parts of it is a lot better than allocating only what you need, for some reason...
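
In case it helps anyone, here is a rough host-side sketch of the "one big buffer, work in parts of it" idea. It is not the code from the reddit thread. `pool_t`, `pool_alloc`, `pool_reset` and `POOL_FAIL` are names made up for illustration, and the alignment value is an assumption: on a real device you would query it (for example via `CL_DEVICE_MEM_BASE_ADDR_ALIGN`, which is reported in bits). The sketch only hands out aligned byte offsets into a single large allocation and makes no OpenCL calls itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tracks offsets inside ONE big device buffer that was
 * allocated once up front. Nothing here talks to OpenCL directly. */
typedef struct {
    size_t capacity;  /* total size of the big buffer, in bytes */
    size_t alignment; /* offset alignment in bytes, must be a power of two
                         (assumed value, query the device in real code) */
    size_t used;      /* end of the last region handed out */
} pool_t;

#define POOL_FAIL ((size_t)-1)

/* Returns the byte offset of a new region of `size` bytes,
 * or POOL_FAIL if the big buffer has no room left. */
static size_t pool_alloc(pool_t *p, size_t size)
{
    assert(p->alignment && (p->alignment & (p->alignment - 1)) == 0);

    /* Round the current end up to the next aligned offset. */
    size_t start = (p->used + p->alignment - 1) & ~(p->alignment - 1);

    if (start > p->capacity || size > p->capacity - start)
        return POOL_FAIL;

    p->used = start + size;
    return start;
}

/* Forget all regions so the same big buffer can be reused. */
static void pool_reset(pool_t *p)
{
    p->used = 0;
}
```

The returned offset could then be used as the origin of a `clCreateSubBuffer` region, or simply passed to the kernel as an extra offset argument. Which of those is faster on a given driver is something I would measure rather than assume.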