- Simulating command buffers by submitting closures containing OpenGL commands to a queue so that the main thread can execute those OpenGL commands without any significant overhead.
You're essentially reimplementing Nvidia's "__GL_THREADED_OPTIMIZATIONS". Good luck with that, but at least it will work on all vendors for you.
Originally posted by swoorup
I wonder what command buffers address compared to OpenGL. Correct me if I am wrong: with OpenGL, you set states one by one, like the draw mode or the depth buffer. With a command buffer, however, I assume these commands can all be compiled at once into the GPU's native format and sent only once, instead of being sent piece by piece every time?
You'll still have to send them every frame I think (unless you specify that you want to reuse them, and the driver decides to optimize that), but the important part is that 1) yes you compile them only once, and 2) you can compile them on any CPU thread since the driver doesn't have to communicate with the GPU during compilation, so it's just like any other C function without side-effects. Actually, almost everything in Vulkan is designed to be without side effects, unless absolutely necessary (like when allocating VRAM, or submitting commands). That's why so many things can be run on arbitrary threads.
Yes and no. The thing is, the OpenGL commands themselves might still be the bottleneck; in that case there might not be any advantage over Nvidia's implementation, if it works the way I imagine. But walking the scene, determining which buffers to update, and submitting GL commands all in a single thread might become a bottleneck even before that, so parallelizing that part certainly won't hurt.
Edit: Back on my Desktop machine with NVidia hardware - NVidia's threaded optimizations are disabled by default, and for a good reason. It causes 100% load on the main thread for whatever reason, even when the OpenGL thread is doing close to nothing.
That said, I already have trouble with Mesa as partial array texture updates from a buffer are ridiculously slow for some reason. The project may still end up as a complete failure.
Originally posted by Ancurio
You'll still have to send them every frame I think (unless you specify that you want to reuse them, and the driver decides to optimize that), but the important part is that 1) yes you compile them only once,
Does that mean that they can be re-used in a way similar to the old-style Display Lists in OpenGL 1.x? If so, most of the rendering process would be reduced to data buffer updates and submitting already existing command buffers.
I'm also curious about how the Descriptor Sets will actually work. As far as I remember the Mantle programming guide, fixed "binding points" for textures etc. will still be used, unlike in GL_ARB_bindless_texture, but the way of binding them will (obviously) be different.
Last edited by VikingGe; 27 November 2015, 07:31 AM.
Does that mean that they can be re-used in a way similar to the old-style Display Lists in OpenGL 1.x? If so, most of the rendering process would be reduced to data buffer updates and submitting already existing command buffers.
Well, if you add the "re-use" flag when creating the CB, you can submit it as many times as you want. I still think you'll be rebuilding them quite a lot as your scene changes, unless you have a completely static scene.
I'm also curious about how the Descriptor Sets will actually work. As far as I remember the Mantle programming guide, fixed "binding points" for textures etc. will still be used, unlike in GL_ARB_bindless_texture, but the way of binding them will (obviously) be different.
Khronos decided not to include bindless textures in Vulkan at this point. The desktop cards can all do it AFAIK, but some mobile chips can't. As for descriptor sets, your shader will expect a descriptor set of some format, like "1 texture, 2 buffers, and 4 constants". You then build a descriptor set matching that format, fill in whatever values or texture/buffer descriptors you want, bind that set to the appropriate binding point (graphics or compute, depending on which pipeline you're dispatching), and make a draw/compute call (all inside command buffer building, of course). Filling descriptor sets with data might not be the fastest operation, but switching them between draws should be very fast I think, because the shader will always be reading from the same offsets, due to the fixed format.