Linux 2.6.36-rc5 Kernel Released; Fixes 14 Year Old Bug
-
Originally posted by cb88 View Post
@V!NCENT, you forget branch prediction: as the code is running, it can be optimized for the most common branch taken, whereas that is pretty much impossible with AOT unless you generate a profile, and even then the profile can change.
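The runtime advantage cb88 describes can be sketched as a toy in Python (not real driver or JIT code; all names here are made up for illustration): count branch outcomes while the code runs, then specialize for the dominant outcome, something an AOT compiler cannot do without a profile.

```python
from collections import Counter

class ProfilingJit:
    """Toy illustration of profile-guided specialization at run time."""

    def __init__(self, threshold=100):
        self.counts = Counter()
        self.threshold = threshold
        self.assumed = None  # branch outcome the "recompiled" code assumes

    def run_branch(self, taken):
        self.counts[taken] += 1
        # After enough samples, specialize for the most common outcome --
        # an AOT compiler would need an up-front profile to do this.
        if sum(self.counts.values()) == self.threshold:
            self.assumed = self.counts.most_common(1)[0][0]
        return self.assumed

jit = ProfilingJit(threshold=5)
for taken in (True, True, False, True, True):
    jit.run_branch(taken)
print(jit.assumed)  # specialized for the branch taken most often: True
```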
-
@Qaridarium, don't worry too much about your English; it always fails anyway, whether you realize it or not.
I don't think you understand RISC and VLIW well either. The point of RISC is not speed through minimalism but rather speed through tailoring your CPU design to the code you intend to execute. In any case, VLIW will not succeed due to the difficulty of building a good compiler for it; RISC will always win there. Take that with a grain of salt, though, as RISC may well become more VLIW-like as the compilers improve. Remember, RISC isn't about minimalism; it's about the most optimized/balanced design.
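The compiler difficulty mentioned above can be sketched with a toy Python scheduler (everything here is invented for illustration): a VLIW compiler must statically pack independent operations into wide bundles, because the hardware has no run-time interlocks to resolve dependencies for it.

```python
def bundle_vliw(ops, width=2):
    """Toy VLIW scheduler: pack independent ops into bundles of `width`.
    Ops are (dest, srcs) tuples; a real compiler must solve this
    scheduling problem statically, which is what makes VLIW hard."""
    bundles, pending = [], list(ops)
    while pending:
        bundle, written = [], set()
        for op in list(pending):
            dest, srcs = op
            # An op may join the bundle only if it doesn't read a value
            # produced earlier in the same bundle (no hardware interlocks).
            if not (set(srcs) & written) and len(bundle) < width:
                bundle.append(op)
                written.add(dest)
                pending.remove(op)
        bundles.append(bundle)
    return bundles

# a = b + c; d = a + e; f = g + h  -- the second op depends on the first,
# so it cannot share a bundle with it and a slot goes to waste.
ops = [("a", ("b", "c")), ("d", ("a", "e")), ("f", ("g", "h"))]
print(len(bundle_vliw(ops)))  # 2 bundles: the dependency forces a split
```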
-
Originally posted by BlackStar View Post
To elaborate, what you call AOT is actually JIT: the code is compiled by the driver just before execution. AOT would mean distributing a precompiled binary shader - this *is* possible on D3D, but the binary is *still* JITed by the driver.

The real question was: after shader code is compiled the first time by the driver (and I do mean really compiled, not transformed into some sort of intermediate representation like bytecode) and uploaded into video card RAM, does it really have to be changed later, probably based on some statistics reported by the GPU (cache misses, branch mispredictions, etc.)?
And btw, do modern GPUs predict branches at all, or do they apply some simple logic optimized for loops, like "branch always goes back"?
-
Originally posted by OlegOlegovich View Post
Yeah, looks like I've misused the terms. I always thought that JIT is called 'on demand' in cases where it is possible to recompile the code in a more efficient way (or compile it for the first time if it was previously run only on the interpreter).

The real question was: after shader code is compiled the first time by the driver (and I do mean really compiled, not transformed into some sort of intermediate representation like bytecode) and uploaded into video card RAM, does it really have to be changed later, probably based on some statistics reported by the GPU (cache misses, branch mispredictions, etc.)?
And btw, do modern GPUs predict branches at all, or do they apply some simple logic optimized for loops, like "branch always goes back"?
-
Originally posted by BlackStar View Post
Older Nvidia drivers would recompile whenever you changed uniforms to/from 0.0 (and some other specific values). While this "optimization" presumably led to faster execution, it also introduced untold headaches for developers (change a random parameter, get a 30 ms stall waiting for the shader to recompile), so it was disabled at some point.
Yes, modern GPUs predict branches.
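The recompile-on-uniform-change behavior described above can be modeled as a cache of shader variants keyed by "special" uniform values. This is a toy Python sketch, not actual driver code; the special values and names are assumptions for illustration only.

```python
SPECIAL_VALUES = {0.0, 1.0}  # hypothetical values the driver specializes on

class ShaderCache:
    """Toy model of a driver that compiles a separate shader variant
    whenever a uniform takes a 'special' value."""

    def __init__(self, source):
        self.source = source
        self.variants = {}  # specialization key -> "compiled" variant

    def set_uniform(self, value):
        key = value if value in SPECIAL_VALUES else None
        if key not in self.variants:
            # Cache miss: "recompile" -- this is the stall developers hit
            # when a random parameter happened to cross a special value.
            self.variants[key] = f"compiled({self.source}, specialized={key})"
            return True   # recompiled (stall)
        return False      # variant already cached, no stall

cache = ShaderCache("my_shader")
print(cache.set_uniform(0.5))  # True: first compile of the generic variant
print(cache.set_uniform(0.0))  # True: 0.0 triggers a specialized recompile
print(cache.set_uniform(0.7))  # False: generic variant already cached
```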
-
OK, glad this all worked out while I was sleeping.
"AOT when it goes through the driver" (as opposed to "AOT before the app is distributed") is what I was calling JIT. Most apps prepare their shader programs during startup, so they are JIT-compiled once per invocation of the program.
Sometimes the driver needs to insert additional shader code in order to simulate hardware functionality. In this case the shader code may be generated based on specific state information, and if so, the driver needs to recompile the shader whenever that state information changes, rather than just once at startup.
Inserting driver-specific shader code is not a broadly useful approach, because a number of apps and emulators assume (reasonably) that they have access to all of the registers and instruction slots; if the driver inserts additional shader code (using registers and instruction slots), then Bad Things will happen. In principle the driver could react by disabling whatever hardware feature is being emulated, but that would require something like a GL_ARB_DudeImUsingAllTheResources OpenGL call, and AFAIK there is no such function defined today.
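The resource problem above can be sketched with a toy Python model (the instruction limit and names are made up for illustration, not taken from any real GPU): the app budgets for the full hardware limit, and driver-injected emulation code pushes the linked shader over it.

```python
class Gpu:
    """Toy resource model: a fixed instruction-slot budget that the
    app reasonably assumes it owns in full."""
    MAX_INSTRUCTIONS = 512  # hypothetical hardware limit

def link_shader(app_instructions, driver_emulation_instructions):
    """Return total instruction count, or fail if driver-injected
    emulation code overflows the hardware budget."""
    total = app_instructions + driver_emulation_instructions
    if total > Gpu.MAX_INSTRUCTIONS:
        # The "Bad Things" case: the app used the full budget, and the
        # driver's injected code pushed it past the hardware limit.
        raise RuntimeError("shader exceeds instruction slots")
    return total

print(link_shader(400, 32))   # fits: 432 <= 512
try:
    link_shader(512, 32)      # app used everything; injection overflows
except RuntimeError as e:
    print(e)                  # shader exceeds instruction slots
```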