Linux 2.6.36-rc5 Kernel Released; Fixes 14 Year Old Bug


  • #46
    To elaborate, what you call AOT is actually JIT: the code is compiled by the driver just before execution. AOT would mean distributing a precompiled binary shader - this *is* possible on D3D but the binary is *still* JITed by the driver.


    • #47
      Originally posted by cb88
      @V!NCENT you forget branch prediction... as the code is running, it can be optimised for the most commonly taken branch, whereas that is pretty much impossible with AOT unless you generate a profile, and even then that can change.
      Thanks for the reply, I didn't think about branch prediction. So, do GPUs expose some statistics on branch mispredictions?


      • #48
        @Qaridarium don't worry too much about your English; it always fails anyway, whether you realize it or not.

        I don't think you understand RISC and VLIW well either... the point of RISC is not speed through minimalism but rather speed through tailoring your CPU design to the code you intend to execute. In any case, VLIW will not succeed due to the difficulty of building a good compiler for it... RISC will always win there. Take that with a grain of salt though, as RISC may well become more VLIW-like as the compilers improve... remember, RISC isn't about minimalism, it's about the most optimised/balanced design.


        • #49
          Originally posted by BlackStar
          To elaborate, what you call AOT is actually JIT: the code is compiled by the driver just before execution. AOT would mean distributing a precompiled binary shader - this *is* possible on D3D but the binary is *still* JITed by the driver.
          Yeah, looks like I've misused the terms. I always thought that JIT is called 'on-demand' in cases where it is possible to recompile the code in a more efficient way (or compile it for the first time if it was run only on the interpreter before).

          The real question was: after shader code is compiled the first time by the driver (and I do mean really compiled, not transformed into some sort of intermediate representation like bytecode) and uploaded into video card RAM, does it really have to be changed later, probably based on some statistics reported by the GPU (cache misses, branch mispredictions etc.)?

          And btw, do modern GPUs predict branches at all, or do they employ some simple logic optimized for loops like "branch always goes back"?
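
For reference, the two strategies contrasted in that question can be sketched as a toy model. This is only an illustration of the general ideas: a static "backward branches are taken" rule versus a dynamic 2-bit saturating counter. It makes no claim about what any particular GPU does, and all names and addresses below are made up for the example.

```python
def static_backward_taken(branch_pc, target_pc):
    """Static heuristic: predict 'taken' only when the branch jumps backwards."""
    return target_pc < branch_pc


class TwoBitCounter:
    """Dynamic predictor: 2-bit saturating counter (0..3), 'taken' if state >= 2."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predict, update=None):
    """Replay a list of branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predict() != taken:
            misses += 1
        if update is not None:
            update(taken)
    return misses


# Same outcome pattern for both branches: taken 9 times, then not taken once.
outcomes = [True] * 9 + [False]

# Backward branch (loop-closing): hypothetical addresses 100 -> 40.
loop_static = count_mispredictions(outcomes, lambda: static_backward_taken(100, 40))
c1 = TwoBitCounter()
loop_dynamic = count_mispredictions(outcomes, c1.predict, c1.update)

# Forward branch that happens to be mostly taken: hypothetical addresses 100 -> 160.
fwd_static = count_mispredictions(outcomes, lambda: static_backward_taken(100, 160))
c2 = TwoBitCounter()
fwd_dynamic = count_mispredictions(outcomes, c2.predict, c2.update)

print(loop_static, loop_dynamic, fwd_static, fwd_dynamic)
```

In this toy run both strategies miss only the loop exit on the backward branch, while the static rule misses nine times on the mostly-taken forward branch and the counter still misses once.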


          • #50
            Originally posted by OlegOlegovich
            Yeah, looks like I've misused the terms. I always thought that JIT is called 'on-demand' in cases where it is possible to recompile the code in a more efficient way (or compile it for the first time if it was run only on the interpreter before).

            The real question was: after shader code is compiled the first time by the driver (and I do mean really compiled, not transformed into some sort of intermediate representation like bytecode) and uploaded into video card RAM, does it really have to be changed later, probably based on some statistics reported by the GPU (cache misses, branch mispredictions etc.)?
            Older Nvidia drivers would recompile whenever you changed uniforms to/from 0.0 (and some other specific numbers). While this "optimization" presumably led to faster execution, it also introduced untold headaches to developers (change a random parameter, get a 30ms stall waiting for the shader to recompile) so it was disabled at some point.
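
A rough sketch of why that hurts, as a toy model only. `ToyDriver`, the uniform name, and the "specialise on uniforms equal to 0.0" rule are assumptions made up for illustration, not the actual driver logic. The point is that the compiled variant depends on which uniforms are zero, so crossing that boundary forces a recompile while other value changes do not.

```python
class ToyDriver:
    """Hypothetical driver that keeps one compiled variant at a time,
    specialised on the set of uniforms that currently equal 0.0."""

    def __init__(self):
        self.current_key = None
        self.binary = None
        self.recompiles = 0  # each increment stands in for a compile stall

    def draw(self, source, uniforms):
        # Which uniforms could be constant-folded away as zero?
        zero_set = frozenset(name for name, value in uniforms.items() if value == 0.0)
        key = (source, zero_set)
        if key != self.current_key:
            # Specialisation no longer matches: recompile before drawing.
            self.recompiles += 1
            self.binary = f"{source} specialised for zero uniforms {sorted(zero_set)}"
            self.current_key = key
        return self.binary


driver = ToyDriver()
for fog in (0.5, 0.7, 0.0, 0.5):
    driver.draw("lit_shader", {"fog_density": fog})

print(driver.recompiles)
```

Here the change from 0.5 to 0.7 is free, but moving to 0.0 and back each costs a recompile, on top of the initial one.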



            And btw, do modern GPUs predict branches at all, or do they employ some simple logic optimized for loops like "branch always goes back"?
            Yes, modern GPUs predict branches.


            • #51
              Originally posted by BlackStar
              Older Nvidia drivers would recompile whenever you changed uniforms to/from 0.0 (and some other specific numbers). While this "optimization" presumably led to faster execution, it also introduced untold headaches to developers (change a random parameter, get a 30ms stall waiting for the shader to recompile) so it was disabled at some point.

              Yes, modern GPUs predict branches.
              Thank you!


              • #52
                OK, glad this all worked out while I was sleeping.

                "AOT when it goes through the driver" (as opposed to "AOT before the app is distributed") is what I was calling JIT. Most apps prepare their shader programs during startup so that they are JIT-compiled once per invocation of the program.

                Sometimes the driver needs to insert additional shader code in order to simulate hardware functionality. In this case the shader code may be generated based on specific state information, and if so the driver needs to recompile the shader whenever that state information changes rather than just once at startup.

                Inserting driver-specific shader code is not a broadly useful approach, because a number of apps and emulators assume (reasonably) that they have access to all of the registers and instruction slots. If the driver inserts additional shader code (using registers and instruction slots), then Bad Things will happen. In principle the driver could react by disabling whatever hardware feature is being emulated, but that would require something like a GL_ARB_DudeImUsingAllTheResources OpenGL call, and AFAIK there is no such function defined today.
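
To make the resource problem concrete, here is a minimal sketch under made-up assumptions. The slot limit, the state names, and the emulation instructions are all hypothetical and not taken from any real hardware or driver. It shows a driver appending state-dependent code to the app's shader and what goes wrong when the app has already used every slot.

```python
MAX_INSTRUCTION_SLOTS = 64  # hypothetical hardware limit, for illustration only

# Hypothetical emulation code the driver appends, selected by current state.
EMULATION_CODE = {
    "feature_off": [],
    "feature_on": ["emu_op_0", "emu_op_1", "emu_op_2"],
}


def build_shader(app_instructions, state):
    """Return app code plus driver-inserted code, or raise if it does not fit."""
    final = list(app_instructions) + EMULATION_CODE[state]
    if len(final) > MAX_INSTRUCTION_SLOTS:
        raise OverflowError(
            f"shader needs {len(final)} slots, hardware has {MAX_INSTRUCTION_SLOTS}"
        )
    return final


# A small shader leaves room for the inserted code in either state.
small = ["op"] * 10
print(len(build_shader(small, "feature_off")), len(build_shader(small, "feature_on")))

# A shader that (reasonably) uses every slot only fits while nothing is inserted.
full = ["op"] * MAX_INSTRUCTION_SLOTS
print(len(build_shader(full, "feature_off")))
try:
    build_shader(full, "feature_on")
except OverflowError as err:
    print("Bad Things:", err)
```

Because the result depends on `state`, the driver in this sketch would also have to rebuild the shader whenever that state changes, not just once at startup.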


                • #53
                  EDIT - there may actually be enough info in the existing API calls to detect when inserting shader code is not possible, need to check.
