
How Valve Made L4D2 Faster On Linux Than Windows


  • #91
    Originally posted by elanthis View Post
    For example, in a graphics API abstraction layer I am using now, there is a header with non-virtualized class definitions and nothing is inlined. There are then multiple sets of .cpp files, e.g. GfxWin32D3D11*.cpp, GfxDarwinGL*.cpp, GfxiOSGLES*.cpp, etc. The compiler inlines the smaller functions in release builds thanks to LTO and everything else is just a regular function call. Sure, there are platforms that support multiple APIs, but that is very rarely worth even caring about. Each platform has a primary well-supported API which most users' hardware is compatible with, so just use that. And if you're a small-time indie developer, just write for GL ES and ifdef the few bits that need to change to run on regular GL. You probably don't have the time and money to write a high-end fully-featured D3D11 renderer, a D3D9 renderer, a GL3/4 Core renderer, a GL 2.1 renderer, a GLES1 renderer, and a GLES2 renderer... it's not just the API differences, but all the shaders, art content enhancements, testing and performance work, etc. As an indie dev, you'll be making stylized but simplistic graphics, so a single least-common-denominator API is preferable. If you're writing a big engine like Unity or Unreal or whatnot, well... you're going to have a LOT of problems to solve besides the easy stuff like abstracting the "create vertex buffer" API call efficiently.
    I'm doing something similar supporting ES 2.0 on Horde: I just have different .cpp files, and Horde has a nice global static class 'gRDI' which is common enough that someone could add DX9 or DX10 support if they wanted. There's only GL 2.1+extensions and GL ES 2.0+extensions support so far, but by the looks of it all the pieces missing from ES 2.0 that the 2.1+extensions backend supports are now in ES 3.0, so it's an easy addition which I'll get around to eventually. I'm probably considered an 'indie' developer, so I don't really need any more backends to support. On the asset side I have Maya set up with per-platform profiles, which just have tweaked GLSL shaders and instructions to convert to certain compressed formats in .ktx where possible (e.g. Android+Tegra 2/3 would output S3TC in .ktx, iOS PVRTC in .ktx, etc.).
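
    To make that split concrete, here's a rough sketch of the single-header, one-.cpp-per-backend approach being described; the names (RenderDevice, gRenderDevice) and the GL ES calls are just illustrative stand-ins for this sketch, not Horde's gRDI or elanthis's actual code:

        // RenderDevice.h -- one non-virtual class shared by every backend.
        // The backend is chosen at build time by which .cpp gets compiled
        // into the target, not by virtual dispatch.
        #pragma once
        #include <cstddef>
        #include <cstdint>

        class RenderDevice {
        public:
            bool     init();
            uint32_t createVertexBuffer(const void* data, std::size_t size);
            void     destroyVertexBuffer(uint32_t handle);
        };

        // Single global instance, in the spirit of Horde's gRDI.
        extern RenderDevice gRenderDevice;

        // RenderDeviceGLES2.cpp -- built only for GL ES 2.0 targets.  A
        // RenderDeviceGL21.cpp or RenderDeviceD3D9.cpp would implement the
        // same functions; with LTO the small ones still get inlined.
        #include "RenderDevice.h"
        #include <GLES2/gl2.h>

        RenderDevice gRenderDevice;

        bool RenderDevice::init() {
            // Context creation lives in the platform layer (EGL, EAGL, ...).
            return true;
        }

        uint32_t RenderDevice::createVertexBuffer(const void* data, std::size_t size) {
            GLuint vbo = 0;
            glGenBuffers(1, &vbo);
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glBufferData(GL_ARRAY_BUFFER, static_cast<GLsizeiptr>(size), data, GL_STATIC_DRAW);
            return vbo;
        }

        void RenderDevice::destroyVertexBuffer(uint32_t handle) {
            GLuint vbo = handle;
            glDeleteBuffers(1, &vbo);
        }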

    Getting back on-topic, I think converting gigabytes of assets in L4D2 and the other Source games sounds excessive; I'd bet they just do a small conversion in the shader or at load time, given that the load times to flip textures before hitting OpenGL aren't too long... I guess those shaders would just use MojoShader to convert HLSL to GLSL, or they just went with Cg *shrugs*.



    • #92
      Originally posted by elanthis View Post
      ... That's my line.



      I'm not sure what you're trying to say (there's clearly a language barrier issue here), but it still sounds like you're possibly confused on how this all works. A GLSL compiler generally produces some kind of intermediary format which is then fed to a codegen pass that does generate real "to the metal" machine code for the GPU execution cores (has to happen somewhere, after all). Some drivers share this intermediary format with their HLSL compiler (which explicitly generates a Microsoft-defined cross-IHV intermediary format, unlike GLSL), others do not. In either case, all shader languages at some point are compiled to raw machine code, but that code is not specified by any API or language standard, because it varies not only per-vendor but even per-product-cycle, and the APIs are intended to work on all hardware of the appropriate generation. Hence why OpenGL mandates GLSL source code as the lowest level in the standard (NVIDIA defines their assembly program extension, but that itself is just another intermediary format) and why D3D mandates its intermediary format as the lowest level in the standard (basically the same general concept as NVIDIA's GL assembly extension, but part of the API specification rather than as a vendor add-on).



      Most game developers certainly know that you can have good OOP design _without_ excessive subclassing or virtual functions. The wonderful thing about C++ is that it makes static polymorphism almost as easy as dynamic polymorphism, so you can write a compile-time abstraction layer (without nasty #ifdefs) that is still good OOP. Even at the C level, you can write a single API with multiple backends by simply compiling in different translation units that implement the API in different fashions.

      For example, in a graphics API abstraction layer I am using now, there is a header with non-virtualized class definitions and nothing is inlined. There are then multiple sets of .cpp files, e.g. GfxWin32D3D11*.cpp, GfxDarwinGL*.cpp, GfxiOSGLES*.cpp, etc. The compiler inlines the smaller functions in release builds thanks to LTO and everything else is just a regular function call. Sure, there are platforms that support multiple APIs, but that is very rarely worth even caring about. Each platform has a primary well-supported API which most users' hardware is compatible with, so just use that. And if you're a small-time indie developer, just write for GL ES and ifdef the few bits that need to change to run on regular GL. You probably don't have the time and money to write a high-end fully-featured D3D11 renderer, a D3D9 renderer, a GL3/4 Core renderer, a GL 2.1 renderer, a GLES1 renderer, and a GLES2 renderer... it's not just the API differences, but all the shaders, art content enhancements, testing and performance work, etc. As an indie dev, you'll be making stylized but simplistic graphics, so a single least-common-denominator API is preferable. If you're writing a big engine like Unity or Unreal or whatnot, well... you're going to have a LOT of problems to solve besides the easy stuff like abstracting the "create vertex buffer" API call efficiently.


      Wrong, wrong, wrong and wrong.

      1. Compilers don't have to-the-metal access on the GPU.

      2. There is no need at any point for code to be compiled to-the-metal. The GPU hardware (not the driver) understands assembly-level instructions (like MAD, LOG, MUL, TEX, or anything else) and carries out the execution at a lower level with smaller instructions (atomic operations and others).

      3. The byte-code of GLSL, HLSL and Cg varies per product, but the VM-style protocol (that all GLSL games are written against) is the same. The same goes for OpenCL too.



      • #93
        Part of the confusion here may come from the fact that with GPUs you end up running into a lot of compilers -- the one used to compile the application code from C/C++ or whatever to CPU hardware instructions (while still containing high level graphics operations) vs the one in the driver stack used to compile from the high level graphics operations (GLSL, HLSL etc..) to GPU instructions.

        In this case I suspect you may be talking about different compilers. The second one *definitely* goes to-the-metal.
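
        From the application's point of view, the hand-off to that second compiler looks roughly like this (the GLSL string here is just a made-up example, and it assumes a current GL context with the gl* entry points loaded; everything after glCompileShader() happens inside the driver stack):

            // The application passes GLSL *source* to the driver at run time;
            // the driver's own compiler turns it into whatever the GPU executes.
            const char* fragSrc =
                "#version 120\n"
                "void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }\n";

            GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
            glShaderSource(fs, 1, &fragSrc, NULL);
            glCompileShader(fs);   // driver-side compile happens here

            GLint ok = GL_FALSE;
            glGetShaderiv(fs, GL_COMPILE_STATUS, &ok);
            if (ok != GL_TRUE) {
                char log[1024];
                glGetShaderInfoLog(fs, sizeof(log), NULL, log);
                // the log comes from the driver's compiler, not the C/C++ toolchain
            }

        The compiler that built the application binary only ever saw that string as data.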


        • #94
          Originally posted by bridgman View Post
          Part of the confusion here may come from the fact that with GPUs you end up running into a lot of compilers -- the one used to compile the application code from C/C++ or whatever to CPU hardware instructions (while still containing high level graphics operations) vs the one in the driver stack used to compile from the high level graphics operations (GLSL, HLSL etc..) to GPU instructions.

          In this case I suspect you may be talking about different compilers. The second one *definitely* goes to-the-metal.

          Neither the second one nor any other goes to-the-metal. There is no to-the-metal access for GPUs, not even by the driver itself. Also, while a GPU has an instruction set, it doesn't have a native machine language: in the way we say x86 for a CPU, we cannot say nv64 for a GPU. That's because a GPU is missing various control and execution units that only a CPU has (CISC or RISC); a GPU offloads more things to software. You never have to-the-metal access, you can't compile something only for the GPU, you always need a CPU and a driver, and you always write and compile to a VM (like OpenCL).



          • #95
            Um... no. Take a look at the EXA code for pre-SI radeon chips -- it uses to-the-metal GPU programs (aka shaders). They happen to be hard-coded rather than stored compiler output for the simple reason that we historically got 2D driver support running before 3D (and the radeon shader compiler happens to be in the 3D driver) but that sequence is changing with SI anyways.

            A number of GL and CL implementations allow offline storage of compiled shader programs at the GPU binary level. There is a need for the driver to set appropriate state info and issue appropriate draw/compute commands but that is conceptually no different from an OS (CPU) scheduler passing control from the kernel to a user process.
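
            For reference, that's the sort of thing ARB_get_program_binary exposes on the GL side: the application can read the driver-compiled, device-specific binary back and reload it later, and the vendor-defined format token is exactly the part no standard pins down. A rough sketch, assuming 'prog' is an already-linked program object and that <vector> is included:

                // Ask the driver for the compiled, device-specific program binary
                // (GL 4.1+ or ARB_get_program_binary).
                GLint length = 0;
                glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &length);

                std::vector<char> binary(length);
                GLenum format = 0;   // vendor/device-specific format token
                glGetProgramBinary(prog, length, NULL, &format, binary.data());

                // Later, on the same driver/hardware, skip the GLSL compile entirely:
                GLuint reloaded = glCreateProgram();
                glProgramBinary(reloaded, format, binary.data(), length);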

            What GPUs don't generally have & use today is support for GPU programs which "run forever" but which can be pre-empted (pulled off the hardware) to allow other programs to run for a while, but that is quite different from what you are talking about.

            The bigger issue here (which you are correctly identifying but IMO not describing correctly) is that GPU instruction sets are allowed to change more quickly than CPU instruction sets, as a consequence of having standards at a higher level than for CPUs (eg OpenGL / DirectX vs x86 ISA). There is a strong convenience aspect associated with having applications program GPUs via a higher-level API instead of programming to-the-metal either directly (by having application code include GPU hardware instructions) or indirectly (by having the toolchain compile high level GPU operations in the application to GPU machine instructions in the binary), but that is an "easier for users if you don't" constraint rather than a "you can't do it" one.
            Last edited by bridgman; 20 August 2012, 10:30 AM.


            • #96
              Originally posted by bridgman View Post
              Um... no. Take a look at the EXA code for pre-SI radeon chips -- it uses to-the-metal GPU programs (aka shaders). They happen to be hard-coded rather than stored compiler output for the simple reason that we historically got 2D driver support running before 3D (and the radeon shader compiler happens to be in the 3D driver) but that sequence is changing with SI anyways.

              A number of GL and CL implementations allow offline storage of compiled shader programs at the GPU binary level. There is a need for the driver to set appropriate state info and issue appropriate draw/compute commands but that is conceptually no different from an OS (CPU) scheduler passing control from the kernel to a user process.

              What GPUs don't generally have & use today is support for GPU programs which "run forever" but which can be pre-empted (pulled off the hardware) to allow other programs to run for a while, but that is quite different from what you are talking about.

              The bigger issue here (which you are correctly identifying but IMO not describing correctly) is that GPU instruction sets are allowed to change more quickly than CPU instruction sets, as a consequence of having standards at a higher level than for CPUs (eg OpenGL / DirectX vs x86 ISA). There is a strong convenience aspect associated with having applications program GPUs via a higher-level API instead of programming to-the-metal either directly (by having application code include GPU hardware instructions) or indirectly (by having the toolchain compile high level GPU operations in the application to GPU machine instructions in the binary), but that is an "easier for users if you don't" constraint rather than a "you can't do it" one.

              Shaders are not compiled to-the-metal. Shaders are pre-semi-compiled to a VM (that's the GLSL we know), not like the sources of a general program. Then comes the "targeter" and the "optimizer", also known as the compiler. The GLSL compiler compiles to assembly level (MAD, MUL, TEX, LOG, FRC, LIT, and other assembly-level commands). Then the GPU hardware executes them internally with smaller instructions, atomic operations, possibly micro-instructions (if the GPU has microcode), and others. The compiler doesn't have access to the entire instruction set the way you do on a CPU with a software rasterizer.



              • #97
                Even if Nvidia gives you access to the entire instruction set, you still can't use it with any compiler or anything else. That's because GPUs are missing various control and execution units that only a CPU has. For a GPU, programs must be pre-controlled, and that's a higher-level subset.



                • #98
                  Originally posted by artivision View Post
                  The GLSL compiler compiles to assembly level (MAD, MUL, TEX, LOG, FRC, LIT, and other assembly-level commands). Then the GPU hardware executes them internally with smaller instructions, atomic operations, possibly micro-instructions (if the GPU has microcode), and others. The compiler doesn't have access to the entire instruction set the way you do on a CPU with a software rasterizer.
                  If you're saying "the GPU hardware may execute hardware instructions by breaking them down internally to even simpler operations, in the same way that modern x86 CPUs execute x86 instructions by breaking them down to simpler operations" then I guess I agree. That doesn't mean there is an internal instruction set you could potentially program in though.

                  Originally posted by artivision View Post
                  Even if Nvidia gives you access to the entire instruction set, you still can't use it with any compiler or anything else. That's because GPUs are missing various control and execution units that only a CPU has. For a GPU, programs must be pre-controlled, and that's a higher-level subset.
                  I don't work for NVidia.

                  That said, I think you'll find that both NVidia and AMD GPUs today are more capable than what you are describing. I know that SI is, and I imagine the corresponding NVidia parts aren't too different.


                  • #99
                    Again, the key point here is that GPU vendors can potentially deliver larger performance gains if they're not locked into a fixed instruction set, so there is a strong bias towards maintaining portability by not programming to the (constantly changing) hardware ISA.

                    That doesn't mean you can't, just that it's not necessarily a good idea in many cases.
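
                    Concretely, this is about as close as a portable API lets an application get: reading back whatever the driver built for one specific device, with no promise it will load anywhere else. A minimal OpenCL sketch, assuming 'program' has already been built with clBuildProgram for a single device, and that <CL/cl.h> and <vector> are available (whether the blob is real ISA, an intermediate form like PTX, or both depends on the vendor):

                        // Read back whatever the driver's build step produced for this device.
                        std::vector<unsigned char> getDeviceBinary(cl_program program) {
                            size_t binSize = 0;
                            clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                                             sizeof(binSize), &binSize, NULL);

                            // CL_PROGRAM_BINARIES wants an array of pointers, one per device;
                            // with a single device we can pass the address of one pointer.
                            std::vector<unsigned char> binary(binSize);
                            unsigned char* ptr = binary.data();
                            clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                                             sizeof(ptr), &ptr, NULL);

                            return binary;   // later: clCreateProgramWithBinary(...)
                        }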


                    • Originally posted by bridgman View Post
                      Again, the key point here is that GPU vendors can potentially deliver larger performance gains if they're not locked into a fixed instruction set, so there is a strong bias towards maintaining portability by not programming to the (constantly changing) hardware ISA.

                      That doesn't mean you can't, just that it's not necessarily a good idea in many cases.

                      If that possibility existed, then where is a GCC target for Nvidia or AMD GPUs (for C++, I mean)? Anyway, to be fair, I read Nvidia's docs and Kepler has some atomic-operations exposure. Probably newer GPU hardware can even run part of the compiler on shaders. Anyway, Fusion is the next stage for GPUs, but it's not there yet.

