Former AMD Developer: OpenGL Is Broken

  • Kraut
    replied
    Originally posted by efikkan View Post
    OpenGL is not broken, and if AMD has anything to complain about, it's their implementation.

    But OpenGL is outdated, the same way Direct3D, OpenCL, Mantle and Metal are outdated, old-style APIs. They all work by sending thousands of small requests from the CPU to the GPU. Reducing the API overhead only helps us perform a few more API calls, but still doesn't allow us to utilize the GPUs efficiently.

    What we really need is a low-level universal GPU programming language where we can implement the graphics pipeline ourselves, or do compute. A pipeline of 6-7 rigid shader types accessing predefined data is inefficient. "Bindless graphics" extensions with pointers and customized data structures are a step in the right direction. Features in CUDA like controlling threads, transferring data from the GPU, etc. are getting close.

    With a low-level language, driver development will be easier, and anyone can create their own framework on top of it. Heck, even Apple could create their own "Metal" on top of that...
    The APIs don't specify how exactly the commands are sent to the GPUs.
    Most OpenGL/Direct3D implementations collect your draw calls and other commands. After some accumulation they send this package to a driver thread. That driver thread in turn accumulates, creates and sends data packages to the GPU command queues.
    There is a lot of validation going on in between, which eats a lot of CPU time.

    We do utilize the GPUs, but with old-style APIs we waste a lot of CPU power on validation while at the same time not utilizing multiple cores effectively. There is also a lot of latency that does not need to exist.

    Mantle actually lets you fill the command queues directly, with a thin abstraction of course. The same goes for the DMA copy queue.
    There are no driver threads; Mantle runs only in the application thread(s) you call it from.
    There isn't much information about Mantle out there, I guess, and it surely comes (or will come) with disadvantages. But calling it an old-style API makes you look like somebody who hasn't done his homework.
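    To make that contrast concrete, here is a minimal toy sketch in C of the submission model described above. Every type and function in it is a hypothetical illustration, not Mantle or OpenGL API; the point is only the shape of the approach: the application records packets into a command buffer it owns and submits them from its own thread, instead of handing draw calls to a driver-owned thread for validation and batching.
    Code:
    /* Hypothetical illustration only; none of these types or functions are a real API. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    typedef struct { uint32_t opcode; uint32_t args[4]; } gpu_packet;

    typedef struct {
        gpu_packet packets[1024];
        size_t     count;
    } cmd_buffer;

    /* Mantle-style model as described above: the application writes packets into a
     * command buffer it owns (a thin abstraction over the hardware queue) and then
     * submits it from its own thread. There is no driver-owned thread re-validating
     * and re-batching every call in between. */
    static void record_draw(cmd_buffer *cb, uint32_t vertex_count)
    {
        gpu_packet p = { .opcode = 0x01, .args = { vertex_count, 0, 0, 0 } };
        cb->packets[cb->count++] = p;
    }

    static void queue_submit(const cmd_buffer *cb)
    {
        /* Stand-in for kicking the hardware command queue / DMA engine. */
        printf("submitting %zu packets from the application thread\n", cb->count);
    }

    int main(void)
    {
        cmd_buffer cb = { .count = 0 };
        record_draw(&cb, 3);   /* the app thread fills the queue directly ... */
        record_draw(&cb, 6);
        queue_submit(&cb);     /* ... and submits once; validation is the app's job */
        return 0;
    }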



  • efikkan
    replied
    OpenGL is not broken, and if AMD has anything to complain about, it's their implementation.

    But OpenGL is outdated, the same way Direct3D, OpenCL, Mantle and Metal are outdated, old-style APIs. They all work by sending thousands of small requests from the CPU to the GPU. Reducing the API overhead only helps us perform a few more API calls, but still doesn't allow us to utilize the GPUs efficiently.

    What we really need is a low-level universal GPU programming language where we can implement the graphics pipeline ourselves, or do compute. A pipeline of 6-7 rigid shader types accessing predefined data is inefficient. "Bindless graphics" extensions with pointers and customized data structures are a step in the right direction. Features in CUDA like controlling threads, transferring data from the GPU, etc. are getting close.

    With a low-level language, driver development will be easier, and anyone can create their own framework on top of it. Heck, even Apple could create their own "Metal" on top of that...
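    For a concrete taste of the "bindless graphics" direction mentioned above, here is a rough sketch using the ARB_bindless_texture extension. It assumes a context and loader that expose the extension; texture creation, the GLSL side and error handling are omitted.
    Code:
    /* Sketch of ARB_bindless_texture: shaders index into an array of 64-bit texture
     * handles stored in a buffer instead of going through glBindTexture and texture
     * units for every draw. Assumes the extension is available; error checks omitted. */
    #include <GL/glew.h>

    #define TEXTURE_COUNT 256

    void upload_bindless_handles(const GLuint textures[TEXTURE_COUNT])
    {
        GLuint64 handles[TEXTURE_COUNT];

        for (int i = 0; i < TEXTURE_COUNT; ++i) {
            handles[i] = glGetTextureHandleARB(textures[i]);   /* 64-bit "pointer" to the texture */
            glMakeTextureHandleResidentARB(handles[i]);        /* make it usable by the GPU */
        }

        /* Store all handles in a shader storage buffer; with GL_ARB_bindless_texture
         * the GLSL block can contain sampler members, so a shader can pick any of
         * these textures by index without a single bind call. */
        GLuint ssbo;
        glGenBuffers(1, &ssbo);
        glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
        glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(handles), handles, GL_STATIC_DRAW);
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);
    }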



  • smitty3268
    replied
    Originally posted by Kraut View Post
    Most smaller changes won't require rebuilding the shader binaries. Also, if we ever end up with mature OSS drivers, the frequency of changes that need a rebuild will drop dramatically.
    And I don't think YOUR driver updating behavior is a good argument against shader binaries.
    The Mesa implementation checks the git SHA1 and will automatically clear out anything that was cached against another version, no matter how small and insignificant the change was.

    The reason is that it's hard to tell whether a change is important or not without someone carefully reviewing what has changed, and no one wants to do that for every single commit that goes into Mesa. So they just always wipe the cache automatically, even when it's not necessary.

    The cost of parsing GLSL is insignificant. I'd like to quote one of the AMD developers on that, but I can't find his comment anymore.
    The cost is implementation-specific. The Mesa drivers are fairly slow at compiling. Some drivers are very slow with certain features that may be in your shaders, and very fast otherwise. That's part of the problem: you can't always assume that your shader compile will be fast without testing it on multiple hardware and driver versions.
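    For what it's worth, an application-side binary cache can sidestep that behaviour by keying its cached blobs on the driver identification strings, so any driver or GPU change just triggers a silent recompile instead of feeding a stale binary back to the GL. A minimal sketch, with the comparison against the stored key and the file handling left out:
    Code:
    /* Build a cache key from the driver identification strings; if it differs from
     * the key stored next to the cached binaries, throw the cache away and recompile.
     * Requires a current GL context. */
    #include <GL/gl.h>
    #include <stdio.h>

    void build_cache_key(char *key, size_t key_size)
    {
        /* Vendor, renderer and version together change whenever the driver or GPU does. */
        snprintf(key, key_size, "%s|%s|%s",
                 (const char *)glGetString(GL_VENDOR),
                 (const char *)glGetString(GL_RENDERER),
                 (const char *)glGetString(GL_VERSION));
    }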
    Last edited by smitty3268; 07 June 2014, 08:13 PM.



  • Kraut
    replied
    Originally posted by mdias View Post
    Driver updates aren't (shouldn't be?) that uncommon. Plus, on the Linux side you can have new daily packages, like I do. If you're shipping an AAA game, chances are that every time you need to recompile all the shaders you're going to be a nuisance to the user.
    Most smaller changes won't require rebuilding the shader binaries. Also, if we ever end up with mature OSS drivers, the frequency of changes that need a rebuild will drop dramatically.
    And I don't think YOUR driver updating behavior is a good argument against shader binaries.

    Originally posted by mdias View Post
    The thing is, you can have the GLSL parsing done and some optimization passes done on the bytecode too. Only hardware-specific optimizations are left to be made, which I doubt is the most expensive step. Then again, it depends on the optimizer's aggressiveness. In any case, there isn't a single disadvantage to having bytecode shaders, and there are several advantages to it.
    Let's get to the core points of the argument. How much does each of these cost:
    - parsing GLSL
    - non-hardware-related optimization
    - hardware-related optimization

    The cost of parsing GLSL is insignificant. I'd like to quote one of the AMD developers on that, but I can't find his comment anymore.

    Most of the non-GPU-related optimizations can be done on the GLSL itself. On mobile platforms, and especially with HLSL-to-GLSL converters, this seems to be a big problem. The mobile compilers don't optimize as much, presumably because of weaker CPUs, power usage and the hardware manufacturers' pure incompetence at software development. That last point is actually the best argument for a shader bytecode I can think of.
    There are already projects that use the Mesa parser/optimizer to tackle this issue: http://www.ohloh.net/p/glsl-optimizer

    Saying bytecode comes only with advantages is naive.
    If you create a new GPU API (which seems to be fashionable right now) you could go with bytecode-only shaders and I would be fine with it.
    Integrating bytecode into OpenGL would bloat it even more. GLSL will never go away, for obvious backwards-compatibility reasons, and the vendors would get one more thing to fuck up when implementing it.

    Even though I don't have hard numbers on how much time non-hardware optimization versus hardware optimization takes, I suspect it will always be more for hardware optimization, just considering how different the GPU architectures are.
    So you most likely still want to use binaries to speed up loading.
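    For anyone who wants hard numbers for their own driver instead of anecdotes, a minimal sketch like the following times the front-end compile and the link (where most of the hardware-specific back-end work usually happens) separately. Drivers are free to defer work between the two stages, so treat the split as approximate. It assumes a current GL context and a loader such as GLEW; error handling is omitted.
    Code:
    /* Time glCompileShader and glLinkProgram separately for one vertex/fragment pair.
     * The status queries force the driver to finish the work before we read the clock. */
    #include <GL/glew.h>
    #include <stdio.h>
    #include <time.h>

    static double now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
    }

    void time_shader_build(const char *vs_src, const char *fs_src)
    {
        GLint ok = 0;
        double t0 = now_ms();

        GLuint vs = glCreateShader(GL_VERTEX_SHADER);
        glShaderSource(vs, 1, &vs_src, NULL);
        glCompileShader(vs);                            /* parsing + front-end work */
        glGetShaderiv(vs, GL_COMPILE_STATUS, &ok);

        GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(fs, 1, &fs_src, NULL);
        glCompileShader(fs);
        glGetShaderiv(fs, GL_COMPILE_STATUS, &ok);

        double t1 = now_ms();

        GLuint prog = glCreateProgram();
        glAttachShader(prog, vs);
        glAttachShader(prog, fs);
        glLinkProgram(prog);                            /* back-end, hardware-specific work */
        glGetProgramiv(prog, GL_LINK_STATUS, &ok);

        double t2 = now_ms();
        printf("compile: %.2f ms, link: %.2f ms\n", t1 - t0, t2 - t1);
    }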



  • zanny
    replied
    Originally posted by mdias View Post
    Driver updates aren't (shouldn't be?) that uncommon. Plus, on the Linux side you can have new daily packages, like I do. If you're shipping an AAA game, chances are that every time you need to recompile all the shaders you're going to be a nuisance to the user.
    You are a very rare exception. Even on Arch, I don't see new Mesa releases more than once or twice a month. The binary drivers see new version releases about every 3 months, unless a game comes out requiring patches. And even then, a lot of games don't cache shaders at all. They run like ass, but the average Joe really can't tell the difference between shaders getting recompiled all the time and just badly optimized render code.



  • Rallos Zek
    replied
    Originally posted by mdias View Post
    Fact: the nVidia blob drivers are more relaxed about following the standard, and game developers are too lazy to follow the GL standard; therefore many applications work fine on nVidia but not on AMD, and then people complain that it's AMD's fault...
    Yep, there are a few games I had to file bug reports for because they did not write to spec, and many bugs showed up while playing on AMD cards. And most of the bug reports came from AMD/ATI users, because the games were written against Nvidia's broken OpenGL hacks, not the OpenGL API. So yeah, most of the time it's not AMD's fault that games and some other programs are buggy and don't work; it's lazy developers not writing code to the OpenGL API specs.



  • mdias
    replied
    Originally posted by curaga View Post
    r600sb is pretty much all hw-specific, and enabling it made shader compiles several times slower (5-10x).
    I must say I'm surprised by those numbers, but time deltas for each compilation step would be much more informative and comparable.
    I don't know enough about r600sb to comment further, but maybe its initial aim was to optimize the shader output and not the compilation process itself?

    In any case, compiling from GLSL all the way to hardware-specific binary instructions will always be slower than starting from optimized bytecode. I know it seems irrelevant when thinking about one or two shaders, but AAA games may have thousands.



  • curaga
    replied
    Originally posted by mdias View Post
    The thing is, you can have the GLSL parsing done and some optimization passes done on the bytecode too. Only hardware-specific optimizations are left to be made, which I doubt is the most expensive step. Then again, it depends on the optimizer's aggressiveness. In any case, there isn't a single disadvantage to having bytecode shaders, and there are several advantages to it.
    r600sb is pretty much all hw-specific, and enabling it made shader compiles several times slower (5-10x).



  • mdias
    replied
    Originally posted by Kraut View Post
    The file size of a shader or the shader binary doesn't matter. Most likely a single texture will be bigger than all your shaders together.
    You don't stream shaders! Well, stupid (IMO) drivers often compile on first use only and then continue to bake a more optimized version in the background to switch to later. But that is something you can actually prevent with ARB_get_program_binary, because it forces compilation and so can actually hand you the binary.
    Sorry, I wasn't specific enough. I meant streaming from a remote server; think MMO. As for the size question, it's true that it isn't a strong point, but saved space is always nice. There's more to it than disk space (CPU cache etc.), but I agree it's not very important.

    Originally posted by Kraut View Post
    And I don't get why you are so afraid of recompiling shaders after a driver update. It's not like AMD/Nvidia release drivers every second day.
    Driver updates aren't (shouldn't be?) that uncommon. Plus, on the Linux side you can have new daily packages, like I do. If you're shipping an AAA game, chances are that every time you need to recompile all the shaders you're going to be a nuisance to the user.


    Originally posted by Kraut View Post
    Well, I can understand that some developers want it to hide their source code. But we should be honest here and say it is an obfuscation feature and not a technical one.
    I disagree. Obfuscation is just a nice side effect...

    Originally posted by Kraut View Post
    The GLSL parsing is actually not expensive; it's the optimization that eats a lot of time. And most of the optimization has to do with the underlying GPU architecture.

    I suspect that many AAA D3D games have far less compile time simply because the GPU vendors actually deliver highly optimized shader binaries with their drivers.
    The thing is, you can have the GLSL parsing done and some optimization passes done on the bytecode too. Only hardware-specific optimizations are left to be made, which I doubt is the most expensive step. Then again, it depends on the optimizer's aggressiveness. In any case, there isn't a single disadvantage to having bytecode shaders, and there are several advantages to it.



  • Kraut
    replied
    Originally posted by mdias View Post
    Storage space is not the problem, it's the time it takes to read it. Sure, it's just text, but it would be smaller if it were bytecode.

    It is a lot more complicated than an if block... You will need to check whether the user switched graphics cards or is running your software on another GPU. What if a new OpenGL implementation/patch was installed and generates different (faster? bug-fixed?) binaries? Maybe you need to check the specific driver version now too... Which means: are you willing to make your users wait for another round of shader compiles every time they upgrade their drivers? What about realtime streaming content?
    The file size of a shader or the shader binary doesn't matter. Most likely a single texture will be bigger than all your shaders together.
    You don't stream shaders! Well, stupid (IMO) drivers often compile on first use only and then continue to bake a more optimized version in the background to switch to later. But that is something you can actually prevent with ARB_get_program_binary, because it forces compilation and so can actually hand you the binary.
    And with guaranteed compilation it's easy to run it in the background now, e.g. start compiling after the main menu of a game has loaded!

    ARB_get_program_binary was designed to simply tell you if it doesn't like the binary; you just check a boolean.
    It's trivial to implement:
    Code:
    // If there is no cached binary, or the driver rejects it, fall back to a full compile.
    if (!(haveBinaryFileForShader() && tryToUseBinaryFileInThisGLContext()))
    {
        compileShaderFilesAndSaveBinary();
    }
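    Filled in with the actual ARB_get_program_binary entry points, the same check looks roughly like this. The cache read/write and the GLSL compile path are hypothetical helpers, and error handling is omitted:
    Code:
    /* load_cached_blob(), save_blob() and compile_and_link_from_source() are
     * hypothetical helpers standing in for file I/O and the normal GLSL path. */
    #include <GL/glew.h>
    #include <stdlib.h>

    void *load_cached_blob(GLenum *format, GLint *size);             /* hypothetical */
    void  save_blob(GLenum format, GLint size, const void *blob);    /* hypothetical */
    void  compile_and_link_from_source(GLuint prog);                 /* hypothetical */

    GLuint load_program(void)
    {
        GLenum format = 0;
        GLint  size   = 0;
        GLint  ok     = GL_FALSE;

        GLuint prog = glCreateProgram();
        void  *blob = load_cached_blob(&format, &size);

        if (blob) {
            glProgramBinary(prog, format, blob, size);
            glGetProgramiv(prog, GL_LINK_STATUS, &ok);   /* the boolean mentioned above */
            free(blob);
        }

        if (!ok) {
            /* Cache miss, or the driver rejected the binary: fall back to GLSL. */
            glProgramParameteri(prog, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE);
            compile_and_link_from_source(prog);

            glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &size);
            void *out = malloc(size);
            glGetProgramBinary(prog, size, NULL, &format, out);
            save_blob(format, size, out);
            free(out);
        }
        return prog;
    }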
    And I don't get why you are so afraid of recompiling shaders after a driver update. It's not like AMD/Nvidia release drivers every second day.

    Originally posted by mdias View Post
    Sure, you can also decompile .NET code. That doesn't mean people prefer to ship the source instead of the bytecode "executables". Plus, it will make it harder to read the code.

    Until then no one can say it's not a problem, because it is. Khronos membership is very expensive; I would suppose they have enough resources to create a hardware-independent bytecode format. It's not that hard...
    Well, I can understand that some developers want it to hide their source code. But we should be honest here and say it is an obfuscation feature and not a technical one.
    The GLSL parsing is actually not expensive; it's the optimization that eats a lot of time. And most of the optimization has to do with the underlying GPU architecture.

    I suspect that many AAA D3D games have far less compile time simply because the GPU vendors actually deliver highly optimized shader binaries with their drivers.

