Marek Posts Mesa Tessellation Support For RadeonSI Gallium3D


  • #31
    Originally posted by Serafean
    r600 no, but some generations supported by the r600g driver do.
    *cough* GL_AMDX_vertex_shader_tessellator *cough*


    • #32
      Originally posted by haagch
      Only shader subroutine. Dave's implementation works with it.
      Too bad it's only a "conformance" implementation with no regard to performance. Even though the hardware has support for real indirect jumps, Mesa uses if-branching for the implementation. But oh well, can't really blame them: subroutines are a huge feature to properly support, and the graphics world has for some reason decided that they're not worth using over traditional uniform branching.

      Originally posted by haagch
      And mod(int,int), so a few shaders fail to compile. But it looks mostly okay: https://www.youtube.com/watch?v=e97hP1ys-7s
      https://www.reddit.com/r/linux_gamin...hd_7970m_very/
      This one I don't quite understand. The GLSL docs say "genType mod(genType, genType)" is a possible signature. Does "genType" only refer to floats and float-vectors? I would have assumed it covers integers as well..

      Originally posted by haagch
      839 shaders in Metro 2033 Redux use it, but it kinda looks like it is autogenerated. They probably used a shader translator from the Direct3D shaders to OpenGL that uses it.
      Well, it appears that D3D11 does have subroutines too, so I wonder if they've been translated 1:1 to GLSL or whether the translator writers had a more novel use for them.


      • #33
        I really want to see Metro Redux benchmarks with Mesa soon. Heaven will also be a valid benchmark for comparing the binary driver to OSS soon; right now the comparison is useless because it pits OpenGL 4 against OpenGL 3. Well done AMD! I still miss HDMI 2.0(a) on the latest cards and more video codecs, but this is a nice step towards retiring fglrx...
        Last edited by Kano; 18 June 2015, 07:40 AM.


        • #34
          Originally posted by Ancurio
          This one I don't quite understand. The GLSL docs say "genType mod(genType, genType)" is a possible signature. Does "genType" only refer to floats and float-vectors? I would have assumed it covers integers as well..
          It was a bug. It should be coerced to float, but wasn't. Interestingly, it worked with GLSL 150, but not with 400. The problem was that the code didn't decide between the double and the float coercion and instead kept it as int, which was an error. The double version got added with 400... Here's the patch that makes it work: http://patchwork.freedesktop.org/patch/52136/


          • #35
            Originally posted by Ancurio
            This one I don't quite understand. The GLSL docs say "genType mod(genType, genType)" is a possible signature. Does "genType" only refer to floats and float-vectors? I would have assumed it covers integers as well..
            genType is float/vec2/vec3/vec4. genIType is int/ivec2/ivec3/ivec4, genUType is uint/uvec2/uvec3/uvec4, and genDType is double/dvec2/dvec3/dvec4.

            Now, in unextended GLSL 150, you just have mod(float, float), which is what a call like mod(1, 2) matches: the implicit conversion rules let you convert int to float and call the float version of the function. If you were to enable only ARB_gpu_shader_fp64 (fp64) on top of that, you'd gain mod(double, double), and AFAIK there is no way for the GLSL compiler to figure out whether to pick the float or the double version. However, with ARB_gpu_shader5 (gs5) you get rules to order the candidate functions based on "how well" they match, so you're able to pick the float version again.

            With #version 400, both fp64 and gs5 are enabled. However, gs5 had been added first, and the fp64 patches didn't update the overload resolution logic.
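
To make the overload question concrete, here is a rough Python sketch of the resolution rules described above. It is a toy model, not Mesa's code: the names `conversion_rank`, `is_better` and `resolve` are made up for illustration, and the ranking follows the GLSL 4.00 ordering as I understand it (exact match first, then float to double, then int/uint to float ahead of int/uint to double).

```python
# Toy model of GLSL overload resolution for mod(); NOT Mesa's implementation.
# All names here are hypothetical and exist only for illustration.

IMPLICIT = {  # implicit conversions: source type -> allowed target types
    "int": {"float", "double"},
    "uint": {"float", "double"},
    "float": {"double"},
}

FLOAT = ("float", "float")     # mod(float, float)
DOUBLE = ("double", "double")  # mod(double, double), gained with fp64


def conversion_rank(arg, param):
    """Lower is better; None means the argument cannot be passed at all."""
    if arg == param:
        return 0  # exact match
    if param not in IMPLICIT.get(arg, set()):
        return None
    if arg == "float" and param == "double":
        return 1  # float -> double beats any other conversion
    if arg in ("int", "uint") and param == "float":
        return 2  # int/uint -> float beats int/uint -> double
    return 3


def is_better(a, b):
    """a wins if it is no worse on every argument and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def resolve(args, overloads, ranked):
    """ranked=False models the older rules, ranked=True the gs5-style ordering."""
    viable = []
    for params in overloads:
        if len(params) != len(args):
            continue
        ranks = [conversion_rank(a, p) for a, p in zip(args, params)]
        if None not in ranks:
            viable.append((params, ranks))
    if not viable:
        return "no match"
    for params, ranks in viable:
        if all(r == 0 for r in ranks):
            return params  # an exact match always wins
    if len(viable) == 1:
        return viable[0][0]
    if not ranked:
        return "ambiguous"  # several candidates and no way to order them
    for i, (params, ranks) in enumerate(viable):
        others = [r for j, (_, r) in enumerate(viable) if j != i]
        if all(is_better(ranks, o) for o in others):
            return params
    return "ambiguous"


call = ("int", "int")  # e.g. mod(1, 2)
print(resolve(call, [FLOAT], ranked=False))          # ('float', 'float')
print(resolve(call, [FLOAT, DOUBLE], ranked=False))  # ambiguous
print(resolve(call, [FLOAT, DOUBLE], ranked=True))   # ('float', 'float')
```

The second call is the fp64-without-ordering situation: both candidates need a conversion and nothing says which one is preferable.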

            As an aside on subroutines, indirect jumps aren't exactly a good idea. I believe the overall conclusion is that they perform worse than just doing the if/else thing, and they're a huge complication for an almost-never-used feature.
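
For anyone unsure what the two strategies look like, here is a small Python sketch. It only models the shape of the two lowerings and says nothing about GPU performance; the function names and the two example "subroutines" are invented for illustration.

```python
# Two ways to dispatch a "subroutine uniform"; a conceptual model only.
# shade_dim / shade_offset are hypothetical stand-ins for shader subroutines.

def shade_dim(x):
    return x * 0.5


def shade_offset(x):
    return x + 1.0


SUBROUTINES = [shade_dim, shade_offset]


def call_indirect(index, x):
    # Models a real indirect jump: look up the target and call through it.
    return SUBROUTINES[index](x)


def call_if_chain(index, x):
    # Models the if/else lowering: every subroutine body is inlined and
    # selected by comparing the (uniform) index against constants.
    if index == 0:
        return x * 0.5
    elif index == 1:
        return x + 1.0
    raise ValueError("unknown subroutine index")


for i in range(len(SUBROUTINES)):
    assert call_indirect(i, 2.0) == call_if_chain(i, 2.0)
```

Both give the same results; the difference is that the if-chain version needs no call/return machinery, since everything is inlined into the caller.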


            • #36
              Originally posted by imirkin
              As an aside on subroutines, indirect jumps aren't exactly a good idea, I believe the conclusion overall is that they perform worse than just doing the if/else thing. And it's also a huge complication for an almost-never-used feature.
              This is so hard for me to grok. How could it possibly be that expert hardware teams developed circuits and the Khronos committee sat down and wrote a long ass specification for a feature that later turned out dead on arrival? Are these things just done without extensive tests being performed during R&D?


              • #37
                Originally posted by Ancurio
                This is so hard for me to grok. How could it possibly be that expert hardware teams developed circuits and the Khronos committee sat down and wrote a long ass specification for a feature that later turned out dead on arrival? Are these things just done without extensive tests being performed during R&D?
                The simplest argument against it that I'm aware of is that it just creates ridiculous problems for register allocators. With a regular CPU, you have caller-saved and callee-saved registers (by convention), which is implemented by pushing things onto a stack and then popping them off. There aren't that many registers either. This is relatively fast, and it's not too much memory bandwidth since it's only a handful of cores.

                GPUs have local per-thread memory for such manipulations, but it's a lot slower than registers (duh), of which there are a ton (GK110 and newer GPUs have up to 255 32-bit registers per thread, for example). Also, there are no dedicated ops for stack push/pop, so things have to be explicitly allocated or, alternatively, use (slower) indirect accesses. If you instead want to keep subroutine state in registers, you have to carve out register space for the subroutines, and then you don't get to use as many registers as efficiently (the number of registers used will often affect the number of parallel threads).

                Generally the number of subroutines is small, and the amount of code in them is small, so just inlining everything isn't such a bad policy.
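
A back-of-the-envelope sketch of the registers-versus-threads trade-off, in Python. The register file size and thread cap below are assumed round numbers chosen for illustration, not figures from this thread, and real hardware also allocates in coarser granules that this ignores.

```python
# Rough occupancy arithmetic; the constants are illustrative assumptions.
REGISTER_FILE = 65536  # assumed number of 32-bit registers per GPU core
MAX_THREADS = 2048     # assumed cap on resident threads per GPU core


def resident_threads(regs_per_thread, reserved_for_subroutines=0):
    """Threads that fit when each one needs the given number of registers."""
    total = regs_per_thread + reserved_for_subroutines
    return min(MAX_THREADS, REGISTER_FILE // total)


print(resident_threads(32))      # 2048
print(resident_threads(32, 32))  # 1024
```

Under these assumptions, carving out another 32 registers per thread for subroutine use halves the number of threads that can be resident at once, which is the cost being described above.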
