
Help with TGSI part 2 (Marek ;) ), I know I'm annoying but I'm getting real close



    OK, I've made a big step forward in the shader code, but I have a few doubts, Marek.

    My IDCT code is basically three matrix multiplications, which at the end of the day are only additions and multiplications of float values packed in a vec4 (like SSE's __m128). To reduce processing time I precomputed all the DCT coefficients (I assumed computing them on the GPU would be very expensive, since it requires a lot of divisions and cosines) and loaded them into 16 vec4s. No problem so far (this gave me a nice speedup in the C/SSE version too, by the way).

    In GLSL I can load that data as 16 RGBA packs and refer to it by the same name in all the shaders; that is, once I map those 16 RGBA packs as a texture, I can keep them in GPU memory for as long as I need. (I read this in a tutorial and have written some code, but I can't say for sure whether the data always stays in memory, in case I'm wrong XD.) So I figured that by doing the same in TGSI I could save myself a truckload of uploads to GPU memory: load the coefficient data only once and use it for as long as macroblocks keep coming from the video Huffman parser. So the questions are:

    1.) If I give the const variable (uniform) the same name in all the shaders, does the compiler pick up on that and keep using that texture for as long as I need, without going back to RAM?
    2.) Should I first upload a shader with all the coefficient data, and then do the processing in a separate shader that uses the same variable names as the first shader that declared the coefficient table?
    3.) Should I use ARL (or MOV; I'm not sure what it means in TGSI compared to x86 asm) and load the data into the char array for that shader? By the way, how exactly does ARL work?
    4.) Am I absolutely wrong, and should I drop dead somewhere? XD

    By the way, is this correct?

    static const char shader1_asm[] =
    "FRAG\n"
    "DCL OUT[0], POSITION\n"
    "DCL OUT2[0], POSITION\n"
    "DCL TEMP[0]\n"
    "DCL CONST[0..3]\n"
    "0: DP4 TEMP[0].x, IN[0], CONST[0]\n"
    "1: DP4 TEMP[0].y, IN[0], CONST[1]\n"
    "2: DP4 TEMP[0].z, IN[0], CONST[2]\n"
    "3: DP4 TEMP[0].w, IN[0], CONST[3]\n"

    And is this correct?
    "4: DP4 OUT2[0].x, OUT[0], CONST[0]\n"
    "5: DP4 OUT2[0].y, OUT[0], CONST[1]\n"
    "6: DP4 OUT2[0].z, OUT[0], CONST[2]\n"
    "7: DP4 OUT2[0].w, OUT[0], CONST[3]\n";

    I know it's missing some variables and other stuff. What I'm asking about is, first, the way the data is processed by writing each component of the output separately (since the DP4 documentation says dst and not dst.x, etc.), and second, whether using OUT as a source is valid.

    Thanks a lot for your help, and sorry for bothering you this much XD

  • #2
    OK, I made a booboo writing the code: please mentally replace TEMP[] with OUT[]. My bad.



    • #3
      Also, in the last example I forgot to ask: since I need the result of the first two matrix multiplications in order to multiply it by the third, can I use OUT[] itself as a source, or should I MOV to create a copy of the result for OUT2[]?

      damn edit time lol



      • #4
        I don't follow these forums often. Feel free to drop in on #dri-devel @ irc.freenode and ask there (the channel is dedicated to developers only).

        Originally posted by jrch2k8 View Post
        1.) If I give the const variable (uniform) the same name in all the shaders, does the compiler pick up on that and keep using that texture for as long as I need, without going back to RAM?
        I don't understand the question.

        Originally posted by jrch2k8 View Post
        2.) Should I first upload a shader with all the coefficient data, and then do the processing in a separate shader that uses the same variable names as the first shader that declared the coefficient table?
        I don't understand the question. Think of Gallium as if it were OpenGL, and use the same techniques and approach you would use in OpenGL. You should really know how to implement your algorithm in OpenGL before moving to Gallium. If you can't do it with the former, you can't do it with the latter either, because they are conceptually the same thing.

        Originally posted by jrch2k8 View Post
        3.) Should I use ARL (or MOV; I'm not sure what it means in TGSI compared to x86 asm) and load the data into the char array for that shader? By the way, how exactly does ARL work?
        ARL is used for non-constant indexing. You first load an index into an address register with ARL and then use that register for indexing. Example:

        VERT
        DCL IN[0]
        DCL OUT[0], POSITION
        DCL CONST[0..7]
        DCL ADDR[0]
        0: ARL ADDR[0].x, IN[0].x
        1: MOV OUT[0], CONST[ADDR[0].x+0]
        2: END
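To make the two steps concrete, here is a small CPU-side model in C of what the example does. It is only a sketch: the names `arl` and `load_const_indirect` are hypothetical, ARL is modelled as flooring the float source to an integer, and out-of-range indices are simply rejected with an assert because real hardware behaviour in that case is not covered here.

```c
#include <assert.h>

#define NUM_CONSTS 8 /* mirrors DCL CONST[0..7] */

typedef struct { float x, y, z, w; } vec4;

/* Model of ARL: floor the float source and keep it as an integer address. */
int arl(float src)
{
    int i = (int)src;        /* the cast truncates toward zero */
    if ((float)i > src)      /* negative non-integers need one more step down */
        i--;
    return i;
}

/* Model of: MOV dst, CONST[ADDR[0].x + offset] */
vec4 load_const_indirect(const vec4 consts[NUM_CONSTS], int addr_x, int offset)
{
    int idx = addr_x + offset;
    assert(idx >= 0 && idx < NUM_CONSTS); /* the sketch refuses out-of-range reads */
    return consts[idx];
}
```

With this model, `load_const_indirect(consts, arl(in.x), 0)` plays the role of instructions 0 and 1 in the shader: the index comes from per-vertex data rather than being fixed in the shader text.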

        Originally posted by jrch2k8 View Post
        4.) Am I absolutely wrong, and should I drop dead somewhere? XD

        By the way, is this correct?

        static const char shader1_asm[] =
        "FRAG\n"
        "DCL OUT[0], POSITION\n"
        "DCL OUT2[0], POSITION\n"
        "DCL TEMP[0]\n"
        "DCL CONST[0..3]\n"
        "0: DP4 TEMP[0].x, IN[0], CONST[0]\n"
        "1: DP4 TEMP[0].y, IN[0], CONST[1]\n"
        "2: DP4 TEMP[0].z, IN[0], CONST[2]\n"
        "3: DP4 TEMP[0].w, IN[0], CONST[3]\n"

        And is this correct?
        "4: DP4 OUT2[0].x, OUT[0], CONST[0]\n"
        "5: DP4 OUT2[0].y, OUT[0], CONST[1]\n"
        "6: DP4 OUT2[0].z, OUT[0], CONST[2]\n"
        "7: DP4 OUT2[0].w, OUT[0], CONST[3]\n";

        I know it's missing some variables and other stuff. What I'm asking about is, first, the way the data is processed by writing each component of the output separately (since the DP4 documentation says dst and not dst.x, etc.), and second, whether using OUT as a source is valid.
        1) Use OUT[1] instead of OUT2[0].
        2) Two outputs cannot have the POSITION semantic, for the same reason you cannot have two position outputs in GLSL (there's only one gl_Position). Consider using either COLOR[0] or GENERIC[0] for the second output. Also consider using a fragment shader instead if you plan to do any kind of image processing.
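On the dst versus dst.x part of the question: DP4 produces a single scalar, and the write mask on the destination (.x, .y, .z, .w) selects which channel receives it, so four DP4s with one mask each amount to a 4x4 matrix times a vec4. The following C model is a sketch of that, with hypothetical names (`dp4`, `mat4_mul_vec4`, `two_stage`). It keeps the intermediate result in a temporary instead of reading it back from an output, which is the conservative choice if outputs are assumed to be write-only; that assumption is mine and is not stated in this thread.

```c
typedef struct { float x, y, z, w; } vec4;

/* Model of DP4: a four-component dot product that yields one scalar. */
float dp4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Four DP4s, one per destination channel. rows[] plays the role of
 * CONST[0..3], so the result is the matrix with those rows times v. */
vec4 mat4_mul_vec4(const vec4 rows[4], vec4 v)
{
    vec4 r;
    r.x = dp4(v, rows[0]); /* DP4 dst.x, v, CONST[0] */
    r.y = dp4(v, rows[1]); /* DP4 dst.y, v, CONST[1] */
    r.z = dp4(v, rows[2]); /* DP4 dst.z, v, CONST[2] */
    r.w = dp4(v, rows[3]); /* DP4 dst.w, v, CONST[3] */
    return r;
}

/* Two chained stages: the first result lives in a temporary (TEMP[0]),
 * and only the final result is written to the output. */
vec4 two_stage(const vec4 rows[4], vec4 in)
{
    vec4 temp0 = mat4_mul_vec4(rows, in);
    vec4 out0 = mat4_mul_vec4(rows, temp0);
    return out0;
}
```

In shader terms this corresponds to writing instructions 0 to 3 into TEMP[0] and having instructions 4 to 7 read TEMP[0] as their source.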

        Originally posted by jrch2k8 View Post
        Also, in the last example I forgot to ask: since I need the result of the first two matrix multiplications in order to multiply it by the third, can I use OUT[] itself as a source, or should I MOV to create a copy of the result for OUT2[]?
        I don't understand the question.

        If you use a Gallium driver, set the environment variable ST_DEBUG=tgsi and run any game or 3D application. It will print the TGSI source code of all shaders to stderr.
