OK, I've made a big step forward in the shader code, but I have a small doubt, Marek.
My IDCT code is basically three matrix multiplications, which at the end of the day is only adds and multiplies of float values packed in a vec4 (like SSE __m128). To reduce processing time I precomputed all the DCT coefficients (I assumed computing them would be very expensive on the GPU, since it requires a lot of divs and cos) and loaded them into 16 vec4s, so no problem so far (this gave me a nice speedup in the C/SSE version too, btw). Now, in GLSL I can load that data as 16 RGBA packs and refer to it by the same name in all the shaders, i.e. when I map those 16 RGBA packs as a texture I can keep them in GPU memory for as long as I need (I read this in a tutorial and I've written some code, but I can't say for sure that the data always stays in memory, in case I'm wrong XD). So I figured that by doing the same in TGSI I could save myself a truckload of transfers to GPU memory: load the coefficient data only once and keep using it as long as macroblocks keep coming from the video Huffman parser. So the questions are:
1.) If I use the same const variable (uniform) name in all the shaders, does the compiler pick up on that and keep using that texture for as long as I need, without going back to RAM?
2.) Should I first upload a shader with all the coefficient data, and then do the processing in a separate shader that uses the same variable names as the first shader that declared the coefficient table?
3.) Should I use ARL (or MOV, I'm not sure what it means in TGSI compared to x86 asm) and load the data into the char array for that shader? Btw, how exactly does ARL work?
4.) Am I absolutely wrong and should drop dead somewhere? XD
Btw, is this correct?
static const char shader1_asm[] =
"FRAG\n"
"DCL OUT[0], POSITION\n"
"DCL OUT2[0], POSITION\n"
"DCL TEMP[0]\n"
"DCL CONST[0..3]\n"
"0: DP4 TEMP[0].x, IN[0], CONST[0]\n"
"1: DP4 TEMP[0].y, IN[0], CONST[1]\n"
"2: DP4 TEMP[0].z, IN[0], CONST[2]\n"
"3: DP4 TEMP[0].w, IN[0], CONST[3]\n"
Is this part correct?
"4: DP4 OUT2[0].x, OUT[0], CONST[0]\n"
"5: DP4 OUT2[0].y, OUT[0], CONST[1]\n"
"6: DP4 OUT2[0].z, OUT[0], CONST[2]\n"
"7: DP4 OUT2[0].w, OUT[0], CONST[3]\n";
I know it's missing some declarations and stuff. I'm just asking about the way the data is processed: first, writing each component of the destination separately, since the DP4 description says dst and not dst.x etc., and second, whether using OUT as a source is valid.
Thanks a lot for your help, and sorry for bothering you so much XD