More improvements
I am itching to add my 2 cents here, as there is even more room for improvement, although the gains are quite minor compared to your initial proposal (note: the CODE tag can be handy):
Sadly, the iteration order of the for-loop is fixed (the i == 3 pass has to come last), so there is nothing to gain there. As for "int input": I don't know its data range, but if it is small, its type could be reduced accordingly as well.
Further improvements would require knowing more about the specific project (I would be happy if you could PM me the name of the project; I really would like to know, and hope it's not Intel).
Did you commit your changes?
Best regards
FRIGN
Originally posted by Obscene_CNN
Code:
static [B](u)int_fastQ_t[/B] somechip_interp_flat(struct somechip_shader_ctx *ctx, int input)
{
    [B](u)int_fastQ_t[/B] r; // I don't know the range of r,
                             // but it could be determined in the case
                             // and limited to 16 bits or even 8
    struct some_gpu_bytecode_alu alu;

    memset(&alu, 0, sizeof(struct some_gpu_bytecode_alu));
    alu.inst = SOME_ALU_INSTRUCTION_INTERP_LOAD_P0;
    alu.dst.sel = ctx->shader->input[input].gpr;
    alu.dst.write = 1;
    alu.src[0].sel = SOME_ALU_SRC_PARAM_BASE + ctx->shader->input[input].lds_pos;

    for ([B]uint_fast8_t[/B] i = 0; i < 4; i++) {
        alu.dst.chan = i;
        alu.src[0].chan = i;
        if (i == 3)
            alu.last = 1;
        r = some_alu_bytecode_add_alu(ctx->bc, &alu);
        if (unlikely(r))
            break;
    }
    return r;
}
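For anyone who wants to try the fast-type idea in isolation, here is a minimal, self-contained sketch. Everything prefixed with demo_ is hypothetical and only stands in for the real structures and for some_alu_bytecode_add_alu, which I don't know. It also assumes the add function returns 0 on success and a small positive error code otherwise, so uint_fast8_t is wide enough for r; if the real function returns negative error codes, a signed int_fastN_t would be needed instead. unlikely() is left out because it is a compiler-specific macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ALU descriptor; only the fields the loop touches. */
struct demo_alu {
    uint_fast8_t dst_chan;
    uint_fast8_t src_chan;
    uint_fast8_t last;
};

/* Hypothetical stand-in for the bytecode buffer. */
struct demo_bc {
    uint_fast8_t count;       /* number of ALU entries recorded so far */
    uint_fast8_t fail_at;     /* channel at which to simulate an error; 4 = never */
    struct demo_alu slots[4];
};

/* Assumption: returns 0 on success, a small non-zero code on failure. */
static uint_fast8_t demo_add_alu(struct demo_bc *bc, const struct demo_alu *alu)
{
    assert(bc != NULL && alu != NULL);
    if (bc->count >= 4)
        return 2;             /* buffer full */
    if (alu->dst_chan == bc->fail_at)
        return 1;             /* simulated failure */
    bc->slots[bc->count++] = *alu;
    return 0;
}

/* Same loop shape as above: four channels, the last one flagged. */
static uint_fast8_t demo_interp_flat(struct demo_bc *bc)
{
    uint_fast8_t r = 0;       /* range is known here: 0..2 */
    struct demo_alu alu;

    memset(&alu, 0, sizeof alu);
    for (uint_fast8_t i = 0; i < 4; i++) {
        alu.dst_chan = i;
        alu.src_chan = i;
        if (i == 3)
            alu.last = 1;
        r = demo_add_alu(bc, &alu);
        if (r)
            break;
    }
    return r;
}
```

Calling demo_interp_flat() on a struct demo_bc with count = 0 and fail_at = 4 records all four channels; setting fail_at to a channel number makes the loop stop early at that channel. Note that uint_fast8_t is only guaranteed to be at least 8 bits wide; the implementation may pick a wider type if that is faster on the target.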