Why not go the fat binary approach? Compile different versions, link them into one file, and let the CPU dispatcher choose the correct code path for the...
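A minimal sketch of that kind of dispatch in C, assuming GCC or Clang on x86-64. `__builtin_cpu_supports` and the `target` attribute are compiler extensions, not standard C. The function names (`sum`, `sum_generic`, `sum_avx2`) are hypothetical. For brevity both variants live in one translation unit rather than in separately compiled object files, but the idea is the same: one copy per instruction set, with the choice made at runtime.

```c
#include <stddef.h>

/* Baseline copy: compiled for the default target (e.g. plain x86-64). */
static long long sum_generic(const int *a, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

/* Same source, but the compiler is allowed to emit AVX2 instructions here.
   It must never be called on a CPU without AVX2. */
__attribute__((target("avx2")))
static long long sum_avx2(const int *a, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

typedef long long (*sum_fn)(const int *, size_t);

/* The "dispatcher": ask the running CPU what it supports and pick a path. */
static sum_fn select_sum(void) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
    return sum_generic;
}

/* Public entry point: resolves the implementation on first call.
   The lazy init is not thread-safe; good enough for a sketch. */
long long sum(const int *a, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL) impl = select_sum();
    return impl(a, n);
}
```

Callers only ever use `sum()`, so the same binary runs on older CPUs and takes the faster path where it's available. GCC can also automate this pattern with `__attribute__((target_clones("avx2", "default")))`, which generates the copies and the runtime resolver for you.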