There is no stall. The only difference is better CU utilization: removing the never-taken conditional branches lowers register usage, so more wavefronts can be resident on each CU...
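To make the register/utilization link concrete, here is a rough sketch of how per-thread register count could limit the number of resident wavefronts. The defaults (256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD, allocation in blocks of 4) are assumptions modeled on GCN-class hardware, and `waves_per_simd` is a hypothetical helper, not a vendor API. Check your target's ISA documentation for the real limits, and note that SGPR and LDS usage can cap occupancy as well.

```python
def waves_per_simd(vgprs_per_thread, vgpr_budget=256, max_waves=10, granularity=4):
    """Estimate resident wavefronts per SIMD from VGPR usage alone.

    All default limits are assumptions (GCN-like); SGPR and LDS limits are ignored.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    # Registers are handed out in fixed-size blocks, so round up to the granularity.
    allocated = -(-vgprs_per_thread // granularity) * granularity
    if allocated > vgpr_budget:
        return 0  # the kernel cannot fit within the assumed register file
    # Occupancy is capped both by the register file and by the hardware wave limit.
    return min(max_waves, vgpr_budget // allocated)


if __name__ == "__main__":
    # Hypothetical register counts, for illustration only.
    for vgprs in (84, 64, 24):
        print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "waves per SIMD")
```

Under these assumed limits, trimming a kernel from 84 to 64 VGPRs would raise the estimate from 3 to 4 waves per SIMD, which is the kind of utilization gain described above without any stall being involved.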