You're nearly ALWAYS "just 1 or 2 registers short" in every loop, no matter how many registers you have. :P And the same is even more true for a compiler.
AFAICT, this is all just the KERNEL impact of freeing up those registers, isn't it? That's one part of the equation, and clearly of remarkable value on that side alone just from the save/restore impact (substantially more than I would have expected, which makes me wonder whether I've misunderstood things). The other side is rebuilding userspace with the knowledge that it has 2 more registers available. Or is that already included in these benchmarks after all, and I just missed where Michael noted it?