Thing is, these blit functions just move a contiguous block of memory. They don't handle the per-line offsets (pitch) and such that you need to move an arbitrary rectangle from one part of the framebuffer to another. I'd need to add another function to the dispatch table of each chipset.
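To make the difference concrete, here's a rough CPU-side sketch in plain C. It is not the driver code and doesn't touch the hardware: `struct fb`, `copy_linear` and `copy_rect` are names I've made up for illustration. It only shows why a rectangle copy needs the pitch and a per-row loop, where a contiguous copy gets away with a single offset and length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a linear framebuffer (not a driver struct). */
struct fb {
    uint8_t *base;  /* first byte of the framebuffer */
    size_t pitch;   /* bytes from the start of one scanline to the next */
    size_t bpp;     /* bytes per pixel */
};

/* What a contiguous blit amounts to: one block, no notion of rows. */
void copy_linear(struct fb *fb, size_t dst_off, size_t src_off, size_t len)
{
    memmove(fb->base + dst_off, fb->base + src_off, len);
}

/* Rectangle copy: w x h pixels from (sx, sy) to (dx, dy).
 * Each row starts at y * pitch + x * bpp, so rows are copied one by one. */
void copy_rect(struct fb *fb, size_t sx, size_t sy,
               size_t dx, size_t dy, size_t w, size_t h)
{
    size_t row_bytes = w * fb->bpp;

    assert(row_bytes <= fb->pitch);

    for (size_t i = 0; i < h; i++) {
        /* When moving downwards, go bottom-up so that overlapping source
         * rows are read before they get overwritten. */
        size_t row = (dy > sy) ? (h - 1 - i) : i;
        uint8_t *src = fb->base + (sy + row) * fb->pitch + sx * fb->bpp;
        uint8_t *dst = fb->base + (dy + row) * fb->pitch + dx * fb->bpp;

        /* memmove copes with overlap within a single row. */
        memmove(dst, src, row_bytes);
    }
}
```

A hardware version would presumably program source/destination coordinates and pitch into the blitter instead of looping on the CPU, but the parameters it needs are the same ones `copy_rect` takes here.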
The good news is that it looks like one implementation would be sufficient for r100-r500. I'll have a go at implementing this when I get set up (I'm half-way through moving house). I can't do anything about the r600+ versions, as I don't have one of those cards to mess with.