I don't think the developers are surprised that performance hasn't gone up. It's probably fair to say they worked really hard to keep performance from going *down* any more than it did as a consequence of moving to a more flexible architecture, one that can support desired features like higher levels of GL support and higher system performance in the future.
Yes and no - there are probably small pieces of the open source stack which will need to be tossed and re-implemented in order to get big performance gains. Even so, I don't think that would have much effect on the other >90% of the stack.
Before you ask, I don't know if anyone has had time to do any real performance analysis work yet to identify where the weak points are (although the need to retransmit lots of state information under DRI2 is an obvious suspect). The performance numbers suggest that a small number of resolution-independent bottlenecks (costs that don't scale with the number of pixels drawn) are dragging down 3D performance, and that most of the code does not "have a performance problem".
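To make the "resolution-independent" reasoning concrete, here is a rough sketch of the arithmetic. It assumes a very simple model in which frame time is a fixed per-frame cost plus a cost proportional to pixel count. The function name and the frame rates are made up purely for illustration; they are not measurements from any driver.

```python
def split_frame_cost(pixels_a, fps_a, pixels_b, fps_b):
    """Fit frame_time = fixed + per_pixel * pixels through two data points."""
    time_a, time_b = 1.0 / fps_a, 1.0 / fps_b
    per_pixel = (time_b - time_a) / (pixels_b - pixels_a)
    fixed = time_a - per_pixel * pixels_a
    return fixed, per_pixel


# Hypothetical numbers, chosen only to show the calculation.
low_res = 800 * 600
high_res = 1600 * 1200
fixed, per_pixel = split_frame_cost(low_res, 50.0, high_res, 40.0)

# Share of the high-resolution frame time that does not depend on resolution.
share = fixed / (1.0 / 40.0)
print(f"fixed cost per frame: {fixed * 1000:.2f} ms, share at high res: {share:.0%}")
```

With these invented inputs, quadrupling the pixel count only drops the frame rate from 50 to 40, so roughly three quarters of the frame time at the higher resolution comes from the fixed term. That is the kind of pattern that points at per-frame overhead rather than at the pixel-processing paths.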
One of the solutions being discussed is to store more state information in the kernel driver and let the kernel driver decide whether the state currently programmed into the GPU registers is still valid. That seems likely to make a big honkin' difference in performance, and would probably eliminate the performance delta between KMS (kernel modesetting) and UMS (user modesetting). It's not a trivial change, however.
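As a toy illustration of that idea (not how any real driver is written), the sketch below compares re-sending all state on every submission with keeping a shadow copy and writing only what changed. The class names and register names are hypothetical, and the "GPU" is just a counter of register writes.

```python
from dataclasses import dataclass, field

_UNSET = object()  # marks "no known value programmed yet"


@dataclass
class ToyGpu:
    """Stand-in for hardware: stores register values and counts writes."""
    registers: dict = field(default_factory=dict)
    writes: int = 0

    def write(self, reg, value):
        self.registers[reg] = value
        self.writes += 1


class AlwaysResend:
    """Re-emit the full state with every submission."""

    def __init__(self, gpu):
        self.gpu = gpu

    def submit(self, state):
        for reg, value in state.items():
            self.gpu.write(reg, value)


class KernelTracked:
    """Keep a shadow copy of programmed state and emit only the differences."""

    def __init__(self, gpu):
        self.gpu = gpu
        self.shadow = {}

    def submit(self, state):
        for reg, value in state.items():
            if self.shadow.get(reg, _UNSET) != value:
                self.gpu.write(reg, value)
                self.shadow[reg] = value

    def invalidate(self):
        """Forget everything, e.g. after a reset, so the next submit re-sends."""
        self.shadow.clear()


# Hypothetical register names; the same state is submitted four times.
state = {"BLEND": 1, "DEPTH_TEST": 1, "VIEWPORT": (0, 0, 1024, 768)}
naive_gpu, tracked_gpu = ToyGpu(), ToyGpu()
naive, tracked = AlwaysResend(naive_gpu), KernelTracked(tracked_gpu)
for _ in range(4):
    naive.submit(state)
    tracked.submit(state)
print(naive_gpu.writes, tracked_gpu.writes)  # 12 3
```

The hard part in practice is presumably everything this sketch leaves out: knowing when the shadow copy can no longer be trusted (the `invalidate` hook here is only a placeholder for that) and doing the check on the kernel side for every client sharing the GPU.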
A nastier question is how big and complex the open source driver can become before it starts to have the same challenges as the proprietary driver. Our early estimate was that the open source stack could probably get to ~60-70% of the 3D performance of fglrx without having to get "scary complicated", and I haven't seen anything to change that view yet.
I'm not an active coder, but I have a window into both proprietary and open source development, and it seems to me that there is definitely an opportunity for significant performance improvement. The question is whether performance work should be a higher priority than stability and core features... current thinking is "no", and that seems like the right choice to me.