Anyway, the kind of fallback you describe should be quite efficient (data flows only in one direction, no ping-pong), no? Also, it does not explain performance that is an order of magnitude slower than pixman and scales (nearly) linearly with GPU clock.
I mean that radeon_cs_emit() is done synchronously. Mesa seems to do it in a worker thread. radeon_cs_emit() can take a few milliseconds to complete, so I guess that's worthwhile.
I'm not sure what you mean by that. Can you give an example?
You need to synchronize caches when the domain switches between GPU and CPU, or when a read or write domain changes.
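To make that last rule concrete, here is a minimal sketch in C of the check I have in mind. The names (DOMAIN_CPU, DOMAIN_GPU, struct buffer_domains, needs_cache_sync, set_domains) are made up for illustration and are not taken from the kernel or the DDX; the real code tracks more state than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain bits, for illustration only
 * (not the kernel's actual definitions). */
enum {
    DOMAIN_CPU = 1u << 0,
    DOMAIN_GPU = 1u << 1,
};

/* Hypothetical per-buffer state: the domains that may currently
 * read the buffer, and the single domain that may write it
 * (0 means nobody is writing). */
struct buffer_domains {
    uint32_t read_domains;
    uint32_t write_domain;
};

/* Returns true if caches have to be synchronized before the buffer
 * is used with the requested domains, i.e. if either the read or
 * the write domain differs from the current one. A CPU <-> GPU
 * switch is just a special case of that. */
static bool needs_cache_sync(const struct buffer_domains *cur,
                             uint32_t new_read, uint32_t new_write)
{
    return cur->read_domains != new_read ||
           cur->write_domain != new_write;
}

/* Records the new domains and reports whether the caller has to
 * synchronize caches first. The flush itself is left to the caller. */
static bool set_domains(struct buffer_domains *buf,
                        uint32_t new_read, uint32_t new_write)
{
    bool sync = needs_cache_sync(buf, new_read, new_write);

    buf->read_domains = new_read;
    buf->write_domain = new_write;
    return sync;
}
```

With this, repeated GPU operations on the same buffer with unchanged domains never ask for a sync; only a change in the read or write domain does.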