Originally posted by atomsymbol
If the copy is done by the MMU, it is not done by writing the overlapping modified data first. It is done by duplicating the existing data through page-table directions to the MMU, and then applying the modified data.
The MMU really does not have a fine-grained map of memory. The most detailed information it has is the page tables, and that includes DMA. Yes, being limited to MMU granularity does limit what kind of operations you can do. This granularity causes another problem. 4 KiB is the smallest page size on x86, and the next size up is 2 MiB (x86-64 also has 1 GiB pages). A REP MOVS that is optimised to go through the MMU can work while the page entries are 4 KiB. In current x86 implementations, if you use 2 MiB pages it is not happening, as it takes quite a lot of optimisation processing to decide that an operation is going to be large enough to need a 2 MiB page copy.
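To put rough numbers on the granularity point, here is a minimal sketch in Python. It is only an arithmetic model, not how any CPU or kernel actually does it. The function name `split_copy` and the rule that source and destination must share the same offset within a page are my own assumptions for illustration.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def split_copy(src, dst, length, page_size=PAGE_4K):
    """Split a copy into (head, whole_pages, tail) byte counts.

    Hypothetical model: only the whole_pages part could, in principle,
    be handled at page granularity. head and tail would still need an
    ordinary byte copy.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    # Assumption: a page-granular copy is only possible when source and
    # destination sit at the same offset inside their pages.
    if src % page_size != dst % page_size:
        return (length, 0, 0)
    # Bytes from src up to the next page boundary, capped at the copy size.
    head = min((-src) % page_size, length)
    # Whole pages that fit in what is left after the head.
    body = ((length - head) // page_size) * page_size
    # Whatever remains after the last whole page.
    tail = length - head - body
    return (head, body, tail)


if __name__ == "__main__":
    one_mib = 1024 * 1024
    # Aligned 1 MiB copy with 4 KiB pages: all of it is whole pages.
    print(split_copy(0, PAGE_2M, one_mib, PAGE_4K))
    # Same copy with 2 MiB pages: no whole page fits, so nothing qualifies.
    print(split_copy(0, PAGE_2M, one_mib, PAGE_2M))
```

In this model a 1 MiB aligned copy is entirely page-sized work at 4 KiB granularity, but at 2 MiB granularity not a single whole page fits, so it all falls back to a plain byte copy.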
There are some MMUs for Arm that do support sending the copy-page instruction to the MMU yourself. This is mostly not used unless the developer of a program goes out of their way to code it in. The granularity of the MMU is a real limiting factor.
Yes, I do agree that it could be useful to get that 1000% speedup on large blocks.