OK, first you have two address maps that do not match, so both sides have to know which address holds the copy of which other address, and the way to match them up is a tag.
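To make the tag idea concrete, here is a rough sketch of such a lookup. It is only a toy model: the tag names, the addresses, and the names `TagEntry`, `tag_table` and `gpu_copy_of` are all made up for illustration, and real hardware or drivers would not do it with a Python dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEntry:
    cpu_addr: int  # where the data lives in the CPU's address map
    gpu_addr: int  # where the copy lives in the GPU's address map


# Hypothetical table that both sides agree on: tag -> the two addresses.
tag_table = {
    "vertex_buf": TagEntry(cpu_addr=0x1000, gpu_addr=0x8000_0000),
    "texture_0": TagEntry(cpu_addr=0x2400, gpu_addr=0x8000_4000),
}


def gpu_copy_of(cpu_addr):
    """Return (tag, gpu_addr) for a CPU address, or None if it has no copy."""
    for tag, entry in tag_table.items():
        if entry.cpu_addr == cpu_addr:
            return tag, entry.gpu_addr
    return None
```

The point is just that neither side can translate an address on its own; the tag is the only thing both maps have in common.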
Then the CPU and the GPU must never write to the same thing at the same time, and since the two cannot negotiate with each other (great), one of them needs to be the dominant decision maker. That would be the CPU, as it can execute a driver for the graphics card.
So the CPU puts what is about to be modified in a command buffer (sort of) and then checks what the GPU can alter at that time, meaning anything that does not correspond to the address tags the CPU is changing. Then, when both buffers are empty, the lock the CPU keeps on what the GPU may not touch is lifted and the GPU can continue.
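The scheme described here could be sketched roughly like this. Again a toy model under my own assumptions, not how any real driver does it: the class `CpuArbiter`, its method names, and the tag strings are invented, and memory is just a dict keyed by tag.

```python
class CpuArbiter:
    """Toy model: the CPU side decides which tags the GPU may write."""

    def __init__(self):
        self.cpu_commands = []    # pending CPU writes: (tag, value)
        self.gpu_commands = []    # pending GPU writes that were allowed
        self.locked_tags = set()  # tags the GPU may not touch right now

    def cpu_queue_write(self, tag, value):
        # The CPU announces a modification and locks that tag for the GPU.
        self.cpu_commands.append((tag, value))
        self.locked_tags.add(tag)

    def gpu_request_write(self, tag, value):
        # Return True if the GPU write is accepted, False if it must wait.
        if tag in self.locked_tags:
            return False
        self.gpu_commands.append((tag, value))
        return True

    def drain(self, memory):
        # Apply everything queued; once both buffers are empty, lift the lock.
        for tag, value in self.cpu_commands:
            memory[tag] = value
        self.cpu_commands.clear()
        for tag, value in self.gpu_commands:
            memory[tag] = value
        self.gpu_commands.clear()
        self.locked_tags.clear()


memory = {}
arbiter = CpuArbiter()
arbiter.cpu_queue_write("vertex_buf", 1)
blocked = arbiter.gpu_request_write("vertex_buf", 2)  # same tag: must wait
allowed = arbiter.gpu_request_write("texture_0", 3)   # different tag: fine
arbiter.drain(memory)
retry = arbiter.gpu_request_write("vertex_buf", 2)    # lock lifted by now
```

Note that the GPU's blocked write just sits there until the drain has happened, which is exactly where the waiting comes from.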
Either way, massive latency hell...