Originally posted by Weasel
View Post
Anyway, what memcpy() and friends usually do to maximize the amount of data processed per cycle is to utilize wide, vector registers. To support this, L1 cache typically has at least one port that's wider than the CPU's wordlength.
Leave a comment: