Originally posted by vladpetric
https://en.wikichip.org/wiki/arm_hol...ndividual_Core
For example, the A77 has an 85-entry load buffer versus a 90-entry store buffer. Why it is this way will become clear when I answer the next bit.
One quirk you see in Arm cores that use Arm-style register renaming is a load from an address and a store to the same address in the same clock cycle, because a load issuing ahead of a store is possible. A load issuing after a store disappears. This is why with Arm you don't want smarts in the load/store buffers.
With a load ahead of a store, you are after the value in L1 that the load is asking for, not the value the store has pushed into the store buffer. With a load after a store, you want the value the store is sending out, so in this case Arm redirects to the register that held the value instead of performing the load, and the load disappears. If you add smarts to the load/store buffers so that a load checks whether the store buffer holds a changed value for its address, this will break how the Arm design works.
This also explains why you might want to keep the load buffer slightly shorter than the store buffer: it is slightly faster to get from one end of the queue to the other.
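To make the load-ahead-of-store versus load-after-store cases concrete, here is a rough Python sketch of the behaviour as I described it. It is a toy model only, not how any real Arm core is implemented: the names Op, rename_pass and count_buffer_entries are made up for illustration, addresses are assumed to be known and to match exactly, and it ignores the case where the register that fed the store gets overwritten later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # "load" or "store"
    addr: int  # memory address (toy model: exact match only)
    reg: str   # destination register for a load, source register for a store


def rename_pass(ops):
    """Walk ops in program order (hypothetical toy model).

    A load that comes after a store to the same address is rewritten into a
    register move, so it never occupies a load buffer entry. A load that comes
    before any store to that address stays a real load from L1.
    """
    last_store_reg = {}  # addr -> register that held the stored value
    renamed = []
    for op in ops:
        if op.kind == "store":
            last_store_reg[op.addr] = op.reg
            renamed.append(("store", op.addr, op.reg))
        elif op.addr in last_store_reg:
            # Load after store: redirect to the register, the load disappears.
            renamed.append(("move", op.reg, last_store_reg[op.addr]))
        else:
            # Load ahead of store: it wants the value currently in L1.
            renamed.append(("load", op.addr, op.reg))
    return renamed


def count_buffer_entries(renamed):
    """Return (load_entries, store_entries) needed after the rename pass."""
    loads = sum(1 for r in renamed if r[0] == "load")
    stores = sum(1 for r in renamed if r[0] == "store")
    return loads, stores


ops = [
    Op("load", 0x100, "r1"),   # ahead of the store: real load
    Op("store", 0x100, "r2"),  # store to the same address
    Op("load", 0x100, "r3"),   # after the store: becomes a move from r2
]
renamed = rename_pass(ops)
print(renamed)
print(count_buffer_entries(renamed))
```

In this toy trace two loads go in but only one needs a load buffer entry, while the store still needs its store buffer entry, which is the direction of the argument for a load buffer slightly smaller than the store buffer.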
The reality here is that making your load/store buffers larger doesn't help much unless you have the micro-ops to use the larger buffers. Even if you have the micro-ops, you don't need larger load/store buffers if you don't have the instruction dispatch and register renaming to fill those micro-ops.
Yes, the size of the load/store buffers tells you a little about the CPU, but without understanding how the CPU design is going to use those buffers, the size does not tell you much. For a Ryzen/Zen 2 that speculatively fills the load buffer, having a much larger load buffer than store buffer makes sense. Then you have Arm core designs where, due to the register renaming design, a slightly smaller load buffer than store buffer makes sense. Both designs can put out close to the same IPC when they have close to the same level of micro-ops, with the means to fill and use them.
Yes, a lot of general programs have a 2 to 1 load/store ratio, but you don't want your CPU stalling when you hit a 1 to 1 load/store ratio (copying memory) or a 0 to 1 load/store ratio (like dumping out random data from an in-CPU random number generator micro-op).
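As a back-of-the-envelope illustration of those three ratios, here is a small Python sketch that splits a fixed number of in-flight memory operations between the load and store buffers. The function name entries_needed and the figure of 90 in-flight memory operations are assumptions picked for the example, not measurements from any core.

```python
def entries_needed(inflight_mem_ops, load_parts, store_parts):
    """Split in-flight memory ops between load and store buffers.

    load_parts:store_parts is the workload's load/store ratio, e.g. 2:1.
    Returns (load_entries, store_entries). Hypothetical helper for illustration.
    """
    total_parts = load_parts + store_parts
    load_entries = inflight_mem_ops * load_parts // total_parts
    store_entries = inflight_mem_ops - load_entries
    return load_entries, store_entries


# Assumed example: 90 memory operations in flight at once.
for name, ratio in [("general code", (2, 1)),
                    ("memory copy", (1, 1)),
                    ("RNG dump", (0, 1))]:
    print(name, entries_needed(90, *ratio))
```

With these assumed numbers the general 2 to 1 case wants 60 load and 30 store entries, the copy case wants 45 of each, and the store-only case wants all 90 in the store buffer, so a store buffer sized only for the 2 to 1 case would be the thing that stalls.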