Originally posted by vladpetric
ThunderX2 32-core A72 chip. Yes, an 8 by 4 core setup. How do you think those 8 groups of CPUs sync with each other? Through the memory controller interface of the A72 core. So we are not talking about a cache miss problem here. The reason RPi 4 performance is really tanked is partly the same reason AMD Zen chips don't like low RAM speed. It is also partly because, even though the RPi 4 only has 4 A72 cores, it is still stalling while it goes out to the memory bus to ask whether any other processor is working on this data, and then waiting for the timeout. The slower the RAM speed, the longer this timeout is and the bigger this very regular stall is. So you are not looking at a 1 in 20 cache miss but instead a 1 in 5 sync stall or worse, on top of your general cache misses.
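To put rough numbers on the 1 in 20 versus 1 in 5 comparison, here is a toy additive model in Python. It is only a sketch: the function name `average_stall_cycles` and the 100-cycle penalty are made up for illustration and are not measured A72 or RPi 4 figures. Only the two rates come from the paragraph above.

```python
def average_stall_cycles(miss_one_in, miss_penalty, sync_one_in=None, sync_penalty=0):
    """Average stall cycles added per memory access in a toy additive model.

    miss_one_in / sync_one_in: "1 in N" event rates (N accesses per event).
    miss_penalty / sync_penalty: cycles lost per event (hypothetical values).
    """
    stall = miss_penalty / miss_one_in
    if sync_one_in is not None:
        stall += sync_penalty / sync_one_in
    return stall


# Hypothetical placeholder penalty; slower RAM would make this number larger.
PENALTY_CYCLES = 100

# Cache misses only: 1 in 20 accesses stalls.
misses_only = average_stall_cycles(20, PENALTY_CYCLES)

# Cache misses plus a 1 in 5 sync stall on top.
with_sync = average_stall_cycles(20, PENALTY_CYCLES, 5, PENALTY_CYCLES)

print(misses_only, with_sync, with_sync / misses_only)
```

Under these assumed numbers, the sync stalls dominate: the average stall per access ends up several times what the cache misses alone would cost, and the gap grows with whatever penalty slower RAM adds.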
Yes, the speed of your memory controller on AMD affects the Zen Data Fabric frequency, causing an IPC performance drop, and you see the same with the ARM A72 even if it is only 4 cores by itself. Yes, some of the core-to-core sync within the 4-core group also goes out to the memory controller bus on the A72. This is one of the changes in the A73. All this sync traffic on the memory bus kind of does a number on your raw memory bandwidth.
The A72 cores in the RPi 4 are basically server chips in an embedded hardware configuration, being very unhappy about it and showing their displeasure with low performance. Yes, that is still way better performance than the A53 of the RPi 3, but still technically way slower than it should be. The RPi 4 is not a good item to benchmark to get any idea of how an ideally set up Cortex-A72 should behave. Something like an RPi 4 should use an A73 or newer, which has the fix so that 4 cores by themselves don't end up running out to the memory controller. But using an A73 or newer means using a more expensive nm production process.
SPEC CPU 2006/2017 are in fact harmed on AMD Zen with low-speed RAM and on the A72 in the RPi 4 with low-speed RAM for exactly the same reason. The RPi 4 case is a lot worse because its RAM is slower by a much larger amount.