Originally posted by vladpetric
This is a simple weakness, and the fun part is that the A53 and the A72 share it. Yes, the Raspberry Pi 3 also underperforms for an A53. Rockchip parts have been beating the Broadcom ones on performance by a large margin for quite some time because their designs are dual-cluster. That avoids the memory unit forcing the AMBA to run slow, because a dual-cluster design has a CCN and its speed is set independently of the memory controller.
vladpetric, there are catches to making a design generic, and this is one of them. With the A73 and newer, when you tape out a single-cluster design, the change that prevents this issue goes into the AMBA, not into the individual cores. So an A73 or newer core interacts with the rest of the chip the same way whether the chip has 1 cluster or 1000+ clusters, which saves time validating the core design.
Basically, coherence on ARM chips is handled by the AMBA, not the L2.
vladpetric, the fun part here is that single-cluster AMD Zen x86 chips (yes, AMD has made them) are also adversely affected by slow RAM, because it results in the Infinity Fabric (AMD's equivalent of the AMBA) being set slow. Yes, that includes single-threaded performance dropping off on Zen even for operations that don't perform any memory operations, exactly like the ARM cores. There seems to be a nice commonality in design between AMD and ARM here, and it shows up after AMD licensed technology from ARM.
There is a trait to the ARM designs, and the same trait is in the AMD Zen chips: it is all about how the AMBA (ARM) or Infinity Fabric (AMD) is used by the cores for coherence. The Intel way has you using the caches, but the ARM and AMD way has you using a bus specifically for coherence, so the speed of that bus is critical to performance.
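To put rough numbers on the point above, here is a toy sketch in Python of why a coherence bus clocked along with slow RAM hurts. Everything in it is an assumption made up for illustration: the function names, the 20-cycle fabric crossing, the 60 ns DRAM time and both clock speeds are hypothetical, not measurements from any ARM or AMD part.

```python
def fabric_crossing_ns(fabric_mhz: float, fabric_cycles: int) -> float:
    """Time to cross the coherence bus, in nanoseconds.

    One cycle at f MHz lasts 1000 / f nanoseconds.
    """
    return fabric_cycles * 1000.0 / fabric_mhz


def miss_latency_ns(fabric_mhz: float, fabric_cycles: int, dram_ns: float) -> float:
    """Toy model: a miss pays the bus crossing plus a fixed DRAM time."""
    return fabric_crossing_ns(fabric_mhz, fabric_cycles) + dram_ns


# Hypothetical numbers: the same 20-cycle crossing and 60 ns DRAM access,
# once with the bus dragged down to 400 MHz and once running at 1600 MHz.
slow = miss_latency_ns(fabric_mhz=400.0, fabric_cycles=20, dram_ns=60.0)
fast = miss_latency_ns(fabric_mhz=1600.0, fabric_cycles=20, dram_ns=60.0)

print(f"slow bus: {slow} ns, fast bus: {fast} ns")
```

In this made-up case the DRAM time is identical in both runs, and the whole difference comes from the bus clock. Real chips are far more complicated than this, so treat it only as an illustration of the argument, not as a model of any actual design.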
By the way, the L2 in ARM designs for hard real-time use turns out to be optional; the AMBA is not. If ARM chips really depended on the L2 for coherence, it would not be a removable part. Basically, vladpetric, you are applying Intel logic to something that Intel logic does not apply to.