12-Core ARM Cluster Benchmarked Against Intel Atom, Ivy Bridge, AMD Fusion
The EP.C test is still Embarrassingly Parallel but with the C problem size, which is about four times larger than EP.B. Going from one to twelve cores, there was a 10.07x speed-up.
The NPB FT test, a discrete 3D fast Fourier transform that relies on all-to-all communication, is where a problem shows up. Once MPI has to communicate across multiple PandaBoards, the performance plummets compared to running on just a single board.
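As a rough illustration of why an all-to-all pattern is sensitive to the interconnect, the sketch below counts how many point-to-point transfers in one naive all-to-all exchange would have to leave a board. It assumes two MPI ranks per dual-core PandaBoard ES (so twelve cores means six boards) and that ranks are placed on boards in order; the function name and the placement scheme are hypothetical and not taken from the benchmark itself.

```python
def all_to_all_messages(ranks, ranks_per_board):
    """Count transfers in one naive all-to-all exchange.

    Returns (total, inter_board), where inter_board is the number of
    transfers whose sender and receiver sit on different boards.
    Assumes rank r lives on board r // ranks_per_board (an assumption).
    """
    total = 0
    inter_board = 0
    for src in range(ranks):
        for dst in range(ranks):
            if src == dst:
                continue  # a rank does not send to itself over the network
            total += 1
            if src // ranks_per_board != dst // ranks_per_board:
                inter_board += 1
    return total, inter_board


if __name__ == "__main__":
    for ranks in (2, 12):
        total, inter = all_to_all_messages(ranks, ranks_per_board=2)
        print(f"{ranks:2d} ranks: {total} transfers, {inter} cross a board boundary")
```

Under these assumptions, a single board keeps every transfer local, while the twelve-rank case sends the large majority of its transfers over the network. This is only a count of messages, not a model of the actual FT run times.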
The last NPB test for looking at the scaling is LU.A. The LU pseudo-application is a Lower-Upper Gauss-Seidel solver. This workload did not scale as well across the cluster: going from one to twelve cores resulted in just a 4.8x performance improvement. However, this was not a failure of MPI or the PandaBoards, since even going from one to two cores on a single PandaBoard ES yielded just a 29% improvement.
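To put the scaling figures on a common footing, the speed-ups quoted in this article can be turned into parallel efficiency (speed-up divided by core count). The snippet below is a minimal sketch using only the numbers reported here; the helper name and the table layout are illustrative.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved: speedup / cores."""
    return speedup / cores


# (label, reported speed-up over one core, core count), as quoted in the article
results = [
    ("EP.C, 12 cores", 10.07, 12),
    ("LU.A, 12 cores", 4.8, 12),
    ("LU.A, 2 cores (single board)", 1.29, 2),  # a 29% improvement
]

if __name__ == "__main__":
    for label, speedup, cores in results:
        eff = parallel_efficiency(speedup, cores)
        print(f"{label:30s} speed-up {speedup:5.2f}x  efficiency {eff:.1%}")
```

This works out to roughly 84% efficiency for EP.C on twelve cores, 40% for LU.A on twelve cores, and about 65% for LU.A on the two cores of a single board.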