There will be a development board called CARMA, launching in Q2 2012. Dunno about the price, or whether they will sell it to everybody or only to special friends of NVIDIA.
Cachebench is only testing the compiler
Good to see tests of Tegra 3 and other ARM SoCs!
However, the current use of Cachebench in PTS is of little use. It only tests how badly the compiler optimizes when compiling with -O. The result differs a lot if better optimization is used.
This is what I get from the Cachebench read test on an Athlon 64 X2 computer running Debian testing with gcc-4.6:
PTS result, using the default -O: 1308 MB/s
PTS result, using -Ofast: 5487 MB/s
PTS result, using -Ofast -fprefetch-loop-arrays: 8440 MB/s
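To put those three numbers in proportion, here is a small sketch that divides each result by the plain -O result. The dictionary and function names are only illustrative; the figures are the ones quoted above.

```python
# Read-test throughput quoted above, in MB/s, keyed by the gcc flags used.
results_mb_s = {
    "-O": 1308,
    "-Ofast": 5487,
    "-Ofast -fprefetch-loop-arrays": 8440,
}


def speedups(results, baseline):
    """Return every result divided by the baseline result."""
    base = results[baseline]
    return {flags: value / base for flags, value in results.items()}


ratios = speedups(results_mb_s, "-O")
for flags, ratio in ratios.items():
    print(f"{flags}: {ratio:.2f}x")
# -O: 1.00x
# -Ofast: 4.19x
# -Ofast -fprefetch-loop-arrays: 6.45x
```

So on the same hardware the compiler flags alone move the score by a factor of roughly 4.2 to 6.5.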
Cachebench is very sensitive to compiler optimizations. Using just -O mostly tests which optimizations -O enables and which CPU the given compiler tunes for by default. Different versions of gcc include different optimizations, and different Linux distributions set different default tuning options.
From a hardware test point of view, you should either use the same binary everywhere, or you should look for the best result each system can reach.
Now we see results from different compilers with non-optimal optimization and tuning flags and, even worse, you don't state the compiler used, including its default tuning options.
It is quite obvious that 1308 MB/s is a useless result when better optimization gives you 8440 MB/s on the same hardware.
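One possible way to record the compiler and its default tuning next to a result is sketched below. It assumes gcc is on the PATH and relies on `gcc --version` and `gcc -Q --help=target`; the function name `compiler_report` and the list of prefixes it looks for are my own choices, and the exact output of `--help=target` differs between gcc versions and architectures, so treat the parsing as a rough filter rather than something PTS does.

```python
import shutil
import subprocess


def compiler_report(cc="gcc"):
    """Collect the compiler version line and its default target tuning, if available."""
    if shutil.which(cc) is None:
        # Compiler not installed: report that instead of raising.
        return {"compiler": cc, "available": False}

    version = subprocess.run(
        [cc, "--version"], capture_output=True, text=True
    ).stdout
    target = subprocess.run(
        [cc, "-Q", "--help=target"], capture_output=True, text=True
    ).stdout

    # Keep only the lines that name the default architecture and tuning.
    # This prefix list is an assumption and may need extending per platform.
    prefixes = ("-march=", "-mtune=", "-mcpu=", "-mfpu=", "-mfloat-abi=")
    tuning = [
        line.strip()
        for line in target.splitlines()
        if line.strip().startswith(prefixes)
    ]

    return {
        "compiler": cc,
        "available": True,
        "version": version.splitlines()[0] if version.strip() else "",
        "tuning": tuning,
    }


if __name__ == "__main__":
    report = compiler_report()
    for key, value in report.items():
        print(f"{key}: {value}")
```

Posting that output together with the flags used for the build would make results from different distributions much easier to compare.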
Needs more analysis...
we ran our Panda against this benchmark... the differences from the reference Panda were PTS 3.8, GCC 4.7, and kernel ver 1409...
so the Tegra only wins a couple of these comparisons... we probably need to widen the search space. It is interesting how much the Panda performance varied. We couldn't run Crafty because of a dependency failure (libnuma-dev)...