A Quick Test Of NVIDIA's "Carmel" CPU Performance
NVIDIA's Tegra Xavier SoC is becoming more widely available now that the Jetson Xavier Development Kit has begun shipping. Besides being an exciting design with its Volta-based GPU and Tensor Processing Unit / Deep Learning Accelerator, this latest SoC is also interesting on the CPU side with NVIDIA's custom-designed ARMv8 "Carmel" CPU cores.
The Tegra194 (Xavier) SoC features eight 10-wide superscalar Carmel CPU cores that are based on the ARMv8.2-A architecture and manufactured on a TSMC 12nm FinFET process.
The Linux /proc/cpuinfo output for these Carmel cores confirms the eight-core configuration.
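For anyone wanting to check the same details on their own board, a minimal Python sketch along these lines will dump the per-core identification fields from /proc/cpuinfo on any AArch64 Linux system. The cpuinfo_summary helper name and the choice of fields are just illustrative, and the exact values reported by the Carmel cores are not reproduced here.

    #!/usr/bin/env python3
    # Hypothetical helper: summarize the per-core identification fields from
    # /proc/cpuinfo on an AArch64 Linux system such as the Jetson AGX Xavier.
    # The keys below are the standard ARM cpuinfo fields; no Carmel-specific
    # values are assumed here.

    from collections import Counter

    def cpuinfo_summary(path="/proc/cpuinfo"):
        cores = 0
        ids = Counter()
        with open(path) as f:
            for line in f:
                if ":" not in line:
                    continue
                key, value = (part.strip() for part in line.split(":", 1))
                if key == "processor":
                    cores += 1
                elif key in ("CPU implementer", "CPU part",
                             "CPU architecture", "model name"):
                    ids[(key, value)] += 1
        return cores, ids

    if __name__ == "__main__":
        cores, ids = cpuinfo_summary()
        print(f"{cores} logical CPU cores")
        for (key, value), count in sorted(ids.items()):
            print(f"  {key}: {value} (x{count})")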
These eight Carmel CPU cores are a big upgrade over the Tegra X2, which had two custom "Denver2" cores paired with four ARM Cortex-A57 cores.
NVIDIA should be sending over a Jetson Xavier Development Kit shortly for benchmarking on Phoronix, but in the meantime, a Phoronix reader who pre-ordered one of these developer kits was kind enough to offer remote access to it for some brief benchmarking.
Due to the remote nature of the testing, I stuck to some basic ARM CPU Linux benchmarks; once having my hands on the hardware, I will be looking more at the GPU/CUDA/tensor performance and other areas. Besides the eight Carmel cores, 512-core Volta GPU, and dual NVDLA engines, the Jetson AGX Xavier also has 16GB of LPDDR4x memory, 32GB of eMMC 5.1 storage, and a 7-way VLIW vision processor.
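As a rough illustration of the kind of basic CPU testing possible over a remote connection, here is a minimal Python sketch that times a fixed integer workload on one core and then across all cores via multiprocessing. It is not one of the benchmarks actually used for this article, just a quick way to gauge single-thread versus all-core scaling on a board like this.

    #!/usr/bin/env python3
    # Minimal single-thread vs. all-core scaling check (illustrative only,
    # not one of the benchmarks used in this article).

    import multiprocessing as mp
    import os
    import time

    def spin(iterations):
        # Simple integer-heavy loop to keep a core busy.
        total = 0
        for i in range(iterations):
            total += (i * i) % 97
        return total

    def timed_run(workers, iterations=5_000_000):
        # Each worker runs the same amount of work, so perfect scaling would
        # keep the all-core wall time close to the single-worker wall time.
        start = time.perf_counter()
        with mp.Pool(workers) as pool:
            pool.map(spin, [iterations] * workers)
        return time.perf_counter() - start

    if __name__ == "__main__":
        cores = os.cpu_count() or 1
        single = timed_run(1)
        full = timed_run(cores)
        print(f"1 worker:  {single:.2f}s")
        print(f"{cores} workers: {full:.2f}s  "
              f"(scaling factor ~{single * cores / full:.1f}x)")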
The Jetson Xavier kit was running Ubuntu 18.04 LTS with a Linux 4.9 AArch64 kernel. The Xavier/Carmel performance was compared to various other ARM boards in my possession, ranging from the Jetson TX1/TX2 to more common lower-end ARM hardware like the Raspberry Pi 3 B+, ASUS Tinker Board, ODROID-C2, Firefly, Le Potato, and others.