A Quick Test Of NVIDIA's "Carmel" CPU Performance

Written by Michael Larabel in Processors on 25 September 2018 at 06:38 AM EDT.

NVIDIA's Tegra Xavier SoC is becoming more widely available now that the Jetson Xavier Development Kit has begun shipping. Besides being an exciting design with its Volta-based GPU and its Tensor Processing Unit / Deep Learning Accelerator hardware, this latest SoC is also interesting on the CPU side thanks to NVIDIA's custom-designed ARMv8 "Carmel" CPU cores.

The Tegra194 (Xavier) SoC features eight 10-wide superscalar Carmel CPU cores that are based on the ARMv8.2-A architecture and manufactured on a TSMC 12nm FinFET process.

Checking the Linux /proc/cpuinfo for the Carmel cores shows eight processor entries with the standard AArch64 identification fields.


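For reference, below is a minimal sketch of summarizing those per-core identification fields from /proc/cpuinfo. The field names are the standard AArch64 cpuinfo keys, while the implementer-code-to-vendor mapping (0x4e for NVIDIA) is an assumption based on the ARM MIDR implementer table rather than anything confirmed from the Xavier output itself.

```python
#!/usr/bin/env python3
# Sketch: summarize the per-core identification fields exposed by
# /proc/cpuinfo on AArch64 (processor, CPU implementer, CPU part, ...).
# The implementer-code-to-vendor mapping below is an assumption based on
# the MIDR implementer codes; verify it against your own hardware.

from collections import Counter

IMPLEMENTERS = {"0x41": "ARM", "0x4e": "NVIDIA"}  # assumed MIDR implementer codes


def read_cores(path="/proc/cpuinfo"):
    """Return a list of dicts, one per core, keyed by cpuinfo field names."""
    cores, current = [], {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # blank line separates the per-core blocks
                if current:
                    cores.append(current)
                    current = {}
                continue
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        cores.append(current)
    return cores


if __name__ == "__main__":
    cores = read_cores()
    print(f"{len(cores)} cores reported by /proc/cpuinfo")
    parts = Counter((c.get("CPU implementer"), c.get("CPU part")) for c in cores)
    for (impl, part), count in parts.items():
        vendor = IMPLEMENTERS.get(impl, impl)
        print(f"  {count}x implementer={vendor} part={part}")
```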
These eight Carmel CPU cores are a big upgrade over the Tegra X2, which had two custom "Denver2" cores paired with four ARM Cortex-A57 cores.

NVIDIA should be sending over a Jetson Xavier Development Kit shortly for benchmarking on Phoronix, but in the meantime, a Phoronix reader who pre-ordered one of these developer kits was kind enough to offer remote access to it for some brief benchmarking.

Due to the remote nature of the access, I was just running some basic ARM CPU Linux benchmarks; once I have my hands on the hardware, I will be looking more at the GPU/CUDA/tensor performance and other areas. Besides the eight Carmel cores, 512-core Volta GPU, and dual NVDLA engines, the Jetson AGX Xavier also has 16GB of LPDDR4x memory, 32GB of eMMC 5.1 storage, and a 7-way VLIW vision processor.

Initial NVIDIA Jetson Xavier Linux Benchmarks

The Jetson Xavier kit was running Ubuntu 18.04 LTS with a Linux 4.9 AArch64 kernel. The Xavier/Carmel performance was compared to various other ARM boards in my possession, from the Jetson TX1/TX2 to more common lower-end ARM hardware like the Raspberry Pi 3 B+, ASUS Tinker Board, ODROID-C2, Firefly, Le Potato, and others.
