NVIDIA Releases TensorRT 8.0 With Big Performance Improvements
NVIDIA is today making available a much faster version of TensorRT, its SDK for optimized deep learning inference on its GPUs.
With TensorRT 8, made public today, NVIDIA is reporting "2x performance" relative to the existing TensorRT 7 release. That 2x figure comes from transformer optimizations. NVIDIA is also claiming 2x accuracy versus TensorRT 7 when using INT8 with quantization-aware training.
TensorRT 8 also cuts BERT-Large inference time to 1.2 ms on a V100, 2.5x faster than with TensorRT 7. Among other improvements, the release adds sparsity support for Ampere GPUs.
TensorRT 8.0 should be available shortly via developer.nvidia.com.