NVIDIA Releases TensorRT 2; TensorRT 3 Being Prepped For Volta

With TensorRT 2, NVIDIA is reporting up to 45x faster inference while staying under a 7ms real-time latency target, thanks in part to INT8 precision support. Besides being much faster, TensorRT 2 allows user-defined layers as plug-ins via TensorRT's Custom Layer API. This inference optimizer and runtime engine also supports sequence-based models for image captioning, language translation, and other use cases through LSTM and RNN layers.
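Much of the INT8 speedup comes from replacing 32-bit float arithmetic with 8-bit integer arithmetic. As a rough illustration of the general idea (this is a hypothetical sketch of symmetric per-tensor quantization, not NVIDIA's actual calibration code, which uses more sophisticated entropy-based range selection):

```python
import numpy as np

def quantize_int8(x):
    """Map float values to int8 using a symmetric per-tensor scale."""
    scale = np.abs(x).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-to-nearest keeps the error within half a quantization step
assert np.max(np.abs(weights - restored)) <= scale / 2 + 1e-7
```

The trade-off is a small, bounded loss of precision per value in exchange for 4x smaller weights and much higher arithmetic throughput on hardware with fast INT8 paths.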
Deep learning developers can download TensorRT 2 via developer.nvidia.com.
NVIDIA also revealed in the TensorRT 2 announcement that TensorRT 3 is in the works for Volta GPUs. TensorRT 3 is expected to deliver around 3.5x faster inference on Tesla V100 hardware compared to Tesla P100. More details on that via the above link.