NVIDIA Releases TensorRT 2; TensorRT 3 Being Prepped For Volta
NVIDIA has made their TensorRT 2 library publicly available today as the newest major update to their deep-learning inference optimizer and run-time.
With TensorRT 2, NVIDIA is reporting 45x faster inference at under 7 ms real-time latency when using INT8 precision. Besides being much faster, TensorRT 2 supports user-defined layers as plug-ins through TensorRT's Custom Layer API. The inference optimizer and runtime engine also supports sequence-based models for image captioning, language translation, and other use cases via LSTM and RNN layers.
Deep learning developers can download TensorRT 2 via developer.nvidia.com.
NVIDIA also revealed in the TensorRT 2 announcement that TensorRT 3 is in the works for Volta GPUs. TensorRT 3 is expected to be around 3.5x faster for inference on Tesla V100 hardware compared to Tesla P100. More details are available via the above link.