TensorRT is an optimization library (SDK) and deep learning model serving system built on CUDA. NVIDIA developed TensorRT for GPU-enabled production environments focused on high-throughput, low-latency inference.
Gradient natively integrates with TensorRT and includes a pre-built TensorRT image out of the box, which is updated regularly. Alternatively, customers can use a customized version of TensorRT by supplying their own Docker image hosted on a public or private Docker registry.
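For the bring-your-own-image path, the general workflow is to build an image locally and push it to a registry Gradient can pull from. This is an illustrative sketch only: the registry hostname, image name, and tag below are placeholders, not Gradient-specific values.

```shell
# Build a custom TensorRT-based image from a local Dockerfile.
# "my-registry.example.com/custom-tensorrt:latest" is a placeholder;
# substitute your own registry, repository, and tag.
docker build -t my-registry.example.com/custom-tensorrt:latest .

# Push the image so it is reachable from a public or private registry.
docker push my-registry.example.com/custom-tensorrt:latest
```

Private registries typically also require a `docker login` beforehand, and the corresponding credentials must be configured in Gradient so the image can be pulled at deployment time.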
When creating a Deployment, you can select the pre-built image or bring your own custom image. Either option is available via the web UI, the CLI, or as a step within an automated pipeline.
When using the CLI, the command would look something like this:
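A hedged sketch of such a command is shown below. The flag names and values here are illustrative assumptions based on the general shape of the Gradient CLI's `deployments create` subcommand; check the current CLI reference for the exact options supported by your version.

```shell
# Hypothetical Gradient CLI invocation creating a TensorRT deployment.
# All flag values (and the TensorRT deployment type itself) are
# placeholders for illustration, not verified parameters.
gradient deployments create \
  --deploymentType TensorRT \
  --name "my-tensorrt-deployment" \
  --modelId <model-id> \
  --imageUrl <image-url> \
  --machineType P4000 \
  --instanceCount 1
```

The same parameters could equally be set through the web UI or supplied by a pipeline step, as described above.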
Learn more about the Gradient TensorRT integration in the docs.