This document provides instructions for building TensorRT LLM from source code on Linux. Building from source is recommended for achieving optimal performance, enabling debugging capabilities, or when you need a GNU CXX11 ABI configuration that differs from the one used by the pre-built TensorRT LLM wheel on PyPI. Note that the current pre-built TensorRT LLM wheel on PyPI is linked against PyTorch 2.7.0 and later versions, which use the new CXX11 ABI.
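If you are unsure which ABI your local PyTorch build uses, a quick optional check (assuming a Python environment with PyTorch installed) is:
# Prints True if the installed PyTorch was built with the new CXX11 ABI.
python3 -c "import torch; print(torch.compiled_with_cxx11_abi())"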
Use Docker to build and run TensorRT LLM. Instructions for installing an environment that can run Docker containers on the NVIDIA platform can be found here.
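As a quick sanity check that Docker and the NVIDIA Container Toolkit are set up correctly, you can run nvidia-smi inside a CUDA base container; the image tag below is only an example, and any recent CUDA base image works:
# Should print the GPU table from inside the container.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi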
If you intend to build any TensorRT LLM artifacts, such as any of the container images (note that there are pre-built develop and release container images on NGC), or the TensorRT LLM Python wheel, you first need to clone the TensorRT LLM repository:
# TensorRT LLM uses git-lfs, which needs to be installed in advance.
apt-get update && apt-get -y install git git-lfs
git lfs install
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull
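To confirm that the LFS-tracked files were actually downloaded, you can list them (an optional check):
# Lists files managed by git-lfs; entries marked with "*" have their content downloaded.
git lfs ls-files | head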
There are two options to create a TensorRT LLM Docker image. The approximate disk space required to build the image is 63 GB.
TensorRT LLM contains a simple command to create a Docker image. Note that if you plan to develop on TensorRT LLM, we recommend using Option 2: Build TensorRT LLM Step-By-Step.
make -C docker release_build
You can add the optional CUDA_ARCHS="<list of architectures in CMake format>" argument to specify which architectures should be supported by TensorRT LLM. It restricts the set of supported GPU architectures, which helps reduce compilation time:
# Restrict the compilation to Ada and Hopper architectures.
make -C docker release_build CUDA_ARCHS="89-real;90-real"
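If you are unsure which value to pass, you can query the compute capability of the local GPU with nvidia-smi (supported on recent drivers) and map it to the corresponding CMake architecture entry, for example 8.9 to 89-real:
# Prints the compute capability, e.g. "8.9" for Ada or "9.0" for Hopper.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader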
After the image is built, the Docker container can be run.
make -C docker release_run
The make command supports the LOCAL_USER=1 argument to switch to the local user account instead of root inside the container. The TensorRT LLM examples are installed in the /app/tensorrt_llm/examples directory.
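For example, to start the release container under your local user account instead of root:
make -C docker release_run LOCAL_USER=1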
Because this image already contains a built and installed TensorRT LLM, you can skip the remaining steps.
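A minimal way to verify the installation from inside the running container (assuming the wheel exposes a __version__ attribute, as recent releases do) is:
# Importing the package and printing its version confirms the installation.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"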
If you are looking for more flexibility, TensorRT LLM has commands to create and run a development container in which TensorRT LLM can be built.
As an alternative to building the container image following the instructions below, you can pull a pre-built TensorRT LLM Develop container image from NGC (see here for information on container tags). Follow the linked catalog entry to enter a new container based on the pre-built image, with the TensorRT LLM source repository mounted into it. You can then skip this section and continue straight to building TensorRT LLM.
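As a rough sketch, entering such a pre-built develop container with the repository mounted could look like the following; the image reference is a placeholder, so substitute the tag listed in the NGC catalog, and adjust the mount path to your preference:
# <ngc-develop-image> is a placeholder for the develop image tag from the NGC catalog.
docker run --rm -it --gpus all \
  -v "$(pwd)":/code/tensorrt_llm -w /code/tensorrt_llm \
  <ngc-develop-image> bash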
On systems with GNU make