Performance Issue when using tools/llm #3803

@ChiikawaSama

Description

❓ Question

What you have already tried

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 2.8.0
  • CPU Architecture: AMD (x86_64)
  • OS (e.g., Linux): Ubuntu 22.04
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source): N/A (installed prebuilt wheel)
  • Are you using local sources or building from archives: No (prebuilt wheel)
  • Python version: 3.10
  • CUDA version: 12.8
  • GPU models and configuration: NVIDIA
  • Any other relevant information: directly using the torch-tensorrt 2.8.0 wheel (installed via pip) to run tools/llm from the GitHub 2.8.0 tag

Additional context

Hi there, I tried to use tools/llm with static_cache_v2 to run the Qwen2.5 model, using the following command:

python run_llm.py --model Qwen/Qwen2.5-0.5B-Instruct --prompt "What is parallel programming?" --precision FP16 --num_tokens 128 --cache static_v2 --benchmark

When I profiled with Nsight Systems, I found that using static_cache_v2 adds launch overhead to the TensorRT engine in every prefill/decode block. Do you see this problem too? I think this overhead is too large; it brings Torch-TensorRT down to roughly the same speed as just enabling torch.compile.
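
For anyone trying to reproduce the trace, a capture along these lines should work (the trace flags below are just a suggested nsys invocation, not necessarily the exact one used here):

nsys profile -o qwen_static_v2 --trace=cuda,nvtx,osrt python run_llm.py --model Qwen/Qwen2.5-0.5B-Instruct --prompt "What is parallel programming?" --precision FP16 --num_tokens 128 --cache static_v2 --benchmark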

Here is the nsys profiling result: the red line marks approximately 1.7 ms of overhead with no GPU activity at all (when static_cache_v2 is disabled there are no such bubbles; perhaps it comes from shape copies or other operators introduced by static_cache_v2?).

[nsys timeline screenshot: ~1.7 ms gap with no GPU activity between prefill/decode blocks]
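
As a cross-check outside of nsys, the per-step cost (including any host-side launch gap) can also be measured with plain wall-clock timing around each decode call. A minimal sketch, where model, input_ids, and the greedy next-token step are placeholders rather than the actual run_llm.py internals:

import time
import torch

@torch.no_grad()
def avg_step_ms(model, input_ids, num_steps=32):
    # Average wall-clock milliseconds per step; this includes host-side launch
    # overhead, which is what the ~1.7 ms bubble in the nsys timeline would add.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(num_steps):
        logits = model(input_ids)                  # one call into the compiled prefill/decode graph
        input_ids = logits[:, -1:].argmax(dim=-1)  # placeholder greedy step; the real loop also updates the KV cache
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) * 1000.0 / num_steps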

Looking forward to your reply, thanks a lot!
