
🐛 [Bug] Warning as default stream was used in enqueueV3() #3190

Closed
keehyuna opened this issue Sep 27, 2024 · 2 comments · Fixed by #3191
Comments

@keehyuna
Collaborator

Bug Description

Sometimes the warning below is seen while running the model.

WARNING: [Torch-TensorRT - Debug Build] - Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.

To Reproduce

It was reproduced with a ResNet model and multiple inference calls. The issue occurs with both use_python_runtime=False and use_python_runtime=True.

import torch
import torch_tensorrt as torchtrt
import torchvision.models as models

model = models.resnet18(pretrained=True).eval().to("cuda")
input = torch.randn((1, 3, 224, 224)).to("cuda")
compile_spec = {
    "inputs": [
        torchtrt.Input(
            input.shape, dtype=torch.float, format=torch.contiguous_format
        )
    ],
    "device": torchtrt.Device("cuda:0"),
    "enabled_precisions": {torch.float},
    "ir": "dynamo",
    "cache_built_engines": False,
    "reuse_cached_engines": False,
    "use_python_runtime": True,
}

trt_mod = torchtrt.compile(model, **compile_spec)
for i in range(5):
    trt_mod(input)
# Clean up model env
torch._dynamo.reset()
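To confirm the condition that triggers the warning, one can check whether the current stream is CUDA's default stream before calling the compiled module. A minimal sketch (the helper name `on_default_stream` is mine, not from the issue):

```python
import torch

def on_default_stream() -> bool:
    # True when work submitted now would run on CUDA's default stream,
    # which is what makes TensorRT warn inside enqueueV3().
    if not torch.cuda.is_available():
        return True  # no GPU: only the default stream exists
    return torch.cuda.current_stream() == torch.cuda.default_stream()
```
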

Expected behavior

No warning message

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0):
  • PyTorch Version (e.g. 1.0):
  • CPU Architecture:
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

@keehyuna keehyuna added the bug Something isn't working label Sep 27, 2024
@keehyuna keehyuna self-assigned this Sep 27, 2024
@sean-xiang-applovin

I have seen this too, my solution is to:

with torch.cuda.stream(torch.cuda.Stream()):
    # inference with your compiled model
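Fleshed out, this workaround might look as follows. `infer_on_side_stream` is a hypothetical helper name; `model` stands in for the compiled trt_mod from the repro above:

```python
import torch

def infer_on_side_stream(model, x):
    # Run model(x) on a non-default CUDA stream, then synchronize back,
    # so enqueueV3() does not execute on the default stream.
    if not torch.cuda.is_available():
        return model(x)  # CPU fallback: streams do not apply
    side_stream = torch.cuda.Stream()
    # Wait for work already queued on the current (default) stream.
    side_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side_stream):
        out = model(x)
    # Make the default stream wait before the output is consumed.
    torch.cuda.current_stream().wait_stream(side_stream)
    return out
```

The cross-stream waits matter: without them, work on the side stream could race against tensors produced or consumed on the default stream.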

@keehyuna
Collaborator Author

I have seen this too, my solution is to:

with torch.cuda.stream(torch.cuda.Stream()):
    # inference with your compiled model

Yes, running the model under a new CUDA stream sets the current stream, and enqueueV3() will then execute on this stream or another side stream. Fix #3191 allocates and keeps a non-default CUDA stream and uses it for CUDA graph capture/replay and the model's enqueueV3() call.
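A simplified sketch of that idea (not the actual #3191 code, which lives in the Torch-TensorRT runtime): the runtime allocates one non-default stream up front and routes every execution through it, so callers never hit the default stream:

```python
import torch

class StreamedRuntime:
    # Illustrative stand-in for a runtime module: allocate one
    # non-default stream at init and reuse it for every execution.
    def __init__(self, engine_fn):
        self.engine_fn = engine_fn  # stands in for enqueueV3()
        self.stream = torch.cuda.Stream() if torch.cuda.is_available() else None

    def __call__(self, x):
        if self.stream is None:
            return self.engine_fn(x)  # CPU fallback for illustration
        # Order the kept stream after work queued on the caller's stream.
        self.stream.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(self.stream):
            out = self.engine_fn(x)
        # Let the caller's stream wait before consuming the output.
        torch.cuda.current_stream().wait_stream(self.stream)
        return out
```

Keeping a single long-lived stream also avoids paying stream creation cost on every inference call.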
