
[Bug] tensorrt may randomly release imported resources before deallocate #2632

Closed
AllentDan opened this issue Feb 3, 2023 · 8 comments

@AllentDan

Description

After inheriting tensorrt.IGpuAllocator and overriding its deallocate function in a script, resources may get released randomly before deallocate executes.
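For reference, a minimal, hypothetical sketch of the setup being described (the actual implementation and work-around are linked under Steps To Reproduce below); backing the allocator with PyTorch's CUDA caching allocator is only one possible choice and is an assumption here:

```python
import tensorrt as trt
import torch


class TorchAllocator(trt.IGpuAllocator):
    """Hypothetical allocator that routes TensorRT allocations through
    PyTorch's CUDA caching allocator."""

    def __init__(self, device_id: int = 0):
        trt.IGpuAllocator.__init__(self)  # see the maintainer comment below
        self.device_id = device_id
        self.mems = set()

    def allocate(self, size, alignment, flags):
        ptr = torch.cuda.caching_allocator_alloc(size, self.device_id)
        self.mems.add(ptr)
        return ptr

    def deallocate(self, memory):
        # The reported bug: by the time TensorRT calls this at program exit,
        # modules such as torch may already have been torn down.
        if memory not in self.mems:
            return False
        torch.cuda.caching_allocator_delete(memory)
        self.mems.discard(memory)
        return True
```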

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: A100 or Geforce 1660
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.3
cuDNN Version: the version matching CUDA 11.3
Operating System: Ubuntu 18.04

Steps To Reproduce

Here is the code implementation, and this is our work-around to fix it.

@zerollzeng
Collaborator

@pranavm-nvidia Not sure whether this is an issue in our Python API. Can you take a look first?

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Feb 4, 2023
@pranavm-nvidia
Collaborator

pranavm-nvidia commented Feb 6, 2023

Because of how our bindings are implemented, you have to explicitly instantiate the base class in your __init__ function (i.e. you cannot use super()):

trt.IGpuAllocator.__init__(self)

I'll update the API docs to clarify this.
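For concreteness, a short sketch of the initializer pattern being suggested (class name and method bodies are illustrative):

```python
import tensorrt as trt


class MyAllocator(trt.IGpuAllocator):

    def __init__(self):
        # Call the bound base-class initializer directly...
        trt.IGpuAllocator.__init__(self)
        # ...instead of: super().__init__()

    def allocate(self, size, alignment, flags):
        ...

    def deallocate(self, memory):
        ...
```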

@AllentDan
Author


I tried that method, but the error still exists on my local machine.

@AllentDan
Author

Since the bug is sometimes hard to trigger, you may try calling print(torch) inside the deallocate function. You will see error logs as expected.
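As a standalone illustration (independent of TensorRT) of why a diagnostic like print(torch) can fail here, consider a finalizer that references a module global during interpreter shutdown:

```python
import torch


class Holder:

    def __del__(self):
        # If this finalizer runs late enough in interpreter shutdown, the
        # torch module may already be partially torn down, producing the
        # kind of error logs mentioned above. Finalization order at exit
        # is not guaranteed.
        print(torch)


holder = Holder()  # finalized at interpreter exit
```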

@pranavm-nvidia
Collaborator

From my reading of the PR you linked, it sounds like torch or torch.cuda is somehow being unloaded. I don't think that's something TensorRT would be able to do.
Do you know when torch is being released?

@AllentDan
Author

AllentDan commented Feb 8, 2023

At the end of the program, before TorchAllocator was released. Not only torch and torch.cuda; logging was also released before deallocate was called.

@pranavm-nvidia
Collaborator

Can you try making sure that any TRT objects you create (engine, context, etc.) are scoped? If they're in the global scope, then they might be freed after the torch module has been unloaded.
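A hedged sketch of what that scoping might look like (the engine file name and the custom allocator are placeholders; exact attribute availability depends on the TensorRT version):

```python
import tensorrt as trt


def run(engine_path: str):
    # Keep every TensorRT object inside this function so it is destroyed on
    # return, long before interpreter shutdown starts unloading modules such
    # as torch and logging.
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    # runtime.gpu_allocator = TorchAllocator()  # custom allocator, if used
    with open(engine_path, 'rb') as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # ... run inference with `context` ...
    # engine/context/runtime go out of scope here, so deallocate() is called
    # while torch is still fully importable.


if __name__ == '__main__':
    run('model.engine')  # hypothetical engine file
```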

@ttyio
Collaborator

ttyio commented Mar 14, 2023

Closing since there has been no activity for more than 3 weeks, thanks!

@ttyio ttyio closed this as completed Mar 14, 2023