
[Bug] tensorrt may randomly release imported resources before deallocate #2632

Closed
AllentDan opened this issue Feb 3, 2023 · 8 comments

@AllentDan

Description

After inheriting tensorrt.IGpuAllocator and overriding its deallocate function in a script, resources may get released randomly before deallocate executes.
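For reference, a minimal, hypothetical sketch of the setup being described (the actual implementation and work-around are linked under Steps To Reproduce below); backing the allocator with PyTorch's CUDA caching allocator is only one possible choice and is an assumption here:

```python
import tensorrt as trt
import torch


class TorchAllocator(trt.IGpuAllocator):
    """Hypothetical allocator that routes TensorRT allocations through
    PyTorch's CUDA caching allocator."""

    def __init__(self, device_id: int = 0):
        trt.IGpuAllocator.__init__(self)  # see the maintainer comment below
        self.device_id = device_id
        self.mems = set()

    def allocate(self, size, alignment, flags):
        ptr = torch.cuda.caching_allocator_alloc(size, self.device_id)
        self.mems.add(ptr)
        return ptr

    def deallocate(self, memory):
        # The reported bug: by the time TensorRT calls this at program exit,
        # modules such as torch may already have been torn down.
        if memory not in self.mems:
            return False
        torch.cuda.caching_allocator_delete(memory)
        self.mems.discard(memory)
        return True
```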

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: A100 or Geforce 1660
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.3
cuDNN Version: the version matching CUDA 11.3
Operating System: Ubuntu 18.04

Steps To Reproduce

Here is the code implementation, and this is our work-around to fix it.

@zerollzeng
Collaborator

@pranavm-nvidia Not sure whether this is an issue in our Python API. Can you take a look first?

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Feb 4, 2023
@pranavm-nvidia
Collaborator

pranavm-nvidia commented Feb 6, 2023

Because of how our bindings are implemented, you have to explicitly instantiate the base class in your __init__ function (i.e. you cannot use super()):

trt.IGpuAllocator.__init__(self)

I'll update the API docs to clarify this.
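For concreteness, a short sketch of the initializer pattern being suggested (class name and method bodies are illustrative):

```python
import tensorrt as trt


class MyAllocator(trt.IGpuAllocator):

    def __init__(self):
        # Call the bound base-class initializer directly...
        trt.IGpuAllocator.__init__(self)
        # ...instead of: super().__init__()

    def allocate(self, size, alignment, flags):
        ...

    def deallocate(self, memory):
        ...
```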

@AllentDan
Author


I tried that method, but the error still exists on my local machine.

@AllentDan
Author

Since the bug is sometimes hard to trigger, you may try calling print(torch) inside the deallocate function. You will see error logs as expected.
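As a standalone illustration (independent of TensorRT) of why a diagnostic like print(torch) can fail here, consider a finalizer that references a module global during interpreter shutdown:

```python
import torch


class Holder:

    def __del__(self):
        # If this finalizer runs late enough in interpreter shutdown, the
        # torch module may already be partially torn down, producing the
        # kind of error logs mentioned above. Finalization order at exit
        # is not guaranteed.
        print(torch)


holder = Holder()  # finalized at interpreter exit
```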

@pranavm-nvidia
Collaborator

From my reading of the PR you linked, it sounds like torch or torch.cuda is somehow being unloaded. I don't think that's something TensorRT would be able to do.
Do you know when torch is being released?

@AllentDan
Author

AllentDan commented Feb 8, 2023

At the end of the program, before TorchAllocator was released. Not only torch and torch.cuda; logging was also released before deallocate was called.

@pranavm-nvidia
Collaborator

Can you try making sure that any TRT objects you create (engine, context, etc.) are scoped? If they're in the global scope, then they might be freed after the torch module has been unloaded.
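A hedged sketch of what that scoping might look like (the engine file name and the custom allocator are placeholders; exact attribute availability depends on the TensorRT version):

```python
import tensorrt as trt


def run(engine_path: str):
    # Keep every TensorRT object inside this function so it is destroyed on
    # return, long before interpreter shutdown starts unloading modules such
    # as torch and logging.
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    # runtime.gpu_allocator = TorchAllocator()  # custom allocator, if used
    with open(engine_path, 'rb') as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # ... run inference with `context` ...
    # engine/context/runtime go out of scope here, so deallocate() is called
    # while torch is still fully importable.


if __name__ == '__main__':
    run('model.engine')  # hypothetical engine file
```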

@ttyio
Collaborator

ttyio commented Mar 14, 2023

Closing since there has been no activity for more than 3 weeks, thanks!

@ttyio ttyio closed this as completed Mar 14, 2023