
[BYOC][TRT] Allocate GPU data buffers and transfer data when needed #6872

Merged (3 commits) into apache:main on Nov 7, 2020

Conversation

@trevor-m (Contributor) commented Nov 6, 2020

This PR enables the TRT BYOC integration to be used with target="llvm" (previously only "cuda" could be used).
If an input or output DLTensor is not located on the GPU, we now allocate a GPU buffer to pass to TensorRT and transfer the data to or from the DLTensor accordingly. Since data_entry_ is now needed during BuildEngine, BuildEngine had to move from JsonRuntime::Init to the first Run.

This is a prerequisite for using TRT BYOC in combination with the Relay VM, which in general requires the llvm target.

Thanks @ylc for original implementation: neo-ai#147
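The buffer-allocation behavior described above can be sketched framework-free. Names like `Tensor`, `DeviceType`, and `GetDeviceBuffer` below are illustrative stand-ins, not the PR's actual API: the real implementation works with DLTensor and CUDA device memory, whereas this sketch simulates the "device" buffer with ordinary host memory.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical device tags, mirroring the idea of DLDeviceType (CPU vs. GPU).
enum class DeviceType { kCPU, kGPU };

// Minimal stand-in for a DLTensor: a data buffer plus the device it lives on.
struct Tensor {
  std::vector<float> data;
  DeviceType device;
};

// Sketch of the PR's idea: if an I/O tensor is not already on the GPU,
// allocate a device-side buffer and copy the host data into it before
// handing it to the engine. Here the "device buffer" is plain host memory;
// the real code would use CUDA allocation and host-to-device copies.
std::vector<float> GetDeviceBuffer(const Tensor& t) {
  if (t.device == DeviceType::kGPU) {
    return t.data;  // Already on the GPU: pass the existing buffer through.
  }
  std::vector<float> device_buffer(t.data.size());
  std::memcpy(device_buffer.data(), t.data.data(),
              t.data.size() * sizeof(float));  // Simulated H2D transfer.
  return device_buffer;
}
```

The same check runs in the other direction for outputs: results land in the device buffer and are copied back only when the destination DLTensor is not on the GPU.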

@trevor-m changed the title from "[BYOC][TRT] Allocate GPU data buffers when needed and transfer data" to "[BYOC][TRT] Allocate GPU data buffers and transfer data when needed" on Nov 6, 2020
@comaniac (Contributor) left a comment:

Only a few comments.

src/runtime/contrib/tensorrt/tensorrt_builder.cc (comment outdated; resolved)
@@ -106,9 +104,11 @@ class TensorRTRuntime : public JSONRuntimeBase {
#ifdef TVM_GRAPH_RUNTIME_TENSORRT
/*! \brief Run inference using built engine. */
void Run() override {
BuildEngine();
@comaniac (Contributor) commented:

Is the reason for moving BuildEngine from Init to Run that you need subgraph-specific information (e.g., I/O data entry IDs) to allocate the device buffers?

@trevor-m (Contributor, Author) replied:

Thanks @comaniac for the review! Yes, to allocate the device buffers we need the DLTensor context and shape. data_entry_ in the JSON runtime isn't initialized until Run(), so I had to move BuildEngine.

In the future, we plan to build engines dynamically for different input shapes in order to handle subgraphs with dynamic input sizes, so moving it would be needed for that anyway.
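The lazy build-on-first-run pattern discussed in this thread can be sketched as follows. `LazyEngineRuntime` and its counters are hypothetical stand-ins for the TensorRT runtime's cached BuildEngine call, shown only to illustrate why building inside Run (with a guard) is equivalent to building once, after the input information becomes available.

```cpp
#include <cassert>

// Sketch of moving engine construction from Init to the first Run:
// the input data entries (shapes, device contexts) are only populated at
// Run time, so the engine is built lazily on the first call and cached.
class LazyEngineRuntime {
 public:
  void Run() {
    BuildEngine();  // No-op after the first call.
    // ... enqueue inference using the built engine ...
    ++run_count_;
  }
  int build_count() const { return build_count_; }
  int run_count() const { return run_count_; }

 private:
  void BuildEngine() {
    if (engine_built_) return;  // Guard: build at most once.
    // ... inspect input tensors (shape, device) and build the engine here ...
    engine_built_ = true;
    ++build_count_;
  }
  bool engine_built_ = false;
  int build_count_ = 0;
  int run_count_ = 0;
};
```

The guard keeps repeated Run calls cheap, and the same structure extends naturally to a per-shape engine cache for the dynamic-shape case mentioned above.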

@comaniac (Contributor) left a comment:

LGTM

@comaniac comaniac merged commit c6cf58f into apache:main Nov 7, 2020
@comaniac commented Nov 7, 2020:

Thanks @trevor-m @zhiics

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
…pache#6872)

* Allocate data buffers for gpu

fix

* Rename AllocateDeviceBuffer, update docstrings

* Remove unneeded cast
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020 (…pache#6872; same commit message as above)
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020 (…pache#6872; same commit message as above)