
🐛 [Bug] TRTorch runtime isn't thread-safe, which may segfault or hang in a multi-threaded online-serving environment #618

Closed
gssplayer opened this issue Sep 3, 2021 · 2 comments · Fixed by #658

gssplayer commented Sep 3, 2021

Bug Description

We successfully optimized a model with TRTorch v0.3.0, but failed to deploy it for online serving: the process would either core-dump or hang at the device-to-host memory copy (MemoryD2H). As a workaround we currently lock the whole torch.jit inference call, so we suspect the TRTorch runtime is not thread-safe.
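
For reference, a minimal sketch of that workaround, assuming a libtorch-based serving path (locked_forward and g_infer_mu are illustrative names, not our actual service code):

#include <torch/script.h>
#include <mutex>

// Process-wide lock that serializes every call into the TRTorch-compiled module.
static std::mutex g_infer_mu;

at::Tensor locked_forward(torch::jit::script::Module& module, const at::Tensor& input) {
  // Hold the lock for the whole forward call, so only one thread
  // touches the TRT engine at a time.
  std::lock_guard<std::mutex> guard(g_infer_mu);
  return module.forward({input}).toTensor();
}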

From other issues and commits related to this problem, I gather that nvinfer1::IExecutionContext is not thread-safe.

So how should this bug be fixed in TRTorch?

gssplayer added the bug label on Sep 3, 2021
gssplayer (Author) commented Sep 3, 2021

Now I lock the whole "execute_engine" op like this, which protects the TRT task submission but not the whole inference (from start until the final result is fetched). We've tested that this fixes the issue in the single-stream, static-shape case (maybe not exhaustively). I wonder whether it is also OK in the multi-stream or dynamic-shape case.

std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intrusive_ptr<TRTEngine> compiled_engine) {
 LOG_DEBUG("Attempting to run engine (ID: " << compiled_engine->name << ")");
 std::vector<void*> gpu_handles;

 std::vector<at::Tensor> contig_inputs{};
 contig_inputs.reserve(inputs.size());

 // nvinfer1::IExecutionContext is not thread-safe.
 std::unique_lock<std::mutex> lock(compiled_engine->mu);
  // Prepare contiguous CUDA inputs, set binding dimensions, and collect device pointers.
  for (size_t i = 0; i < inputs.size(); i++) {
   uint64_t pyt_idx = compiled_engine->in_binding_map[i];
   TRTORCH_CHECK(
       inputs[pyt_idx].is_cuda(),
       "Expected input tensors to have device cuda, found device " << inputs[pyt_idx].device());
   auto expected_type = util::toATenDType(compiled_engine->exec_ctx->getEngine().getBindingDataType(i));
   TRTORCH_CHECK(
       inputs[pyt_idx].dtype() == expected_type,
       "Expected input tensors to have type " << expected_type << ", found type " << inputs[pyt_idx].dtype());
   auto dims = core::util::toDimsPad(inputs[pyt_idx].sizes(), 1);
   auto shape = core::util::toVec(dims);
   contig_inputs.push_back(inputs[pyt_idx].view(shape).contiguous());
   LOG_DEBUG("Input shape: " << dims);
   compiled_engine->exec_ctx->setBindingDimensions(i, dims);
   gpu_handles.push_back(contig_inputs.back().data_ptr());
 }

 TRTORCH_CHECK(
     compiled_engine->exec_ctx->allInputDimensionsSpecified(), "Not enough inputs provided (runtime.RunCudaEngine)");

  // Allocate contiguous CUDA output buffers using the binding shapes and dtypes reported by TensorRT.
  std::vector<at::Tensor> outputs(compiled_engine->num_io.second);
 for (size_t o = inputs.size(); o < (compiled_engine->num_io.first + compiled_engine->num_io.second); o++) {
   uint64_t pyt_idx = compiled_engine->out_binding_map[o];
   auto out_shape = compiled_engine->exec_ctx->getBindingDimensions(o);
   LOG_DEBUG("Output shape: " << out_shape);
   auto dims = core::util::toVec(out_shape);
   auto type = util::toATenDType(compiled_engine->exec_ctx->getEngine().getBindingDataType(o));
   outputs[pyt_idx] = std::move(at::empty(dims, {at::kCUDA}).to(type).contiguous());
   gpu_handles.push_back(outputs[pyt_idx].data_ptr());
 }

  // Submit the TRT execution on the caller's current CUDA stream. Note that the
  // mutex is released when this function returns, right after enqueueV2 submits
  // the work; the kernels may still be running asynchronously on the stream.
  c10::cuda::CUDAStream stream = c10::cuda::getCurrentCUDAStream(inputs[0].device().index());
  bool success = compiled_engine->exec_ctx->enqueueV2(gpu_handles.data(), stream, nullptr);
  TRTORCH_CHECK(success, "execute engine enqueue trt task failed");

 return outputs;
}

narendasan (Collaborator) commented
Can you open a PR or something so we can see all the changes? For instance, did you add the compiled_engine->mu field yourself? Yes, IExecutionContext is not thread-safe; you are supposed to have one per thread. We used to create one on the fly, but that incurs a runtime cost.
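
A minimal sketch of that one-context-per-thread idea, assuming a single shared engine and TensorRT 8+ (where plain delete is valid on TensorRT objects); get_thread_local_context is an illustrative helper, not TRTorch API:

#include <NvInfer.h>
#include <memory>

// Each thread lazily creates its own IExecutionContext on first use and then
// reuses it: no mutex, and the creation cost is paid once per thread rather
// than once per call. The ICudaEngine itself can be shared across threads.
nvinfer1::IExecutionContext* get_thread_local_context(nvinfer1::ICudaEngine& engine) {
  // Assumes one engine per process; the first engine a thread passes in wins.
  thread_local std::unique_ptr<nvinfer1::IExecutionContext> ctx{engine.createExecutionContext()};
  return ctx.get();
}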
