🐛 [Bug] TRTorch runtime isn't thread-safe, which may segfault or hang in an online-serving multi-threaded environment #618
Comments
Now I lock the whole "execute_engine" op like this, which protects the TRT task submission, not the whole inference (from start until the final result is available). We've tested that this fixes it in the single-stream & static-shape case (maybe not fully tested). I wonder if it's also OK in the multi-stream or dynamic-shape case.

std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intrusive_ptr<TRTEngine> compiled_engine) {
  LOG_DEBUG("Attempting to run engine (ID: " << compiled_engine->name << ")");
  std::vector<void*> gpu_handles;

  std::vector<at::Tensor> contig_inputs{};
  contig_inputs.reserve(inputs.size());

  // nvinfer1::IExecutionContext is not thread-safe, so serialize access to it.
  std::unique_lock<std::mutex> lock(compiled_engine->mu);

  // Validate inputs, make them contiguous, and register their device pointers as bindings.
  for (size_t i = 0; i < inputs.size(); i++) {
    uint64_t pyt_idx = compiled_engine->in_binding_map[i];
    TRTORCH_CHECK(
        inputs[pyt_idx].is_cuda(),
        "Expected input tensors to have device cuda, found device " << inputs[pyt_idx].device());
    auto expected_type = util::toATenDType(compiled_engine->exec_ctx->getEngine().getBindingDataType(i));
    TRTORCH_CHECK(
        inputs[pyt_idx].dtype() == expected_type,
        "Expected input tensors to have type " << expected_type << ", found type " << inputs[pyt_idx].dtype());
    auto dims = core::util::toDimsPad(inputs[pyt_idx].sizes(), 1);
    auto shape = core::util::toVec(dims);
    contig_inputs.push_back(inputs[pyt_idx].view(shape).contiguous());
    LOG_DEBUG("Input shape: " << dims);
    compiled_engine->exec_ctx->setBindingDimensions(i, dims);
    gpu_handles.push_back(contig_inputs.back().data_ptr());
  }

  TRTORCH_CHECK(
      compiled_engine->exec_ctx->allInputDimensionsSpecified(), "Not enough inputs provided (runtime.RunCudaEngine)");

  // Allocate output tensors and register their device pointers as bindings.
  std::vector<at::Tensor> outputs(compiled_engine->num_io.second);
  for (size_t o = inputs.size(); o < (compiled_engine->num_io.first + compiled_engine->num_io.second); o++) {
    uint64_t pyt_idx = compiled_engine->out_binding_map[o];
    auto out_shape = compiled_engine->exec_ctx->getBindingDimensions(o);
    LOG_DEBUG("Output shape: " << out_shape);
    auto dims = core::util::toVec(out_shape);
    auto type = util::toATenDType(compiled_engine->exec_ctx->getEngine().getBindingDataType(o));
    outputs[pyt_idx] = std::move(at::empty(dims, {at::kCUDA}).to(type).contiguous());
    gpu_handles.push_back(outputs[pyt_idx].data_ptr());
  }

  // Enqueue the engine on the current CUDA stream of the first input's device.
  c10::cuda::CUDAStream stream = c10::cuda::getCurrentCUDAStream(inputs[0].device().index());
  bool success = compiled_engine->exec_ctx->enqueueV2(gpu_handles.data(), stream, nullptr);
  TRTORCH_CHECK(success, "execute engine enqueue trt task failed");

  return outputs;
}
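For context, the lock above presupposes that TRTEngine gains a std::mutex member named mu. A minimal sketch of that change follows; everything besides mu and the members already referenced in the snippet (name, exec_ctx, num_io, in_binding_map, out_binding_map) is an abbreviated placeholder and may not match the real TRTorch struct.

// Hedged sketch only: the point is the added `mu` member; other fields and
// the base class are illustrative, not the actual TRTorch definition.
struct TRTEngine /* : existing base class */ {
  std::string name;
  nvinfer1::ICudaEngine* cuda_engine;
  nvinfer1::IExecutionContext* exec_ctx;
  std::pair<uint64_t, uint64_t> num_io;                    // (number of inputs, number of outputs)
  std::unordered_map<uint64_t, uint64_t> in_binding_map;   // TRT binding index -> PyTorch input index
  std::unordered_map<uint64_t, uint64_t> out_binding_map;  // TRT binding index -> PyTorch output index
  std::mutex mu;  // added: serializes use of exec_ctx, which is not thread-safe
  // ... constructors and the rest of the class unchanged ...
};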
Can you open a PR or something so we can see all changes? Like did you add the
Bug Description
We optimized one model with TRTorch v0.3.0 successfully, but it failed when deployed in online serving, where it may core-dump or hang at MemoryD2H. By locking torch.jit inference we can work around this issue for now, and we suspect the TRTorch runtime is not thread-safe.
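A hedged sketch of that coarse work-around, not the production code: serialize every torch::jit forward call behind one process-wide mutex. The names g_infer_mu and run_inference are illustrative.

// Coarse work-around sketch: one lock around the whole forward pass.
#include <mutex>
#include <torch/script.h>

static std::mutex g_infer_mu;

at::Tensor run_inference(torch::jit::script::Module& module, std::vector<torch::jit::IValue> inputs) {
  std::lock_guard<std::mutex> guard(g_infer_mu);  // all threads wait here; inference is fully serialized
  return module.forward(std::move(inputs)).toTensor();
}

This removes the crash but also removes all inference concurrency, which is why the narrower lock inside execute_engine shown above looks preferable.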
I collected other issues and commits related to this problem:
From these issues and commits, I gather that nvinfer1::IExecutionContext is not thread-safe.
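One direction that follows from this is the general TensorRT guidance (a hedged sketch, not TRTorch internals): keep the ICudaEngine shared across threads, but give each thread its own IExecutionContext instead of locking a single shared context. The worker function and its parameters below are illustrative names.

// Per-thread execution context sketch; contexts from the same engine may run concurrently.
#include <NvInfer.h>
#include <cuda_runtime_api.h>

void worker(nvinfer1::ICudaEngine& engine, void** bindings, cudaStream_t stream) {
  nvinfer1::IExecutionContext* ctx = engine.createExecutionContext();  // private to this thread
  if (ctx != nullptr) {
    ctx->enqueueV2(bindings, stream, nullptr);  // no lock needed: no other thread touches this context
    cudaStreamSynchronize(stream);              // wait before the bindings are reused or freed
    ctx->destroy();                             // TensorRT 7.x-style cleanup
  }
}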
So how should this bug be fixed in TRTorch?