Thread Safety #645
ruoqianguo started this conversation in RFCs
cc: @narendasan @peri044
Goal
As mentioned in #618, the TRTorch runtime is not thread-safe, which may cause segmentation faults or hangs in multi-threaded online-serving environments. We need to make sure the TRTorch runtime is thread-safe.
Roadmap
From the commit in TF2TRT, we found that nvinfer1::IExecutionContext::enqueue is not thread-safe, so we need a mutex around it.
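The mutex approach can be sketched as follows. This is a minimal illustration, not TRTorch's actual implementation: the IExecutionContext here is a stub standing in for TensorRT's real class from NvInfer.h, and the TRTEngine/Enqueue names are hypothetical.

```cpp
#include <mutex>

// Stub standing in for nvinfer1::IExecutionContext (the real class comes
// from TensorRT's NvInfer.h); shown only to make the sketch self-contained.
struct IExecutionContext {
  bool enqueueV2(void** bindings, void* stream, void* inputConsumed) {
    return true;  // the real implementation launches the engine on `stream`
  }
};

// Sketch: serialize all enqueue calls on a single context with a mutex so
// that concurrent callers from different threads cannot overlap an enqueue.
class TRTEngine {
 public:
  bool Enqueue(void** bindings, void* stream) {
    std::lock_guard<std::mutex> lock(enqueue_mu_);  // one enqueue at a time
    return exec_ctx_.enqueueV2(bindings, stream, nullptr);
  }

 private:
  std::mutex enqueue_mu_;
  IExecutionContext exec_ctx_;
};
```

The lock only makes the enqueue call itself safe; it does not address the stream-ordering questions discussed below.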
Besides, the TensorRT best practices guide says to "Create a CUDA stream using cudaStreamCreate for each independent batch and an IExecutionContext for each independent batch." One should interpret this as follows: if we want to enqueue work on multiple GPU streams, then each independent batch of work needs a separate IExecutionContext object. Alternatively, we can use a single stream with a single execution context. Calling enqueueV2() on the same IExecutionContext object from different CUDA streams concurrently results in undefined behavior.
From this issue, we found that TF uses a single compute stream per physical GPU. That ensures that the TRTEngineOps running on that GPU have correct stream ordering. But in PyTorch we can have different streams on a physical GPU through c10::cuda::CUDAStreamGuard.
In general, we may need to discuss the following two approaches:
1. Single stream: we just add a mutex around nvinfer1::IExecutionContext::enqueue. Users must make sure they only use a single stream; otherwise the behavior is undefined.
2. Multiple streams: we maintain a map from each CUDA stream to an IExecutionContext. When the TRTEngine is initialized, we can construct several IExecutionContexts from the TRT engine. During execution of the engine, we look up the current stream in the map; when we see a new stream, we assign a new IExecutionContext to it. In addition, we also need to set a maximum number of streams.
Pseudocode for the multiple-streams approach:
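The original pseudocode is not preserved in this copy of the thread; the following is a hedged reconstruction of the stream-to-context map described above. The ICudaEngine/IExecutionContext types are stubs standing in for TensorRT's real classes, and StreamContextMap/GetContext are illustrative names, not TRTorch's actual API.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

// Stubs standing in for TensorRT types (the real ones live in NvInfer.h).
struct IExecutionContext {};
struct ICudaEngine {
  std::unique_ptr<IExecutionContext> createExecutionContext() {
    return std::make_unique<IExecutionContext>();
  }
};
using cudaStream_t = void*;  // opaque CUDA stream handle

// Sketch of the multiple-streams approach: one IExecutionContext per CUDA
// stream, created lazily and capped at a maximum number of streams.
class StreamContextMap {
 public:
  StreamContextMap(ICudaEngine* engine, std::size_t max_streams)
      : engine_(engine), max_streams_(max_streams) {}

  // Returns the context bound to `stream`, creating one on first use.
  // Returns nullptr once max_streams_ distinct streams have been seen.
  IExecutionContext* GetContext(cudaStream_t stream) {
    std::lock_guard<std::mutex> lock(mu_);  // guard the map itself
    auto it = contexts_.find(stream);
    if (it != contexts_.end()) return it->second.get();
    if (contexts_.size() >= max_streams_) return nullptr;
    auto ctx = engine_->createExecutionContext();
    IExecutionContext* raw = ctx.get();
    contexts_.emplace(stream, std::move(ctx));
    return raw;
  }

 private:
  std::mutex mu_;
  ICudaEngine* engine_;
  std::size_t max_streams_;
  std::unordered_map<cudaStream_t, std::unique_ptr<IExecutionContext>>
      contexts_;
};
```

Because each stream gets its own IExecutionContext, concurrent enqueueV2() calls on different streams no longer share a context, which avoids the undefined behavior noted above; the mutex here only protects the map lookup, not the enqueue itself.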