-
Notifications
You must be signed in to change notification settings - Fork 6.8k
fix naive engine for multi-threaded inference #15574
Conversation
@anirudh2290 Could you please take a look? |
Thanks for the contribution, but MXNets frontend is not considered threadsafe in general. Thus, I'd prefer if we hold up with this fix. I feel like we should rather do a proper analysis about the threadsafety of MXNet and then act upon that. Small patches won't really solve the problem but might possibly introduce further issues. |
src/engine/naive_engine.cc
Outdated
@@ -238,7 +238,11 @@ class NaiveEngine final : public Engine { | |||
static_cast<NaiveEngine*>(engine)->req_completed_ = true; | |||
} | |||
// whether action is completed | |||
bool req_completed_; | |||
#if DMLC_CXX11_THREAD_LOCAL | |||
static thread_local bool req_completed_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to C++ standards, thread_local implies static.
https://stackoverflow.com/questions/22794382/are-c11-thread-local-variables-automatically-static
@ZhennanQin could you help to take a review for the fix? |
I just evaluated the thread-safe condition for |
src/engine/naive_engine.cc
Outdated
static thread_local bool req_completed_; | ||
#else | ||
static MX_THREAD_LOCAL bool req_completed_; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces an additional overhead for thread local, and the number of ops in mxnet model can be in the scale of 100s. I am not sure we should add this without a build flag since we are adding overhead for single threaded use cases too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please not another build flag - at some point we're gonna spend more money on CI than the NASA :D
Could we make some benchmarks to determine the impact here?
I'm not that deep inside the engine code, so excuse me if that's a stupid question. But why are we declaring the internal variables of the engine as thread local instead of creating thread-local instantiations of the engines. Each thread gets its own engine (or multiple engines or a single engine is shared across threads, synchronized with locks) and we switch all static variables to private ones. The current design of the engine doesn't seem to have the creation of multiple instantiations in mind so we should fix that instead of making all the static variables threadlocal.
@marcoabreu @ZhennanQin I agree we need a bigger solution for the problem at hand.Having said that, Looks like the scope of the PR is limited to fix the existing multi threaded inference which is limited to the C Predict API which should be fine. |
@apeforest @ZhennanQin @marcoabreu @anirudh2290 Since threaded engine uses multi-thread scheduling for operator computation, so a natural idea is to utilize it for parallel inference which is covered in this proposal. There are two main problem for this kind of usage in mxnet:
Due to these problems and the lacking of a workable example, so I focus on the naive engine instead. The only example in mxnet now is cpp predict with
|
@apeforest @anirudh2290 |
@arcadiaphy I guess that would be nice to explore. We can remove it only if we can make sure that all exec_funs pushed make a call to callback, since its original purpose is to prevent async calls being pushed. |
I understand that you're trying to keep the changes to a minimum, but judging from the static variable definition, it doesn't seem like the engine was designed to be threadsafe - or even have multiple instantiations. I'd prefer if we make the engine only access internal-variables, get rid of static ones (aka proper object oriented design) and add synchronization to the APIs. But I don't really have a dog in that hunt, so take my opinion with a grain of salt here. |
@anirudh2290 Yes, in order to prevent async call from being pushed, using a single req_completed_ is not enough. We can add the req_completed_ to NaiveOpr and check whether it's been set to true in complete callback. Edit: |
@apeforest @anirudh2290 Gentle ping for the review. |
Description
In the cpp example of image classification,
MXPredCreateMultiThread
is used to perform multi-threaded inference with naive engine. Thereq_completed_
variable in naive engine is not thread-safe, which is fixed in this PR.The problem is reported in #8966.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments