This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix race condition in NaiveEngine::PushAsync #19108

Merged: 2 commits, Sep 11, 2020


leezu
Contributor

@leezu leezu commented Sep 10, 2020

Wait for async_fun to complete in NaiveEngine::PushAsync.

This fixes a race condition in which NaiveEngine::PushAsync checked, at the end
of its own scope, whether async_fun had already completed.

If async_fun had not completed yet, NaiveEngine::PushAsync would set an internal
error string and deallocate the callback, causing a segfault in async_fun once
it attempted to call the callback.
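
For illustration, here is a minimal, self-contained sketch of the "wait for completion" pattern described above. It is not the actual diff to naive_engine.cc: `Callback`, `AsyncFn`, `PushAsyncBlocking`, and `DemoCompletesBeforeReturn` are hypothetical names, and the use of `std::promise`/`std::future` is an assumption chosen for the example. The caller blocks until the asynchronous function has invoked its callback, rather than checking a completion flag once and tearing the callback down.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-ins for illustration; these are not the MXNet types.
using Callback = std::function<void()>;
using AsyncFn = std::function<void(Callback)>;

// Runs async_fun and blocks until async_fun has invoked the completion
// callback, so nothing the callback relies on is torn down early.
void PushAsyncBlocking(const AsyncFn& async_fun) {
  // The promise is shared with the callback, so it stays alive for as long as
  // any copy of the callback exists, even after this function returns.
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();
  async_fun([done]() { done->set_value(); });
  finished.wait();  // wait for completion instead of checking a flag once
}

// Example async_fun that signals completion from another thread after a delay.
// Returns true if the callback had run by the time PushAsyncBlocking returned.
bool DemoCompletesBeforeReturn() {
  std::atomic<bool> callback_ran{false};
  std::thread worker;
  PushAsyncBlocking([&](Callback on_complete) {
    worker = std::thread([&callback_ran, on_complete]() {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      callback_ran = true;
      on_complete();
    });
  });
  assert(worker.joinable());
  const bool ran = callback_ran.load();
  worker.join();
  return ran;
}
```

In this sketch the completion callback runs on a different thread from the caller, which is the situation where a one-time completion check at the end of the calling scope can race with the callback.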

@mxnet-bot

Hey @leezu, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, miscellaneous, windows-cpu, windows-gpu, centos-gpu, clang, edge, unix-cpu, unix-gpu, centos-cpu, sanity]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu leezu requested a review from ptrendx September 10, 2020 01:36
@leezu
Contributor Author

leezu commented Sep 10, 2020

@mxnet-bot run ci [all]

@leezu
Contributor Author

leezu commented Sep 10, 2020

@mxnet-bot run ci [all]

src/engine/naive_engine.cc (outdated review thread, resolved)
Member

@ptrendx ptrendx left a comment


LGTM

@leezu
Contributor Author

leezu commented Sep 10, 2020

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@leezu
Contributor Author

leezu commented Sep 10, 2020

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@leezu
Contributor Author

leezu commented Sep 10, 2020

NVIDIA/thrust#1090 keeps recurring in this PR: 10 compilation attempts in a row failed (each CI run attempts compilation 5 times before giving up).

@leezu
Contributor Author

leezu commented Sep 11, 2020

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@leezu
Contributor Author

leezu commented Sep 11, 2020

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@szha szha merged commit 1f15819 into apache:master Sep 11, 2020
@leezu leezu deleted the naiveenginepushasync branch September 11, 2020 23:00
leezu added a commit to leezu/mxnet that referenced this pull request Sep 11, 2020
* Wait for async_fun to complete in NaiveEngine::PushAsync

This fixes a race condition in which NaiveEngine::PushAsync checked, at the end
of its own scope, whether async_fun had already completed. If async_fun had not
completed yet, NaiveEngine::PushAsync would set an internal error string and
deallocate the callback, causing a segfault in async_fun once it attempted to
call the callback.

* Update naive_engine.cc
szha pushed a commit that referenced this pull request Sep 13, 2020
chinakook pushed a commit to chinakook/mxnet that referenced this pull request Nov 17, 2020
4 participants