-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix races in conductor with dynamic agents #568
Conversation
I'll revisit this after fixing the pyproject.toml file. Thank you for the PR! |
Could I get permissions to edit this PR? I'd like to update it with the main branch. |
67712be
to
51b6ee2
Compare
I've granted access now, and I've also rebased on top of |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #568 +/- ##
=======================================
Coverage 93.71% 93.72%
=======================================
Files 102 102
Lines 11156 11166 +10
Branches 1534 1537 +3
=======================================
+ Hits 10455 10465 +10
Misses 613 613
Partials 88 88 ☔ View full report in Codecov by Sentry. |
Do we need to modify the cython bindings as well? |
I'm not sure. Mind pointing me in the right direction for this? |
Cython bindings are under https://github.com/faust-streaming/faust/tree/master/faust/_cython, but it's not immediately apparent to me if any changes need to be made there. I'd merge this ASAP because I know it fixes an important issue, but I'd like to understand how the value for Edit: Looks like that value has been there since 2018. I'm cautious about changing it. |
Looking at the bindings, I don't believe the bindings need to be modified.
@wbarnha I know the code quite well around this value, since I was modifying all the code that uses it.
If the value is set too low and an agent is adding topics very frequently, then resubscription will happen extremely often and issue unnecessary work on the async loop. If the value is set too high, it will take a long time for a newly added agent to start receiving messages (this time is bounded by the value of |
Thanks for the thorough write-up, I think this is worthy of being placed into the documentation. I also agree that the bindings don't need to be modified. Just to be sure there's nothing missed in the unittests, I'll test this locally on my own machine and if everything seems nominal, I'll approve this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested the changes locally, everything looks good on my end. Thanks for the PR!
I added documentation around In the future, we should expose this as a setting to clients of the framework so they can choose.
There's nothing missing AFAICT. I couldn't find a good way to incorporate my repro code into a unit test as they currently exist. |
Description
This change fixes issues where agents being dynamically added would cause races within the conductor's internal state.
Simple Python code to reproduce
With this change, this program no longer races and crashes. Note it may have to run for a bit, since
_resubscribe_sleep_lock_seconds
is somewhat high.