-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop worker when ctx is canceled during recovered round execution #401
Stop worker when ctx is canceled during recovered round execution #401
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #401 +/- ##
=========================================
- Coverage 74.8% 74.6% -0.2%
=========================================
Files 23 23
Lines 1898 1904 +6
=========================================
+ Hits 1421 1422 +1
- Misses 359 363 +4
- Partials 118 119 +1
|
I don't understand what the issue is and how this PR fixes it. Could you elaborate? |
Sorry for this. Updated the PR description. |
eg.Go(func() error { | ||
if err := round.Teardown(ctx, err == nil); err != nil { | ||
logger.Warn("round teardown failed", zap.Error(err), zap.Uint("epoch", round.epoch)) | ||
} | ||
return nil | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to make this a method? Especially since further below the same code is called again (without the cleanupRound
shell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, the method would need to return:
func() error {
if err := round.Teardown(ctx, err == nil); err != nil {
logger.Warn("round teardown failed", zap.Error(err), zap.Uint("epoch", round.epoch))
}
return nil
}
}
to be scheduled on the local eg
.
Closes #400
Root cause
When the poet was shut down during the recovered round execution, it would not immediately quit but proceed to select from the closed rounds channel and
ctx.Done()
. It would have a 50% chance to pick up the same round ID again and start it from the start because the registration also recovers a round in-progress and sends it over the closed channel. In a normal use case (when the recovered round finishes) the round recovered from registration is skipped.The fix
The fix is to quit the
Service::loop()
immediately if the recovered round either fails or is canceled.