-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingest/ledgerbackend: Fix flaky Captive Core tests #3213
ingest/ledgerbackend: Fix flaky Captive Core tests #3213
Conversation
if processErr != nil { | ||
return errors.Wrap(processErr, "stellar-core process exited with an error") | ||
} | ||
return errors.New("stellar-core process exited unexpectedly without an error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case of an offline replay, after streaming all the ledgers in the catchup range, we expect stellar-core to exit without error, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I fixed it in 873797e and I was able to simplify GetLedger
: it no longer needs to handle getProcessExitChan()
because if Stellar-Core exits, bufferedLedgerMetaReader
will send an error to the channel.
…tekn/go into fix-buffered-meta-pipe-reader-flacky
…tekn/go into fix-buffered-meta-pipe-reader-flacky
loop: | ||
for { | ||
select { | ||
case <-b.c: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this function be simplified to just <-b.closed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no and the reason is in the comment above:
// If buffer is full, keep reading to make sure it receives
// a shutdown signal from stellarCoreRunner.
To make it more clear start
go routine closes b.close
only on error or after receiving process exit signal from Stellar-Core. However, the channel is buffered (up to 20 messages now) so in case it's full we need to read one message to unblock it so start
can exit.
@@ -282,7 +282,6 @@ func (r *stellarCoreRunner) close() error { | |||
if r.started { | |||
close(r.shutdown) | |||
r.wg.Wait() | |||
close(r.processExit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing close of closed channel
error because r.processExit
is closed right after cmd.Wait()
.
This commit fixes two issues that cause Captive Core tests flaky:
bufferedLedgerMetaReader
go routine was still running after callingCaptiveStellarCore.Close()
because it returned only when getting an exit signal fromstellarCoreRunner
(could happen afterCaptiveStellarCore.Close()
). To fix it a newWaitForClose()
method was created that blocks until go routine returns.bufferedLedgerMetaReader.readLedgerMetaFromPipe()
was retuning error on Stellar-Core process graceful exit. Now it returns an error only when Stellar-Core was closed with an error.I'm not very satisfied with a solutions here. I think we should research #3200 and automate this.