Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest/ledgerbackend: Fix flaky Captive Core tests #3213

Merged
merged 8 commits into from
Nov 16, 2020

Conversation

bartekn
Copy link
Contributor

@bartekn bartekn commented Nov 10, 2020

This commit fixes two issues that cause Captive Core tests flaky:

  • It was possible that bufferedLedgerMetaReader go routine was still running after calling CaptiveStellarCore.Close() because it returned only when getting an exit signal from stellarCoreRunner (could happen after CaptiveStellarCore.Close()). To fix it a new WaitForClose() method was created that blocks until go routine returns.
  • bufferedLedgerMetaReader.readLedgerMetaFromPipe() was retuning error on Stellar-Core process graceful exit. Now it returns an error only when Stellar-Core was closed with an error.

I'm not very satisfied with a solutions here. I think we should research #3200 and automate this.

@bartekn bartekn requested a review from a team November 10, 2020 23:52
@cla-bot cla-bot bot added the cla: yes label Nov 10, 2020
if processErr != nil {
return errors.Wrap(processErr, "stellar-core process exited with an error")
}
return errors.New("stellar-core process exited unexpectedly without an error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the case of an offline replay, after streaming all the ledgers in the catchup range, we expect stellar-core to exit without error, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I fixed it in 873797e and I was able to simplify GetLedger: it no longer needs to handle getProcessExitChan() because if Stellar-Core exits, bufferedLedgerMetaReader will send an error to the channel.

loop:
for {
select {
case <-b.c:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this function be simplified to just <-b.closed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no and the reason is in the comment above:

	// If buffer is full, keep reading to make sure it receives
	// a shutdown signal from stellarCoreRunner.

To make it more clear start go routine closes b.close only on error or after receiving process exit signal from Stellar-Core. However, the channel is buffered (up to 20 messages now) so in case it's full we need to read one message to unblock it so start can exit.

@@ -282,7 +282,6 @@ func (r *stellarCoreRunner) close() error {
if r.started {
close(r.shutdown)
r.wg.Wait()
close(r.processExit)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing close of closed channel error because r.processExit is closed right after cmd.Wait().

@bartekn bartekn merged commit f9ab742 into stellar:master Nov 16, 2020
@bartekn bartekn deleted the fix-buffered-meta-pipe-reader-flacky branch November 16, 2020 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants