Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove breaks in relay loop that could cause permanent disconnection from relay #869

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

rolznz
Copy link
Contributor

@rolznz rolznz commented Dec 9, 2024

Fixes #888

@rolznz rolznz requested review from frnandu and im-adithya December 9, 2024 14:45
@rolznz rolznz changed the title fix: breaks in relay loop that could cause permanent disconnection from relay fix: remove breaks in relay loop that could cause permanent disconnection from relay Dec 9, 2024
@rolznz rolznz added this to the v1.12.0 milestone Dec 10, 2024
@@ -123,7 +120,6 @@ func (svc *service) startNostr(ctx context.Context) error {
logger.Logger.WithError(err).Error("Got an error from the relay while listening to subscription.")
continue
}
break
}
select {
case <-ctx.Done():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add a break here (after the log) instead of making it loop through once again and then break at L62

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep it easy to understand by only having one exit point. The more places we can have breaks the more chance one of them doesn't work and incorrectly exits the loop.

@@ -123,7 +120,6 @@ func (svc *service) startNostr(ctx context.Context) error {
logger.Logger.WithError(err).Error("Got an error from the relay while listening to subscription.")
continue
}
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this break is here because it will be only hit when err == nil (which we assumed that it happens when ctx is done). And for that to happen, StartSubscription should return nil err, which can only happen when subscription context ends and

  • NOT because of relay connection error
  • NOT because of main context being cancelled (because otherwise breaking should be valid as the app is anyways terminated)

Is it possible? How to reproduce this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it happens when sub.Relay.Write returns an error (here) and we also have this:

logger.Logger.Error("Relay context cancelled, but no connection error...trying to reconnect")

So it should be possible. But in any case the fix makes sense as it is always good to check instead of breaking out by assuming things 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I think it would be good to log and see it happens. If we can find a way to reproduce it, then we can submit an issue or fix in go-nostr.

Copy link
Contributor

@frnandu frnandu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to reproduce this, and besides invalid filters that the relay rejects and closes the subscription but not the socket (which should not happen when using our properly configured relay) the only possible thing that might be happening is some sort of network socket read/write timeouts which causes the subscription to be closed but no connection error on the relay level. I did not reproduce such situation, so this is not 100% confirmed.
This would previously be treated as a program exit and so the break would cancel the loop.
+1 for removing them and making the reconnect loop more reliable.

@rolznz
Copy link
Contributor Author

rolznz commented Dec 16, 2024

@frnandu thanks. It definitely happens related to networking issue, more likely in self-hosted hubs.

@rolznz rolznz merged commit 2738166 into master Dec 16, 2024
9 checks passed
@rolznz rolznz deleted the fix/relay-loop-remove-breaks branch December 16, 2024 05:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hubs sometimes disconnect from relay and do not reconnect
3 participants