ensure we reconnect on failure #173

Merged
merged 3 commits into from
May 10, 2024

Conversation

xlc
Member

@xlc xlc commented May 10, 2024

No description provided.

};

let mut selected_endpoint = healthiest_endpoint(None).await;
Collaborator

This is important: it ensures at least one endpoint is connected. Selecting just the first one may pick an endpoint whose connection fails and never recovers, in which case selected_endpoint.connected().await will never resolve.

Member Author

@xlc xlc May 10, 2024

In that case we need a test. The current behaviour makes the unit tests non-deterministic, because the client may connect to any of the dummy servers, so it is best to fix the wait-for-connect behaviour anyway.

@@ -38,19 +40,23 @@ impl Endpoint {
        health_config: HealthCheckConfig,
    ) -> Self {
        let (client_tx, client_rx) = tokio::sync::watch::channel(None);
        let (reconnect_tx, mut reconnect_rx) = tokio::sync::mpsc::channel(1);
Collaborator

tokio::sync::Notify may be a better option

Member Author

Notify is a one-off thing, but we may need to reconnect multiple times.
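
For readers following this thread, here is a minimal sketch of a repeatable reconnect signal built on a bounded channel of capacity 1. It is not the code from this PR: it uses the standard library's std::sync::mpsc::sync_channel as a synchronous stand-in for the tokio::sync::mpsc::channel(1) in the diff, and ReconnectSignal and request are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical handle used to ask a background task to reconnect.
pub struct ReconnectSignal {
    tx: SyncSender<()>,
}

impl ReconnectSignal {
    /// Creates the signal plus the receiving half that the connection task would poll.
    pub fn new() -> (Self, Receiver<()>) {
        // Capacity 1: at most one reconnect request can be pending at a time.
        let (tx, rx) = sync_channel(1);
        (Self { tx }, rx)
    }

    /// Requests a reconnect without blocking.
    /// Returns true if the request was queued, false if one is already
    /// pending or the receiving side has gone away.
    pub fn request(&self) -> bool {
        match self.tx.try_send(()) {
            Ok(()) => true,
            Err(TrySendError::Full(())) => false,
            Err(TrySendError::Disconnected(())) => false,
        }
    }
}
```

Under these assumptions, repeated requests made while one is still pending collapse into a single reconnect, and the signal can be raised again as soon as the receiver has consumed the previous one.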

@@ -422,6 +426,10 @@ impl Client {
_ = selected_endpoint.health().unhealthy() => {
    // Current selected endpoint is unhealthy, try to rotate to another one.
    // If all endpoints are unhealthy, we don't want to keep rotating but stick with the healthiest one.

    // The ws client may be in a state that requires a reconnect
    selected_endpoint.reconnect().await;
Collaborator

This branch will execute the moment the endpoint becomes unhealthy, and when that happens the endpoint will already try to reconnect. I don't think this extra reconnect will help.

Member Author

There is no reconnect currently. We have to drop and re-create the client to actually reconnect. As it stands, once the remote drops the connection, requests always fail and the client is never able to connect to that endpoint again.
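
To make the "drop and re-create" idea concrete, here is a small std-only sketch. It is an illustration under stated assumptions, not the implementation in this PR: DummyClient stands in for the real WebSocket client, and EndpointSlot, reconnect and current are hypothetical names.

```rust
/// Hypothetical stand-in for a WebSocket client; a real one would own a socket.
#[derive(Debug)]
pub struct DummyClient {
    /// Counts how many times a client has been created for this endpoint.
    pub generation: u32,
}

/// Hypothetical holder for the endpoint's current client, if any.
pub struct EndpointSlot {
    client: Option<DummyClient>,
    generation: u32,
}

impl EndpointSlot {
    pub fn new() -> Self {
        Self { client: None, generation: 0 }
    }

    /// Drops the existing client (if any) and creates a fresh one.
    pub fn reconnect(&mut self) -> &DummyClient {
        // Drop the old client first so whatever it owns is released
        // before the replacement is created.
        self.client = None;
        self.generation += 1;
        self.client.insert(DummyClient { generation: self.generation })
    }

    /// Returns the client currently in the slot, if one has been created.
    pub fn current(&self) -> Option<&DummyClient> {
        self.client.as_ref()
    }
}
```

The point of the sketch is only that a reconnect replaces the client object rather than reusing the old one, which is the behaviour described in the comment above.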


let h1 = tokio::spawn(async move {
let _req = rx1.recv().await.unwrap();
// no response, let it timeout
Collaborator

A request timeout will make the endpoint unhealthy, and therefore it will try to reconnect by itself.

@xlc xlc merged commit e61fa69 into master May 10, 2024
1 check passed
@xlc xlc deleted the fix-reconnect branch May 10, 2024 11:10
xlc added a commit that referenced this pull request May 10, 2024
* ensure we reconnect on failure

* refactor

* fix test
xlc added a commit that referenced this pull request May 18, 2024
xlc added a commit that referenced this pull request May 18, 2024
* Revert "Refactor endpoint (#178)"

This reverts commit 7fa3132.

* Revert "ensure we reconnect on failure (#173)"

This reverts commit 5039cfa.

* Revert "improve reconnect wait time (#168)"

This reverts commit 7cb7c73.

* Revert "Await healthy endpoint (#158)"

This reverts commit ef1c524.

* Revert "endpoint health (#152)"

This reverts commit cdbdd9b.

* redo validate middleware

* fix