Distributors not leaving the ring on shutdown when memberlist is used #2401
Comments
Some hypotheses I have made so far. [Discarded] Distributors are not configured to leave the ring on shutdown: I guessed that in #2154 we didn't configure the BasicLifecycler to leave the ring on shutdown, but we did. Distributors are getting killed after
We found the issue. It's in the memberlist library, which causes our LEFT messages (used to remove an instance from the ring) to be dropped in a specific scenario. We're working on a fix.
The fix is available in hashicorp/memberlist#263, and is merged in our fork as grafana/memberlist@09ffed8.
I noticed that when we roll out distributors, some of them are left in the ring until the auto-forget triggers (introduced in #2154). We run distributors with `-distributor.ring.heartbeat-timeout=4m`, so the auto-forget triggers after 40 minutes (10x 4m).

The screenshot below shows 3 consecutive rollouts. In all cases, the number of distributors in the ring increases, and only starts decreasing 40m after the rollout has started, which is when the auto-forget triggers.