Re-create missing pod which has incompleted downscale operation #824
I just requested some small changes; otherwise it seems fine.
@bartam1 I managed to test this locally today, and it is working as expected now 👍 Thank you for the effort put in here!
LGTM
LGTM
What's in this PR?
There is a new field in the broker status named configurationBackup.
It stores the broker configuration gzipped and base64-encoded.
When a broker pod is missing and has not been removed gracefully (an incomplete downscale operation), the operator re-creates that pod from the configurationBackup so that the graceful downscale operation can continue.
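To illustrate the storage format, here is a minimal sketch of a gzip plus base64 round trip in Go. It assumes the configuration is gzipped first and the compressed bytes are then base64-encoded with the standard alphabet; the function names `encodeConfigBackup` and `decodeConfigBackup` and the sample configuration are hypothetical and are not taken from this PR's code.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// encodeConfigBackup (hypothetical helper) gzips the configuration text and
// base64-encodes the compressed bytes so they can be stored as a status string.
func encodeConfigBackup(config string) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(config)); err != nil {
		return "", err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodeConfigBackup (hypothetical helper) reverses encodeConfigBackup:
// base64-decode first, then gunzip.
func decodeConfigBackup(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Sample broker configuration, made up for illustration only.
	original := "broker.id=1\nlog.dirs=/kafka-logs\n"

	backup, err := encodeConfigBackup(original)
	if err != nil {
		panic(err)
	}
	restored, err := decodeConfigBackup(backup)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", restored == original)
}
```

Storing the backup as a base64 string keeps the compressed bytes safe to embed in a text-based status field, at the cost of roughly a one-third size overhead on the compressed data.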
Why?
During a scale-down operation, if a broker that is to be removed is deleted in an uncontrolled fashion (while its data is still being drained to other brokers), the operator doesn't bring the broker back up to finish the drain and then remove it again.
This in turn slows down new replicas getting in sync (as they have to pull from two or, even worse, one in-sync replica).
Even more concerning, if there is a K8s cluster rollout restart (due to VM upgrades, for example) and the Kafka pods don't come back up, this will result in offline partitions.
Checklist