Re-create missing pod which has incompleted downscale operation #824

Merged: stoader merged 20 commits into master from recreatedownscale on Jul 6, 2022

Contributor

@bartam1 bartam1 commented Jun 17, 2022

| Q | A |
| --- | --- |
| Bug fix? | no |
| New feature? | yes |
| API breaks? | no |
| Deprecations? | no |
| Related tickets | fixes #776 |
| License | Apache 2.0 |

What's in this PR?

This PR adds a new field, configurationBackup, to the broker status.
It stores the broker configuration gzipped and base64-encoded.
When a broker pod is missing and the broker has not been removed gracefully (an incomplete downscale operation), the operator re-creates the pod from configurationBackup so that the graceful downscale operation can continue.
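
For illustration only, the backup format and the re-create decision could be sketched roughly as below. This is not the operator's actual code: the function names `encodeBackup`, `decodeBackup`, and `shouldRecreate` are hypothetical, and the sketch assumes that "gzipped base64" means gzip compression first, followed by standard base64 encoding.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// encodeBackup (hypothetical helper) gzips the raw broker configuration
// and base64-encodes the compressed bytes so they fit in a string field.
func encodeBackup(raw []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", err
	}
	// Close flushes pending data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodeBackup (hypothetical helper) reverses encodeBackup.
func decodeBackup(encoded string) ([]byte, error) {
	compressed, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// shouldRecreate (hypothetical) mirrors the rule described in the PR text:
// re-create the pod only when it is missing, the broker was not removed
// gracefully, and a configuration backup is available.
func shouldRecreate(podExists, gracefullyRemoved bool, backup string) bool {
	return !podExists && !gracefullyRemoved && backup != ""
}

func main() {
	// Assumed example configuration, not taken from the PR.
	cfg := []byte("broker.id=1\nlog.dirs=/kafka-logs\n")

	backup, err := encodeBackup(cfg)
	if err != nil {
		panic(err)
	}
	restored, err := decodeBackup(backup)
	if err != nil {
		panic(err)
	}

	fmt.Println("round trip ok:", bytes.Equal(cfg, restored))
	fmt.Println("recreate pod:", shouldRecreate(false, false, backup))
}
```

Storing the backup as a base64 string keeps the compressed bytes safe to embed in a status field, at the cost of roughly a third more space than the raw gzip output.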

Why?

During a scale-down operation, if a broker that is to be removed is deleted in an uncontrolled fashion (while its data is still being drained to other brokers), the operator doesn't bring the broker back up to finish the drain and then remove it again.
This in turn slows down new replicas getting in sync (as they have to pull from 2 or, even worse, 1 in-sync replica).
Even more concerning, if there is a K8s cluster rollout restart (due to VM upgrades, for example) and the Kafka pods don't come back up, this will result in offline partitions.

Checklist

  • Implementation tested
  • Error handling code meets the guideline
  • Logging code meets the guideline
  • User guide and development docs updated (if needed)

@bartam1 bartam1 requested a review from a team as a code owner June 17, 2022 14:52
Kuvesz
Kuvesz previously requested changes Jun 20, 2022
Contributor

@Kuvesz Kuvesz left a comment

I just requested some small changes, otherwise seems fine.

@ecojan
Contributor

ecojan commented Jun 27, 2022

@bartam1 managed to test this locally today, working as expected now 👍 thank you for the effort put in here!

pregnor
pregnor previously approved these changes Jun 28, 2022
@bartam1 bartam1 dismissed stale reviews from stoader and Kuvesz June 28, 2022 18:31

Fixed

pregnor
pregnor previously approved these changes Jun 29, 2022
Member

@pregnor pregnor left a comment

LGTM

pregnor
pregnor previously approved these changes Jun 29, 2022
pregnor
pregnor previously approved these changes Jun 29, 2022
stoader
stoader previously approved these changes Jun 30, 2022
@bartam1 bartam1 requested review from Kuvesz and pregnor June 30, 2022 14:16
Member

@pregnor pregnor left a comment

LGTM

@stoader stoader merged commit 1975aeb into master Jul 6, 2022
@stoader stoader deleted the recreatedownscale branch July 6, 2022 07:21
Development

Successfully merging this pull request may close these issues.

Kafka broker not coming back during scale-down
6 participants