feat(rds): support rolling instance updates to reduce downtime #20054

Merged
merged 4 commits into from
Jul 12, 2022

Conversation

spanierm42
Contributor

Support defining the instance update behaviour of RDS instances. This allows switching between bulk updates (all instances at once) and rolling updates (one instance after another). Bulk updates are faster, but they carry a higher risk of longer downtime because all instances might be unreachable simultaneously during the update. Rolling updates take longer but ensure that only one instance is updated at a time, so downtime is limited to the (at most two) changes of the primary instance.

We keep the current behaviour, namely a bulk update, as default.
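
To make the difference concrete, here is a minimal, self-contained TypeScript sketch of how the two behaviours could be modelled as dependencies between instances. It is not the aws-cdk implementation: the enum name `InstanceUpdateBehaviour`, the function `planDependencies`, and the instance IDs are assumptions chosen for illustration.

```typescript
// Illustrative model only; names are hypothetical and not taken from aws-cdk.
enum InstanceUpdateBehaviour {
  BULK = 'BULK',       // all instances may be updated at once
  ROLLING = 'ROLLING', // one instance after another
}

// Returns [dependent, dependsOn] pairs between instances.
// BULK (the default) adds no edges, so nothing forces an update order.
// ROLLING chains each instance to the one before it, which forces
// the updates to happen one at a time.
function planDependencies(
  instanceIds: string[],
  behaviour: InstanceUpdateBehaviour = InstanceUpdateBehaviour.BULK,
): Array<[string, string]> {
  const edges: Array<[string, string]> = [];
  if (behaviour === InstanceUpdateBehaviour.ROLLING) {
    for (let i = 1; i < instanceIds.length; i++) {
      edges.push([instanceIds[i], instanceIds[i - 1]]);
    }
  }
  return edges;
}

const ids = ['Instance1', 'Instance2', 'Instance3'];
console.log(planDependencies(ids));
console.log(planDependencies(ids, InstanceUpdateBehaviour.ROLLING));
```

With three instances, the bulk plan has no ordering constraints, while the rolling plan makes `Instance2` wait for `Instance1` and `Instance3` wait for `Instance2`.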

This implementation follows proposal A by @hixi-hyi in issue #10595.

Fixes #10595

@gitpod-io

gitpod-io bot commented Apr 23, 2022

@github-actions github-actions bot added the p2 label Apr 23, 2022
@aws-cdk-automation aws-cdk-automation requested a review from a team April 23, 2022 22:00
@TheRealAmazonKendra TheRealAmazonKendra changed the base branch from v1-main to main June 14, 2022 18:55
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Thank you for this change. For the most part it looks great, just a couple things before we can approve it. We need to have a README update for this new functionality and we also need an integration test.

Additionally, I'm wondering if there's a use case for allowing the rolling update to be more configurable rather than just one at a time. Thoughts on that?

@spanierm42
Contributor Author

@TheRealAmazonKendra Thanks for the feedback. I indeed forgot to add both documentation and an integration test.

Concerning more flexible update behaviors: I generally like the idea of adding more flexibility. But I think we only have limited options in this case, as RDS/Aurora does not support more sophisticated update behaviors. For instance, I cannot state that "at least 2 instances need to be running" or "at most 3 instances can be updated at a time". Thus, I think an enum reflecting the most important options, BULK and ROLLING, is a good start and easy to understand.
We might add something like BLUE_GREEN, which makes sure that at most half of the instances are updated at a time, further reducing the update time while still making sure that not all instances are updated at once. Or KEEP_AT_LEAST_TWO_INSTANCES, which makes sure that at least two instances are always up and running while all others can be updated in a batch.
But as we lack support from CloudFormation for more sophisticated options, I would keep it simple and use enums here.
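
As a rough illustration of how these options would differ, the following TypeScript sketch groups instances into update batches per strategy. Everything here is hypothetical: `UpdateStrategy`, `maxBatchSize`, and `updateBatches` are made-up names, and BLUE_GREEN and KEEP_AT_LEAST_TWO_INSTANCES are only the ideas floated above, not options that exist.

```typescript
// Hypothetical strategies; only BULK and ROLLING are part of this PR.
type UpdateStrategy = 'BULK' | 'ROLLING' | 'BLUE_GREEN' | 'KEEP_AT_LEAST_TWO_INSTANCES';

// Largest number of instances updated at the same time for n instances.
function maxBatchSize(strategy: UpdateStrategy, n: number): number {
  switch (strategy) {
    case 'BULK':
      return Math.max(1, n); // everything at once
    case 'ROLLING':
      return 1; // one instance after another
    case 'BLUE_GREEN':
      return Math.max(1, Math.floor(n / 2)); // at most half at a time
    case 'KEEP_AT_LEAST_TWO_INSTANCES':
      // Assumption: with fewer than three instances, fall back to one at a time.
      return Math.max(1, n - 2);
  }
}

// Splits the instance IDs into consecutive batches of that size.
function updateBatches(instanceIds: string[], strategy: UpdateStrategy): string[][] {
  const size = maxBatchSize(strategy, instanceIds.length);
  const batches: string[][] = [];
  for (let i = 0; i < instanceIds.length; i += size) {
    batches.push(instanceIds.slice(i, i + size));
  }
  return batches;
}

const cluster = ['a', 'b', 'c', 'd', 'e'];
console.log(updateBatches(cluster, 'BLUE_GREEN'));
console.log(updateBatches(cluster, 'KEEP_AT_LEAST_TWO_INSTANCES'));
```

For five instances, this model yields batches of two, two, and one for BLUE_GREEN, and batches of three and two for KEEP_AT_LEAST_TWO_INSTANCES.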

@TheRealAmazonKendra
Contributor

@spanierm42 Sounds good. If you can just add in the tests and documentation I'll be more than happy to approve and merge this.

@spanierm42 spanierm42 force-pushed the rds-support-rolling-instance-updates branch from 5be1417 to 2535e03 Compare July 12, 2022 12:40
@mergify mergify bot dismissed TheRealAmazonKendra’s stale review July 12, 2022 12:41

Pull request has been modified.

@spanierm42 spanierm42 force-pushed the rds-support-rolling-instance-updates branch from e1a8cf7 to 50559a6 Compare July 12, 2022 17:42
@spanierm42 spanierm42 force-pushed the rds-support-rolling-instance-updates branch from 50559a6 to d9b0fb0 Compare July 12, 2022 17:44
@spanierm42
Contributor Author

@TheRealAmazonKendra I added the missing documentation and integration test. Let me know if that fits your needs. Looking forward to hearing from you :)

@TheRealAmazonKendra TheRealAmazonKendra changed the title from "feat(rds): Support rolling instance updates" to "feat(rds): support rolling instance updates to reduce downtime" Jul 12, 2022
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Thank you for this contribution and for your edits on this! I hope you don't mind that I updated the README a bit.

@mergify
Contributor

mergify bot commented Jul 12, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 3d38e60
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 86790b6 into aws:main Jul 12, 2022
@spanierm42 spanierm42 deleted the rds-support-rolling-instance-updates branch July 13, 2022 06:55
@spanierm42
Contributor Author

@TheRealAmazonKendra Thanks for all your guidance and feedback. I highly appreciate your change to the README.md, as I learned from it that you prefer concise documentation and technical details (here, the dependencies between the instances, which I wanted to hide as an implementation detail in the text).

Successfully merging this pull request may close these issues.

[aws-rds] Minimize downtime during DBCluster updates