This repository has been archived by the owner on Feb 14, 2023. It is now read-only.

CF API Server becomes unavailable during updates #636

Open
braunsonm opened this issue Mar 9, 2021 · 6 comments

Comments

@braunsonm

Describe the bug

In a production deployment, API Server downtime during updates is not in line with CF-for-VMs. The default should deploy more than one replica and perform a rolling update.

Current behavior

The API Server will be taken offline to update the image.

Expected behavior

More than one replica is deployed by default, so the API Server remains online during CF updates.

Additional context

cf-for-k8s SHA

v2.1.1

@cf-gitbot
Collaborator

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177271427

The labels on this github issue will be updated when the story is started.

@matt-royal
Contributor

Thank you for the issue, @braunsonm. We just committed a change to the develop branch that allows you to scale up the cf-api-server via a data value (capi.cf_api_server.replicas). Once this makes it into a release, you can easily scale up to 2+ replicas and avoid this problem.
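For illustration, a minimal sketch of what such an override might look like, assuming the usual ytt data-values file format that cf-for-k8s uses for its configuration. Only the key path `capi.cf_api_server.replicas` comes from the comment above; the replica count of 2 is an example, and the file would be passed to ytt alongside your existing values files.

```yaml
#@data/values
---
capi:
  cf_api_server:
    #! Example value (assumption): 2 or more replicas, so that at least
    #! one API Server pod can stay up while another is being replaced.
    replicas: 2
```

This only applies to builds that include the change described above, so it is not available in v2.1.1.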

@braunsonm
Author

@matt-royal the point of this issue was that I believe this should be the default. This is a 5-node cluster deployment, and it is expected to be highly available without a bunch of tweaks.

If not, I'd recommend a document in the repo that tells users what steps they need to take to make it HA (external DB, external blobstore, recommended replica counts).

@Birdrock
Member

@braunsonm I'm re-opening this for more discussion.

We've found some configuration that may alleviate the problem, but the larger discussion is around what our default deploy target is. Until now, we've been targeting small clusters or developer workstations. A truly HA configuration isn't a very good out-of-the-box, kick-the-tires solution, so we may need to make some compromises.

To that end, the result of this issue may be to open a new issue with some clarified requirements.

@Birdrock Birdrock reopened this Mar 23, 2021
@cf-gitbot
Collaborator

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177468264

The labels on this github issue will be updated when the story is started.

@braunsonm
Author

I thought the deployment was targeting something close to HA, with the exception of the DB and blobstore.

If the goal is to be similar to cf-deployment on BOSH, then the default deployment should be HA with batteries included, with remove_resource_requirements for developer machines. That's the way we personally have been treating it.

When the deployment requires a 5-node cluster, defaulting your target to a developer workstation seems like quite a stretch. As you said, even some clarified documentation for operators running this in production would be good to have 👍 If you need any help with that based on our experience, don't hesitate to reach out.
