Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backport to 1.9x] Disable session affinity for internal kubernetes service #65178

Closed
wants to merge 1 commit into from
Closed

[backport to 1.9x] Disable session affinity for internal kubernetes service #65178

wants to merge 1 commit into from

Conversation

afritzler
Copy link

Under following conditions session affinity leads to a deadlock:

  • Self hosted controller-manager, where it talks to API servers
    via kubernetes service ClusterIP
  • default master-count reconcilier is used
  • --apiserver-count is set to >1 according to the help message
  • number of responsive APIServers goes below apiserver-count
  • all controller-managers happen to be hashed to APIServers which
    are down.

What then happens is that controller managers never be able to
contact APIServer, despite correctly working APIServer available.

Less serious outages also possible for other consumers of kubernetes
service, such as operators, kube-dns, flannel & calico, etc. There is
always non zero chance, that given consumer is hashed to an apiserver
which is down.

Revert "give the kubernetes service client ip session affinity"
This reverts commit e21ebbc.

What this PR does / why we need it:
Backporting #56690 to 1.9 release branch.

Which issue(s) this PR fixes
Fixes #23129

Special notes for your reviewer:

Release note:

Disable session affinity for internal kubernetes service - Backport of #56690 to 1.9 release branch

Under following conditions session affinity leads to a deadlock:
  - Self hosted controller-manager, where it talks to API servers
    via kubernetes service ClusterIP
  - default master-count reconcilier is used
  - --apiserver-count is set to >1 according to the help message
  - number of responsive APIServers goes below `apiserver-count`
  - all controller-managers happen to be hashed to APIServers which
    are down.

What then happens is that controller managers never be able to
contact APIServer, despite correctly working APIServer available.

Less serious outages also possible for other consumers of kubernetes
service, such as operators, kube-dns, flannel & calico, etc.  There is
always non zero chance, that given consumer is hashed  to an apiserver
which is down.

Revert "give the kubernetes service client ip session affinity"
This reverts commit e21ebbc.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 18, 2018
@k8s-ci-robot k8s-ci-robot requested review from davidopp and deads2k June 18, 2018 12:59
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 18, 2018
@afritzler
Copy link
Author

/assign @derekwaynecarr

@k8s-github-robot
Copy link

This PR is not for the master branch but does not have the cherrypick-approved label. Adding the do-not-merge/cherry-pick-not-approved label.

@k8s-github-robot k8s-github-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Jun 18, 2018
@thockin
Copy link
Member

thockin commented Jun 18, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afritzler, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2018
@BenTheElder
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 18, 2018
@mbohlool
Copy link
Contributor

why this is not an automated cherrypick of #23129?

@afritzler
Copy link
Author

/retest

@afritzler
Copy link
Author

afritzler commented Jun 20, 2018

@mbohlool is it possible to include this cherrypick into the upcoming 1.9.9 release?

@mbohlool
Copy link
Contributor

@afritzler We can try. Can you explain why this is not an automated cherrypick? This is the document you need to follow to create an automated cherrypick: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

I am closing this one, if you believe this change cannot be an automated cherrypick of #56690 please open again with an explanation.

@mbohlool mbohlool closed this Jun 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants