Upgrade for AKS Cluster: Can't drain because Too Many Requests #1457

Closed
DaleyKD opened this issue Aug 26, 2022 · 3 comments
Labels
type/bug Something isn't working

Comments


DaleyKD commented Aug 26, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I'm currently trying to upgrade my AKS cluster from 1.23.8 to 1.24.0, and two of my nodes can't be drained. Please note: the other nodes, whose pods also have PDBs, upgraded successfully. Could it be that maxUnavailable should rarely (if ever) be less than 1, and that 1 should be the default?

#1278 says that spec.maxUnavailable is (n/2) - 1, and I've seen elsewhere in these issues that the recommended number of replicas for consul-connect-injector is 2, which means spec.maxUnavailable will be 0.
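
To make the arithmetic concrete, here is a minimal sketch of the formula as quoted above. It assumes integer division and a floor of 0 for small replica counts; both are assumptions for illustration, and the function name is hypothetical rather than anything from the chart.

```shell
# Sketch of the default described in this issue: maxUnavailable = (n/2) - 1.
# Assumptions (not confirmed by the chart): integer division, floor at 0.
default_max_unavailable() {
  replicas=$1
  value=$(( replicas / 2 - 1 ))
  if [ "$value" -lt 0 ]; then
    value=0
  fi
  echo "$value"
}

default_max_unavailable 2   # prints 0
default_max_unavailable 4   # prints 1
```

With 2 replicas the result is 0, which matches the MAX UNAVAILABLE value reported for consul-connect-injector in this issue.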

I am almost completely new to K8S and especially PDBs, so I don't know what I'm talking about.

kyle@Azure:~$ kubectl get events --sort-by='{.lastTimestamp}'
2m50s       Warning   Drain                     node/aks-default-18345084-vmss000000   Eviction blocked by Too many Requests (usually a pdb): [consul-connect-injector-8df4c6-d2flg]

Reproduction Steps

Used Terraform to install this:

resource "helm_release" "consul" {
  name       = "consul"
  repository = "https://helm.releases.hashicorp.com"
  chart      = "consul"
  version    = "0.46.1"
  namespace  = kubernetes_namespace.consul.metadata.0.name
  values = [
    data.hcs_agent_helm_config.hcs.config # Get the consul config from our HCS cluster
  ]

  set {
    name  = "controller.enabled"
    value = "true"
  }

  set {
    name  = "connectInject.transparentProxy.defaultEnabled"
    value = "false"
  }
}

Expected behavior

Ideally, a problem-free AKS cluster upgrade.

Environment details

  • consul-k8s version: 0.46.1/1.12.3
  • HCS consul version: 1.11.6
  • Kubernetes version: v1.23.8 --> 1.24.0
  • Cloud Provider: AKS
  • Networking CNI plugin: kubenet
kyle@Azure:~$ helm get values consul -n consulns
USER-SUPPLIED VALUES:
client:
  enabled: true
  exposeGossipPorts: true
  join:
  - {uuid}.private.consul.{uuid}.az.hashicorp.cloud
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: false
controller:
  enabled: true
externalServers:
  enabled: true
  hosts:
  - {uuid}.private.consul.{uuid}.az.hashicorp.cloud
  httpsPort: 443
  k8sAuthMethodHost: https://my-cluster-bdccbc13.hcp.centralus.azmk8s.io:443
  useSystemRoots: true
global:
  acls:
    bootstrapToken:
      secretKey: token
      secretName: my-hcs-cluster-bootstrap-token
    manageSystemACLs: true
  datacenter: dc1
  enabled: false
  gossipEncryption:
    secretKey: gossipEncryptionKey
    secretName: my-hcs-cluster-hcs
  name: consul
  tls:
    caCert:
      secretKey: caCert
      secretName: my-hcs-cluster-hcs
    enableAutoEncrypt: true
    enabled: true

Additional Context

kyle@Azure:~$ kubectl get pdb -A
NAMESPACE     NAME                                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
consulns      consul-connect-injector                 N/A             0                 0                     19h
ingress       nginxingress-ingress-nginx-controller   1               N/A               1                     19h
kube-system   coredns-pdb                             1               N/A               1                     19h
kube-system   konnectivity-agent                      1               N/A               1                     19h
kube-system   metrics-server-pdb                      1               N/A               1                     19h
kyle@Azure:~$ kubectl get pdb/consul-connect-injector -n consulns -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: consul
    meta.helm.sh/release-namespace: consulns
  creationTimestamp: "2022-08-25T20:29:53Z"
  generation: 1
  labels:
    app: consul
    app.kubernetes.io/managed-by: Helm
    chart: consul-helm
    component: connect-injector
    heritage: Helm
    release: consul
  name: consul-connect-injector
  namespace: consulns
  resourceVersion: "509006"
  uid: 617192e1-9e3c-498e-9987-bea59a05e11b
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: consul
      component: connect-injector
      release: consul
status:
  conditions:
  - lastTransitionTime: "2022-08-26T15:32:54Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 2
  disruptionsAllowed: 0
  expectedPods: 2
  observedGeneration: 1
@DaleyKD DaleyKD added the type/bug Something isn't working label Aug 26, 2022
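
For readers following along, a rough sketch of how the status fields in the PDB above relate to each other. It assumes the usual Kubernetes rule for a maxUnavailable budget: desiredHealthy is expectedPods minus maxUnavailable, and disruptionsAllowed is currentHealthy minus desiredHealthy, never below 0. The function name is hypothetical.

```shell
# Sketch: derive disruptionsAllowed from the PDB fields shown above.
# Arguments: expectedPods, maxUnavailable, currentHealthy.
disruptions_allowed() {
  expected_pods=$1
  max_unavailable=$2
  current_healthy=$3

  desired_healthy=$(( expected_pods - max_unavailable ))
  if [ "$desired_healthy" -lt 0 ]; then
    desired_healthy=0
  fi

  allowed=$(( current_healthy - desired_healthy ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

disruptions_allowed 2 0 2   # values from this issue: prints 0
disruptions_allowed 2 1 2   # same pods with maxUnavailable 1: prints 1
```

With maxUnavailable at 0, every healthy pod is required, so no eviction is ever allowed and the drain is blocked regardless of how many replicas are running.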

DaleyKD commented Sep 6, 2022

I'm still having this problem. I even tried setting connectInject.disruptionBudget.maxUnavailable to 1, but that value doesn't appear in the PDB spec.

I have to delete the PDB to upgrade my AKS cluster.
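
The workaround described here could look roughly like the sketch below. The helper name and the DRY_RUN variable are hypothetical; the PDB name and namespace are taken from the `kubectl get pdb` output earlier in this issue. By default the helper only prints the command, so nothing is deleted unless DRY_RUN=0 is set.

```shell
# Hypothetical helper, not part of consul-k8s or kubectl.
# Prints the delete command by default; set DRY_RUN=0 to actually run it.
delete_injector_pdb() {
  namespace=$1
  # Build the command as positional parameters so it can be printed or run.
  set -- kubectl delete pdb consul-connect-injector -n "$namespace"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

delete_injector_pdb consulns
```

Removing the budget means evictions of the injector pods are no longer limited until the PDB exists again, so this is a stopgap for the upgrade window, not a fix.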

@david-yu
Contributor

Hi @DaleyKD, this is likely addressed by https://github.com/hashicorp/consul-k8s/pull/1530/files. I will close this issue; the fix should ship in 0.49.0, which we expect to release later this week or early next week.


DaleyKD commented Nov 9, 2023

@david-yu ,

Would you consider reopening this?

I'm currently trying to upgrade AKS from 1.25.6 to 1.26.6.

Before upgrading, I upgraded consul-k8s all the way from 0.49.0 to 0.49.8, then to 1.0.10, then to 1.2.3. I am currently running 1.2.3 which is Consul 1.16.3.

It seems that nothing about the disruptionBudget has changed for connect inject.

kyle@Azure:~$ helm get values consul -n consul
USER-SUPPLIED VALUES:
connectInject:
  transparentProxy:
    defaultEnabled: false
dns:
  enabled: false
global:
  acls:
    manageSystemACLs: true
  datacenter: stratusdevdc1
  gossipEncryption:
    autoGenerate: true
  name: consul
  tls:
    enableAutoEncrypt: true
    enabled: true
server:
  disruptionBudget:
    enabled: false
  replicas: 1
kyle@Azure:~$ kubectl get pdb -A
NAMESPACE     NAME                                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
consul        consul-connect-injector                 N/A             0                 0                     13h
ingress       nginxingress-ingress-nginx-controller   1               N/A               1                     420d
kube-system   coredns-pdb                             1               N/A               1                     420d
kube-system   konnectivity-agent                      1               N/A               1                     420d
kube-system   metrics-server-pdb                      1               N/A               1                     420d
kyle@Azure:~$ kubectl get pdb/consul-connect-injector -n consul -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: consul
    meta.helm.sh/release-namespace: consul
  creationTimestamp: "2023-11-09T01:32:04Z"
  generation: 1
  labels:
    app: consul
    app.kubernetes.io/managed-by: Helm
    chart: consul-helm
    component: connect-injector
    heritage: Helm
    release: consul
  name: consul-connect-injector
  namespace: consul
  resourceVersion: "257320864"
  uid: 81f1e341-e87c-44f9-9faa-49375b0299e9
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: consul
      component: connect-injector
      release: consul
status:
  conditions:
  - lastTransitionTime: "2023-11-09T02:05:35Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1

I have a hard time believing that, if I'm doing this correctly, I'm the only one who can never upgrade AKS. I suspect I'm missing something obvious.
