This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

kube-aws: Drain nodes before shutting them down #465

Closed
Wants to merge 13 commits

Commits on Aug 8, 2016

  1. WIP discrete etcd cluster

    colhom committed Aug 8, 2016 (commit 2f547b6)
  2. Commit d4bd382
  3. fixup! sed s/kubelet/etcd2

    colhom committed Aug 8, 2016 (commit dde38f6)
  4. Commit 81a2729
  5. Commit 56a77fe
  6. (WIP) HA control plane.

    colhom committed Aug 8, 2016 (commit 5296214)
  7. Commit a84a6f4
  8. Commit 65bd4fe

Commits on Aug 9, 2016

  1. aws: cluster upgrade support via kube-aws up --update

    Does a rolling replacement update on the controller ASG, followed by the workers.

    Punts on upgrading the etcd cluster; it simply makes sure the resource definitions don't change after create. (A usage sketch follows this commit list.)
    colhom committed Aug 9, 2016 (commit a3c772a)
  2. aws: cluster upgrade docs

    colhom committed Aug 9, 2016 (commit 3dff28f)
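
A minimal usage sketch of the upgrade flow described in the first Aug 9 commit. The `--update` flag comes from the commit message; the working directory and everything else here is assumed rather than taken from the PR.

```
# Hypothetical invocation: re-running `up` with --update is described in the
# commit as performing a rolling replacement of the controller ASG first,
# followed by the workers. The etcd cluster is left untouched.
cd my-cluster/        # assumed directory holding the rendered cluster assets
kube-aws up --update
```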

Commits on Aug 11, 2016

  1. kube-aws: Drain nodes before shutting them down to give running pods time to gracefully stop.

    This change achieves that by running `docker run IMAGE kubectl drain THE_NODE --force` on the to-be-shut-down node before the kubelet receives SIGTERM during CoreOS' shutdown process. (A sketch of this drain step follows the commit list below.)

    Without this change, kubelets receiving SIGTERM without a prior `drain` leave non-functional (actually unschedulable) pods with the status `Ready`.

    With this change, when an ASG's desired capacity is decreased, a node's status changes over time as follows:

    On desired capacity change:
      STATUS=Ready
    On shutdown started:
      STATUS=Ready,SchedulingDisabled (<- pods are stopped and the status is changed by `kubectl drain`)
    On shutdown finished:
      STATUS=NotReady,SchedulingDisabled (<- it is `NotReady`, but that doesn't cause downtime because both further scheduling and the pods have already been stopped)
    After a minute:
      The node disappears from the output of `kubectl get nodes`

    Note that:

    * This applies both to manual shutdowns (for example via `sudo systemctl shutdown`) and to automated shutdowns (triggered by AWS Auto Scaling when nodes are rotated out of a group).
    * We currently depend on the community Docker image `mumoshu/kubectl` because the `kubectl` included in the official `coreos/hyperkube` image doesn't work due to the Kubernetes issue kubernetes/kubernetes#24088. Once the issue is fixed and the CoreOS team publishes a new hyperkube image with the updated Kubernetes, we can remove that dependency.
    * The author considers this an experimental feature, so you shouldn't expect the configuration API around it to be stable; it may change in the future.

    Also re-formats the previously introduced code with gofmt to conform to the rules and make the build pass.
    mumoshu committed Aug 11, 2016 (commit 24c650b)
  2. fix kube-node-drainer not to fail when draining a node running DaemonSet-managed pods

    Before this change, you could reproduce the failure by running `sudo systemctl stop kube-node-drainer`, which showed an error message indicating this issue, like:

    ```
    May 09 05:04:08 ip-10-0-0-234.ap-northeast-1.compute.internal sh[15376]: error: DaemonSet-managed pods: cassandra-e3tdb (use --ignore-daemonsets to ignore)
    May 09 05:04:08 ip-10-0-0-234.ap-northeast-1.compute.internal systemd[1]: kube-node-drainer.service: Control process exited, code=exited status=1
    ```

    After this change, you can drain a node even if it is running DaemonSet-managed pods (see the second sketch after the commit list below):

    ```
    May 09 10:41:08 ip-10-0-0-202.ap-northeast-1.compute.internal sh[7671]: node "ip-10-0-0-202.ap-northeast-1.compute.internal" cordoned
    May 09 10:41:08 ip-10-0-0-202.ap-northeast-1.compute.internal sh[7671]: WARNING: Skipping DaemonSet-managed pods: cassandra-jzfqo
    May 09 10:41:08 ip-10-0-0-202.ap-northeast-1.compute.internal sh[7671]: node "ip-10-0-0-202.ap-northeast-1.compute.internal" drained
    ```
    cw-kuoka authored and mumoshu committed Aug 11, 2016 (commit d2a9085)
  3. Commit 8f90853
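
The drain step described in the first Aug 11 commit, sketched as a shell command. The `docker run IMAGE kubectl drain THE_NODE --force` shape and the `mumoshu/kubectl` image are taken from the commit message; the node-name lookup, kubeconfig path, and mounts below are assumptions, not the exact command kube-aws generates.

```
# Rough sketch of the drain that runs on a node before its kubelet is stopped
# during shutdown. Paths and the node-name derivation are assumed and will
# differ from the units kube-aws actually renders.
NODE_NAME="$(hostname -f)"   # assumed to match the node's registered name

docker run --rm \
  --net=host \
  -v /etc/kubernetes:/etc/kubernetes:ro \
  mumoshu/kubectl \
  kubectl --kubeconfig=/etc/kubernetes/kubeconfig drain "${NODE_NAME}" --force
```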
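
And a sketch of the second Aug 11 commit's fix: the journal output quoted above points at `--ignore-daemonsets`, so adding that flag lets the drain proceed on nodes running DaemonSet-managed pods (they are skipped with a warning). Everything other than the flag mirrors the assumptions of the previous sketch.

```
# Same drain as above, but tolerant of DaemonSet-managed pods: instead of
# aborting with "use --ignore-daemonsets to ignore", kubectl skips them and
# evicts the rest, as in the quoted journal output.
docker run --rm \
  --net=host \
  -v /etc/kubernetes:/etc/kubernetes:ro \
  mumoshu/kubectl \
  kubectl --kubeconfig=/etc/kubernetes/kubeconfig drain "${NODE_NAME}" \
    --force --ignore-daemonsets
```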