Merge pull request kubernetes-retired#4 in RUN/kube-aws from feature/update-to-latest-kube-aws-master to hcom-flavour

* commit '175217133f75b3c251536bc0d51ccafd2b1a5de4':
  Fix the dead-lock while bootstrapping etcd cluster when wait signal is enabled. Resolves kubernetes-retired#525
  Fix elasticFileSystemId to be propagated to node pools. Resolves kubernetes-retired#487
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support. This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints. Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready. Resolves kubernetes-retired#467
Maxim Ivanov committed Apr 12, 2017
2 parents 9b5053d + 1752171 commit 0211915
Showing 17 changed files with 199 additions and 58 deletions.
51 changes: 51 additions & 0 deletions Documentation/kubernetes-on-aws-backup-restore.md
@@ -0,0 +1,51 @@
# Backup

A feature that backs up Kubernetes resources can be enabled by specifying:
```
kubeResourcesAutosave:
enabled: true
```
in cluster.yaml.

When active, a kube-system Deployment schedules a single pod that takes snapshots of all Kubernetes resources and uploads them to S3 (a simplified sketch of the flow follows this list).
- Backups are taken and exported when the pod (re)starts, and then at 24-hour intervals.
- Each snapshot resides in a timestamped folder.
- Several fields, such as status and uid, are omitted from the exported resources so that they can later be restored into a fresh cluster.
- Namespaced resources are grouped inside folders named after their namespace.
- Non-namespaced resources are placed at the same directory level as the namespace folders.
- The backups are exported to the S3 URI: ```s3://<your-bucket-name>/.../<your-cluster-name>/backup/*```
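
The actual kube-system Deployment generated by kube-aws (see the cloud-config-controller template changes further down in this commit) splits the work between a dumper container and a pusher container. The following is a simplified, hand-written sketch of that flow, not the real manifest; it assumes ```kubectl``` and the AWS CLI are available inside the pod, uses a hypothetical ```BACKUP_S3_URI``` variable in place of the backup location shown above, and abbreviates the resource lists.
```
#!/bin/bash
# Dumper: export resources into a timestamped snapshot directory.
SNAPSHOT_DIR="/kube-resources-autosave/tmp/$(date +%Y-%m-%d_%H-%M-%S)"
mkdir -p "${SNAPSHOT_DIR}"

# Non-namespaced resources land at the top level of the snapshot (list abbreviated).
for r in namespaces persistentvolumes nodes storageclasses; do
  kubectl get --export -o=json "${r}" > "${SNAPSHOT_DIR}/${r}.json"
done

# Namespaced resources are grouped into one folder per namespace (list abbreviated).
# (The real pod additionally strips fields such as .status and .metadata.uid with jq.)
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  mkdir -p "${SNAPSHOT_DIR}/${ns}"
  for r in deployments services configmaps secrets; do
    kubectl get --export -o=json --namespace "${ns}" "${r}" > "${SNAPSHOT_DIR}/${ns}/${r}.json"
  done
done

# Pusher: move the completed snapshot to S3; BACKUP_S3_URI is a stand-in for
# the s3://<your-bucket-name>/.../<your-cluster-name>/backup location above.
aws s3 mv "${SNAPSHOT_DIR}" "${BACKUP_S3_URI}/$(basename "${SNAPSHOT_DIR}")" --recursive
```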

### Example

A Kubernetes environment has the namespaces:
- kube-system
- alpha
- beta

A backup is created on 2017-05-04 at 13:48:33.

The backup is exported to S3 at the path:
```
s3://my-bucket-name/my-cluster-name/backup/17-05-04_13-48-33
```
Inside the ```17-05-04_13-48-33``` directory are several .json files for the Kubernetes resources that reside outside namespaces, along with one folder per namespace:
```
17-05-04_13-48-33/kube-system
17-05-04_13-48-33/alpha
17-05-04_13-48-33/beta
17-05-04_13-48-33/persistentvolumes.json
17-05-04_13-48-33/storageclasses.json
...
...
...
```
Inside each namespace folder are several .json files for the Kubernetes resources that reside inside the respective namespace:
```
17-05-04_13-48-33/kube-system/deployments.json
17-05-04_13-48-33/kube-system/statefulsets.json
...
...
...
```
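
### Restore

Restoring is not automated by this feature. A minimal, hypothetical restore flow, assuming the AWS CLI and ```kubectl``` are configured against the target cluster and again using a hypothetical ```BACKUP_S3_URI``` for the backup location, could look like:
```
# Copy one snapshot locally, then re-apply the exported resources.
SNAPSHOT=17-05-04_13-48-33
aws s3 cp "${BACKUP_S3_URI}/${SNAPSHOT}" "./${SNAPSHOT}" --recursive

# Recreate non-namespaced resources first.
kubectl apply -f "./${SNAPSHOT}/namespaces.json"
kubectl apply -f "./${SNAPSHOT}/storageclasses.json"

# Then re-apply the resources of a given namespace, e.g. "alpha".
kubectl apply -f "./${SNAPSHOT}/alpha/" --namespace alpha
```
Because the exported resources have cluster-specific fields such as status and uid stripped, they can be applied into a fresh cluster; resources tied to infrastructure (for example persistent volumes) may still need manual adjustment.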
2 changes: 2 additions & 0 deletions README.md
@@ -43,6 +43,8 @@ Check out our getting started tutorial on launching your first Kubernetes cluste
* Configure various Kubernetes add-ons
* [Step 7: Destroy](/Documentation/kubernetes-on-aws-destroy.md)
* Destroy the cluster
* **Optional Features**
* [Backup Kubernetes resources](/Documentation/kubernetes-on-aws-backup-restore.md)

## Examples

1 change: 1 addition & 0 deletions core/controlplane/cluster/cluster.go
@@ -116,6 +116,7 @@ func (c *ClusterRef) validateExistingVPCState(ec2Svc ec2Service) error {

func NewCluster(cfg *config.Cluster, opts config.StackTemplateOptions, awsDebug bool) (*Cluster, error) {
cluster := NewClusterRef(cfg, awsDebug)
cluster.KubeResourcesAutosave.S3Path = fmt.Sprintf("%skube-aws/clusters/%s/backup", strings.TrimPrefix(opts.S3URI, "s3://"), cfg.ClusterName)
stackConfig, err := cluster.StackConfig(opts)
if err != nil {
return nil, err
45 changes: 38 additions & 7 deletions core/controlplane/config/config.go
@@ -150,7 +150,9 @@ func NewDefaultCluster() *Cluster {
CreateRecordSet: false,
RecordSetTTL: 300,
CustomSettings: make(map[string]interface{}),
ExportKubeResources: false,
KubeResourcesAutosave: KubeResourcesAutosave{
Enabled: false,
},
}
}

@@ -639,6 +641,7 @@ type Cluster struct {
ControllerSettings `yaml:",inline"`
EtcdSettings `yaml:",inline"`
FlannelSettings `yaml:",inline"`
AdminAPIEndpointName string `yaml:"adminAPIEndpointName,omitempty"`
ServiceCIDR string `yaml:"serviceCIDR,omitempty"`
APIServerServiceIP string `yaml:"-"`
CreateRecordSet bool `yaml:"createRecordSet,omitempty"`
@@ -648,8 +651,7 @@ type Cluster struct {
HostedZoneID string `yaml:"hostedZoneId,omitempty"`
ProvidedEncryptService EncryptService
CustomSettings map[string]interface{} `yaml:"customSettings,omitempty"`
ExportKubeResources bool `yaml:"exportKubeResources,omitempty"`
KubeResourcesS3Path string `yaml:"-"`
KubeResourcesAutosave `yaml:"kubeResourcesAutosave,omitempty"`
}

type Experimental struct {
@@ -724,6 +726,11 @@ type Kube2IamSupport struct {
Enabled bool `yaml:"enabled"`
}

type KubeResourcesAutosave struct {
Enabled bool `yaml:"enabled"`
S3Path string
}

type NodeDrainer struct {
Enabled bool `yaml:"enabled"`
}
@@ -832,9 +839,9 @@ func (c ControllerSettings) ControllerRollingUpdateMinInstancesInService() int {
return *c.AutoScalingGroup.RollingUpdateMinInstancesInService
}

// Required by kubelet to locate the apiserver
func (c KubeClusterSettings) APIServerEndpoint() string {
return fmt.Sprintf("https://%s", c.ExternalDNSName)
// AdminAPIEndpointURL is the URL of the API endpoint which is written into kubeconfig and used by admins
func (c *Config) AdminAPIEndpointURL() string {
return fmt.Sprintf("https://%s", c.AdminAPIEndpoint.DNSName)
}

// Required by kubelet to use the consistent network plugin with the base cluster
@@ -890,6 +897,29 @@ func (c Cluster) Config() (*Config, error) {

config.APIEndpoints = apiEndpoints

apiEndpointNames := []string{}
for _, e := range apiEndpoints {
apiEndpointNames = append(apiEndpointNames, e.Name)
}

var adminAPIEndpoint derived.APIEndpoint
if c.AdminAPIEndpointName != "" {
found, err := apiEndpoints.FindByName(c.AdminAPIEndpointName)
if err != nil {
return nil, fmt.Errorf("failed to find an API endpoint named \"%s\": %v", c.AdminAPIEndpointName, err)
}
adminAPIEndpoint = *found
} else {
if len(apiEndpoints) > 1 {
return nil, fmt.Errorf(
"adminAPIEndpointName must not be empty when there's 2 or more api endpoints under the key `apiEndpoints`. Specify one of: %s",
strings.Join(apiEndpointNames, ", "),
)
}
adminAPIEndpoint = apiEndpoints.GetDefault()
}
config.AdminAPIEndpoint = adminAPIEndpoint

return &config, nil
}

@@ -1030,7 +1060,8 @@ func (c Cluster) StackConfig(opts StackTemplateOptions) (*StackConfig, error) {
type Config struct {
Cluster

APIEndpoints derived.APIEndpoints
AdminAPIEndpoint derived.APIEndpoint
APIEndpoints derived.APIEndpoints

EtcdNodes []derived.EtcdNode

48 changes: 25 additions & 23 deletions core/controlplane/config/templates/cloud-config-controller
@@ -500,6 +500,11 @@ write_files:
/usr/bin/docker run --rm --net=host -v /srv/kubernetes:/srv/kubernetes {{.HyperkubeImage.RepoWithTag}} /hyperkube kubectl "$@"
}

while ! kubectl get ns kube-system; do
echo Waiting until kube-system created.
sleep 3
done

mfdir=/srv/kubernetes/manifests

{{ if .UseCalico }}
@@ -508,7 +513,7 @@ write_files:
{{ end }}

# Deployments
for manifest in {kube-dns-de,kube-dns-autoscaler-de,heapster-de{{ if .ExportKubeResources }},cluster-dump{{ end }}}.yaml; do
for manifest in {kube-dns-de,kube-dns-autoscaler-de,heapster-de{{ if .KubeResourcesAutosave.Enabled }},kube-resources-autosave{{ end }}}.yaml; do
kubectl apply -f "${mfdir}/$manifest"
done

@@ -786,40 +791,40 @@ write_files:
sed -i -e "s#\$ETCDKEY#$etcd_key#g" /srv/kubernetes/manifests/calico.yaml

{{ end }}
{{ if .ExportKubeResources }}
- path: /srv/kubernetes/manifests/cluster-dump.yaml
{{ if .KubeResourcesAutosave.Enabled }}
- path: /srv/kubernetes/manifests/kube-resources-autosave.yaml
content: |
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: cluster-dump
name: kube-resources-autosave
namespace: kube-system
labels:
k8s-app: cluster-dump-policy
k8s-app: kube-resources-autosave-policy
spec:
replicas: 1
template:
metadata:
name: cluster-dump
name: kube-resources-autosave
namespace: kube-system
labels:
k8s-app: cluster-dump-policy
k8s-app: kube-resources-autosave-policy
spec:
containers:
- name: cluster-dump-dumper
- name: kube-resources-autosave-dumper
image: {{.HyperkubeImage.RepoWithTag}}
command: ["/bin/bash", "-c" ]
args:
- |
set -x ;
DUMP_DIR_COMPLETE=/cluster-dump/complete ;
DUMP_DIR_COMPLETE=/kube-resources-autosave/complete ;
mkdir -p ${DUMP_DIR_COMPLETE} ;
while true; do
TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S)
DUMP_DIR=/cluster-dump/tmp/${TIMESTAMP} ;
DUMP_DIR=/kube-resources-autosave/tmp/${TIMESTAMP} ;
mkdir -p ${DUMP_DIR} ;
RESOURCES_OUT_NAMESPACE=( namespaces persistentvolumes nodes storageclasses ) ;
RESOURCES_OUT_NAMESPACE=( namespaces persistentvolumes nodes storageclasses clusterrolebindings clusterroles ) ;
for r in ${RESOURCES_OUT_NAMESPACE[@]};do
echo " Searching for resources: ${r}" ;
/kubectl get --export -o=json ${r} | \
@@ -835,7 +840,8 @@ write_files:
done ;
RESOURCES_IN_NAMESPACE=( componentstatuses configmaps daemonsets deployments endpoints events horizontalpodautoscalers
ingresses jobs limitranges networkpolicies persistentvolumeclaims pods podsecuritypolicies podtemplates replicasets
replicationcontrollers resourcequotas secrets serviceaccounts services statefulsets thirdpartyresources ) ;
replicationcontrollers resourcequotas secrets serviceaccounts services statefulsets thirdpartyresources
poddisruptionbudgets roles rolebindings) ;
for ns in $(jq -r '.metadata.name' < ${DUMP_DIR}/namespaces.json);do
echo "Searching in namespace: ${ns}" ;
mkdir -p ${DUMP_DIR}/${ns} ;
@@ -852,11 +858,7 @@ write_files:
.metadata.creationTimestamp,
.metadata.generation,
.metadata.annotations."pv.kubernetes.io/bind-completed",
.status,
.spec.template.spec.securityContext,
.spec.template.spec.dnsPolicy,
.spec.template.spec.terminationGracePeriodSeconds,
.spec.template.spec.restartPolicy
.status
)' > ${DUMP_DIR}/${ns}/${r}.json && touch /probe-token ;
done ;
done ;
@@ -871,18 +873,18 @@ write_files:
periodSeconds: 10
volumeMounts:
- name: dump-dir
mountPath: /cluster-dump
mountPath: /kube-resources-autosave
readOnly: false
- name: cluster-dump-pusher
image: {{.AWSCliImageRepo}}:{{.AWSCliTag}}
- name: kube-resources-autosave-pusher
image: {{.AWSCliImage.RepoWithTag}}
command: ["/bin/bash", "-c" ]
args:
- |
set -x ;
DUMP_DIR_COMPLETE=/cluster-dump/complete ;
DUMP_DIR_COMPLETE=/kube-resources-autosave/complete ;
while true; do
for FILE in ${DUMP_DIR_COMPLETE}/* ; do
aws s3 mv ${FILE} s3://{{ .KubeResourcesS3Path }}/$(basename ${FILE}) --recursive && rm -r -f ${FILE} && touch /probe-token ;
aws s3 mv ${FILE} s3://{{ .KubeResourcesAutosave.S3Path }}/$(basename ${FILE}) --recursive && rm -r -f ${FILE} && touch /probe-token ;
done ;
sleep 1m ;
done
Expand All @@ -893,7 +895,7 @@ write_files:
periodSeconds: 10
volumeMounts:
- name: dump-dir
mountPath: /cluster-dump
mountPath: /kube-resources-autosave
readOnly: false
volumes:
- name: dump-dir
2 changes: 1 addition & 1 deletion core/controlplane/config/templates/cloud-config-etcd
@@ -149,7 +149,7 @@ coreos:
Wants=decrypt-assets.service
After=decrypt-assets.service
{{- end}}
{{if .Etcd.DisasterRecovery.Automated -}}
{{if .Etcd.DisasterRecovery.SupportsEtcdVersion .Etcd.Version -}}
{{/* can be `Wants` if you like etcd-member to not stop when etcdadm-reconfigure failed */}}
BindsTo=etcdadm-reconfigure.service etcdadm-update-status.service
After=etcdadm-reconfigure.service
12 changes: 6 additions & 6 deletions core/controlplane/config/templates/cloud-config-worker
@@ -164,7 +164,7 @@ coreos:
ExecStartPre=/usr/bin/docker run --rm -e SLEEP=false -v /opt/cni/bin:/host/opt/cni/bin {{ .CalicoCniImage.RepoWithTag }} /install-cni.sh
{{end -}}
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--api-servers={{.APIServerEndpoint}} \
--api-servers={{.APIEndpointURL}} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
{{/* Work-around until https://github.com/kubernetes/kubernetes/issues/43967 is fixed via https://github.com/kubernetes/kubernetes/pull/43995 */ -}}
--cni-bin-dir=/opt/cni/bin \
@@ -245,7 +245,7 @@ coreos:
--net=host \
{{.HyperkubeImage.RepoWithTag}} \
--exec=/kubectl -- \
--server=https://{{.APIEndpoint.DNSName}}:443 \
--server={{.APIEndpointURL}}:443 \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
drain $(hostname) \
--ignore-daemonsets \
@@ -408,7 +408,7 @@ coreos:
-e LAUNCHCONFIGURATION=${LAUNCHCONFIGURATION} \
{{.HyperkubeImage.RepoWithTag}} /bin/bash \
-ec 'echo "placing labels and annotations with additional AWS parameters."; \
kctl="/kubectl --server=https://{{.APIEndpoint.DNSName}}:443 --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml"; \
kctl="/kubectl --server={{.APIEndpointURL}}:443 --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml"; \
kctl_label="$kctl label --overwrite nodes/$(hostname)"; \
kctl_annotate="$kctl annotate --overwrite nodes/$(hostname)"; \
$kctl_label kube-aws.coreos.com/autoscalinggroup=${AUTOSCALINGGROUP}; \
@@ -710,7 +710,7 @@ write_files:
'echo "tainting this node."
hostname="'${hostname}'"
taints=({{range $i, $taint := .Experimental.Taints}}"{{$taint.String}}" {{end}})
kubectl="/kubectl --server=https://{{.APIEndpoint.DNSName}}:443 --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml"
kubectl="/kubectl --server={{.APIEndpointURL}}:443 --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml"
taint="$kubectl taint node --overwrite"
for t in ${taints[@]}; do
$taint "$hostname" "$t"
@@ -737,7 +737,7 @@ write_files:
command:
- /hyperkube
- proxy
- --master={{.APIServerEndpoint}}
- --master={{.APIEndpointURL}}
- --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
securityContext:
privileged: true
@@ -830,7 +830,7 @@ write_files:
"log_level": "info",
"policy": {
"type": "k8s",
"k8s_api_root": "https://{{.APIEndpoint.DNSName}}/api/v1/",
"k8s_api_root": "{{.APIEndpointURL}}/api/v1/",
"k8s_client_key": "/etc/kubernetes/ssl/worker-key.pem",
"k8s_client_certificate": "/etc/kubernetes/ssl/worker.pem",
"k8s_certificate_authority": "/etc/kubernetes/ssl/ca.pem"
10 changes: 10 additions & 0 deletions core/controlplane/config/templates/cluster.yaml
@@ -34,6 +34,11 @@ externalDNSName: {{.ExternalDNSName}}
# Either specify hostedZoneId or hostedZone, but not both
#hostedZoneId: ""

# The name of one of the API endpoints defined in `apiEndpoints` below, to be written into kubeconfig and then used by admins
# to access the k8s API from their laptops, CI servers, etc.
# Required if there are 2 or more API endpoints defined in `apiEndpoints`
#adminAPIEndpointName: versionedPublic

# Kubernetes API endpoints, each of which has a DNS name and optionally a managed/unmanaged ELB and a Route 53 record set
# CAUTION: `externalDNSName` must be omitted when there are one or more items under `apiEndpoints`
#apiEndpoints:
@@ -933,6 +938,11 @@ worker:
# enabled: true
# maxBatchSize: 1

# Exports all Kubernetes resources (in .json format) to a bucket 's3://S3URI/clusterBackup/*'.
# The export process executes on start-up and repeats every 24 hours.
#kubeResourcesAutosave:
# enabled: false

# Addon features
addons:
# When enabled, Kubernetes rescheduler is deployed to the cluster controller(s)
2 changes: 1 addition & 1 deletion core/controlplane/config/templates/kubeconfig.tmpl
@@ -3,7 +3,7 @@ kind: Config
clusters:
- cluster:
certificate-authority: credentials/ca.pem
server: {{ .APIServerEndpoint }}
server: {{ .AdminAPIEndpointURL }}
name: kube-aws-{{ .ClusterName }}-cluster
contexts:
- context: