This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Fix self-hosted calico + cni in k8s 1.6
Closes #494

Note: without changing the way we run the `install-cni` containers of `calico-node` pods, E2E fails because several pods stall in CrashLoopBackOff:

```
+ kubectl get po --all-namespaces
NAMESPACE     NAME                                                                   READY     STATUS             RESTARTS   AGE
kube-system   calico-node-060zz                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-node-bs0b9                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-node-k2h0j                                                      1/2       CrashLoopBackOff   42         3h
kube-system   calico-node-lt4fd                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-node-sqk63                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-policy-controller-279105993-jkqjh                               1/1       Running            18         3h
kube-system   heapster-v1.3.0-124807402-7mj11                                        2/2       Running            0          3h
kube-system   kube-apiserver-ip-10-0-0-53.ap-northeast-1.compute.internal            1/1       Running            0          3h
kube-system   kube-apiserver-ip-10-0-0-96.ap-northeast-1.compute.internal            1/1       Running            0          3h
kube-system   kube-controller-manager-ip-10-0-0-53.ap-northeast-1.compute.internal   1/1       Running            0          3h
kube-system   kube-controller-manager-ip-10-0-0-96.ap-northeast-1.compute.internal   1/1       Running            0          3h
kube-system   kube-dns-782804071-5mb11                                               2/4       CrashLoopBackOff   110        3h
kube-system   kube-dns-782804071-xrgdl                                               2/4       CrashLoopBackOff   110        3h
kube-system   kube-dns-autoscaler-2813114833-vzk5x                                   1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-190.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-209.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-239.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-247.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-53.ap-northeast-1.compute.internal                1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-9.ap-northeast-1.compute.internal                 1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-96.ap-northeast-1.compute.internal                1/1       Running            0          3h
kube-system   kube-scheduler-ip-10-0-0-53.ap-northeast-1.compute.internal            1/1       Running            0          3h
kube-system   kube-scheduler-ip-10-0-0-96.ap-northeast-1.compute.internal            1/1       Running            1          3h
kube-system   kubernetes-dashboard-v1.5.1-67s47                                      0/1       CrashLoopBackOff   47         3h
```

`calico-node` fails because it is unable to communicate with etcd:

```
+ kubectl logs calico-node-060zz --namespace kube-system calico-node
time="2017-04-04T12:12:45Z" level=info msg="Creating Calico client"
time="2017-04-04T12:12:45Z" level=info msg="Loading config from environment"
time="2017-04-04T12:12:45Z" level=info msg="Ensuring datastore is initialized"
time="2017-04-04T12:13:15Z" level=info msg="Unhandled error: client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout"
time="2017-04-04T12:13:15Z" level=warning msg="Failed to set ready flag" error="client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout"
panic: Error initializing datastore: client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout

goroutine 1 [running]:
panic(0x12fc2e0, 0xc420364260)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
main.main()
	/go/src/github.com/projectcalico/calico-containers/calico_node/startup.go:47 +0x439
Calico node failed to start
```
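The underlying constraint is noted in the manifest below: a static (mirror) pod cannot mount a Kubernetes Secret, so the etcd client certificates have to be staged onto the host first, which is what the new `ExecStartPre` lines in the cloud-config diffs do. A minimal sketch of that staging step, using temp directories in place of the real `/etc/kubernetes/ssl` and `/etc/calico/certs` paths so it can run anywhere:

```shell
# Stage etcd client certs on the host for the static calico-cni pod to mount.
# Temp dirs stand in for the real paths here.
ssl=$(mktemp -d)            # stands in for /etc/kubernetes/ssl
certs=$(mktemp -d)/certs    # stands in for /etc/calico/certs
printf 'ca'   > "$ssl/ca.pem"
printf 'key'  > "$ssl/etcd-client-key.pem"
printf 'cert' > "$ssl/etcd-client.pem"
mkdir -p "$certs"
cp "$ssl/ca.pem"              "$certs/etcd-ca"
cp "$ssl/etcd-client-key.pem" "$certs/etcd-key"
cp "$ssl/etcd-client.pem"     "$certs/etcd-cert"
ls "$certs"
```

On the real hosts this runs as a single `ExecStartPre=/bin/sh -ec "…"` before the kubelet starts, so the host path exists by the time the mirror pod mounts it.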
mumoshu committed Apr 4, 2017
1 parent aca83e3 commit d094c4b
Showing 3 changed files with 198 additions and 41 deletions.
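Much of the churn in the two cloud-config templates is a second, related k8s 1.6 change: critical add-ons move from the deprecated `scheduler.alpha.kubernetes.io/tolerations` annotation to the first-class `spec.tolerations` field. The pattern, shown as an illustrative fragment rather than a complete manifest:

```
# before (pre-1.6, annotation on the pod template):
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'

# after (1.6+, first-class field in the pod spec):
spec:
  tolerations:
  - key: "CriticalAddonsOnly"
    operator: "Exists"
```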
151 changes: 113 additions & 38 deletions core/controlplane/config/templates/cloud-config-controller
@@ -143,9 +143,7 @@ coreos:
--volume stage,kind=host,source=/tmp \
--mount volume=stage,target=/tmp \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log{{ if .UseCalico }} \
--volume cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=cni-bin,target=/opt/cni/bin{{ end }}"
--mount volume=var-log,target=/var/log"
ExecStartPre=/usr/bin/systemctl is-active flanneld.service
ExecStartPre=/usr/bin/systemctl is-active cfn-etcd-environment.service
ExecStartPre=/usr/bin/mkdir -p /var/lib/cni
@@ -159,6 +157,9 @@
cluster-health

ExecStartPre=/bin/sh -ec "find /etc/kubernetes/manifests /srv/kubernetes/manifests -maxdepth 1 -type f | xargs --no-run-if-empty sed -i 's|#ETCD_ENDPOINTS#|${ETCD_ENDPOINTS}|'"
{{if .UseCalico -}}
ExecStartPre=/bin/sh -ec "mkdir -p /etc/calico/certs && cp /etc/kubernetes/ssl/ca.pem /etc/calico/certs/etcd-ca && cp /etc/kubernetes/ssl/etcd-client-key.pem /etc/calico/certs/etcd-key && cp /etc/kubernetes/ssl/etcd-client.pem /etc/calico/certs/etcd-cert"
{{end -}}
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--api-servers=http://localhost:8080 \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
@@ -624,10 +625,14 @@ write_files:
k8s-app: calico-node
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: |
[{"key": "node.alpha.kubernetes.io/role", "value": "master", "effect": "NoSchedule" },
{"key":"CriticalAddonsOnly", "operator":"Exists"}]
spec:
tolerations:
- key: "node.alpha.kubernetes.io/role"
operator: "Equal"
value: "master"
effect: "NoSchedule"
- key: "CriticalAddonsOnly"
operator: "Exists"
hostNetwork: true
containers:
- name: calico-node
@@ -673,30 +678,6 @@ write_files:
- mountPath: /etc/resolv.conf
name: dns
readOnly: true
- name: install-cni
image: {{ .CalicoCniImage.RepoWithTag }}
imagePullPolicy: Always
command: ["/install-cni.sh"]
env:
- name: ETCD_ENDPOINTS
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_endpoints
- name: CNI_NETWORK_CONFIG
valueFrom:
configMapKeyRef:
name: calico-config
key: cni_network_config
- name: CNI_NET_DIR
value: "/etc/kubernetes/cni/net.d"
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /calico-secrets
name: etcd-certs
volumes:
- name: lib-modules
hostPath:
@@ -728,9 +709,6 @@ write_files:
k8s-app: calico-policy
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: |
[{"key": "node.alpha.kubernetes.io/role", "value": "master", "effect": "NoSchedule" },
{"key":"CriticalAddonsOnly", "operator":"Exists"}]

spec:
replicas: 1
@@ -741,6 +719,13 @@
labels:
k8s-app: calico-policy
spec:
tolerations:
- key: "node.alpha.kubernetes.io/role"
operator: "Equal"
value: "master"
effect: "NoSchedule"
- key: "CriticalAddonsOnly"
operator: "Exists"
hostNetwork: true
containers:
- name: calico-policy-controller
@@ -1258,8 +1243,10 @@ write_files:
k8s-app: kube-rescheduler
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
hostNetwork: true
containers:
- name: kube-rescheduler
@@ -1287,8 +1274,10 @@ write_files:
k8s-app: kube-dns-autoscaler
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
containers:
- name: autoscaler
image: {{ .ClusterAutoscalerImage.RepoWithTag }}
@@ -1334,8 +1323,10 @@ write_files:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
containers:
- name: kubedns
image: {{ .KubeDnsImage.RepoWithTag }}
@@ -1497,8 +1488,10 @@ write_files:
version: v1.3.0
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
containers:
- image: {{ .HeapsterImage.RepoWithTag }}
name: heapster
@@ -1590,8 +1583,10 @@ write_files:
kubernetes.io/cluster-service: "true"
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
containers:
- name: kubernetes-dashboard
image: {{ .KubeDashboardImage.RepoWithTag }}
@@ -1676,6 +1671,86 @@ write_files:

{{ else }}

- path: /etc/kubernetes/manifests/calico-cni.yaml
content: |
apiVersion: v1
kind: Pod
metadata:
name: calico-cni
namespace: kube-system
labels:
k8s-app: calico-cni
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "node.alpha.kubernetes.io/role"
operator: "Equal"
value: "master"
effect: "NoSchedule"
- key: "CriticalAddonsOnly"
operator: "Exists"
hostNetwork: true
containers:
- name: install-cni
image: {{ .CalicoCniImage.RepoWithTag }}
imagePullPolicy: Always
command: ["/install-cni.sh"]
env:
- name: ETCD_ENDPOINTS
value: "#ETCD_ENDPOINTS#"
- name: CNI_NETWORK_CONFIG
value: |-
{
"name": "calico",
"type": "flannel",
"delegate": {
"type": "calico",
"etcd_endpoints": "__ETCD_ENDPOINTS__",
"etcd_key_file": "__ETCD_KEY_FILE__",
"etcd_cert_file": "__ETCD_CERT_FILE__",
"etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
"log_level": "info",
"policy": {
"type": "k8s",
"k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
"k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
},
"kubernetes": {
"kubeconfig": "__KUBECONFIG_FILEPATH__"
}
}
}
- name: CNI_NET_DIR
value: "/etc/kubernetes/cni/net.d"
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /calico-secrets
name: etcd-certs
volumes:
- name: lib-modules
hostPath:
path: /lib/modules
- name: var-run-calico
hostPath:
path: /var/run/calico
- name: cni-bin-dir
hostPath:
path: /opt/cni/bin
- name: cni-net-dir
hostPath:
path: /etc/kubernetes/cni/net.d
# a mirror pod can't mount from a k8s secret. So copy required files beforehand and mount the host path
- name: etcd-certs
hostPath:
path: /etc/calico/certs
- name: dns
hostPath:
path: /etc/resolv.conf

# http://docs.projectcalico.org/v2.0/usage/configuration/
- path: /etc/modules-load.d/nf.conf
content: |
87 changes: 84 additions & 3 deletions core/controlplane/config/templates/cloud-config-worker
@@ -139,9 +139,7 @@ coreos:
--volume stage,kind=host,source=/tmp \
--mount volume=stage,target=/tmp \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log{{ if .UseCalico }} \
--volume cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=cni-bin,target=/opt/cni/bin{{ end }}"
--mount volume=var-log,target=/var/log"
ExecStartPre=/usr/bin/systemctl is-active flanneld.service
ExecStartPre=/usr/bin/systemctl is-active cfn-etcd-environment.service
ExecStartPre=/usr/bin/mkdir -p /var/lib/cni
@@ -154,6 +152,9 @@
--cert-file /etc/kubernetes/ssl/etcd-client.pem \
--endpoints "${ETCD_ENDPOINTS}" \
cluster-health
{{if .UseCalico -}}
ExecStartPre=/bin/sh -ec "mkdir -p /etc/calico/certs && cp /etc/kubernetes/ssl/ca.pem /etc/calico/certs/etcd-ca && cp /etc/kubernetes/ssl/etcd-client-key.pem /etc/calico/certs/etcd-key && cp /etc/kubernetes/ssl/etcd-client.pem /etc/calico/certs/etcd-cert"
{{end -}}
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--api-servers={{.APIServerEndpoint}} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
@@ -807,6 +808,86 @@ write_files:

{{ else }}

- path: /etc/kubernetes/manifests/calico-cni.yaml
content: |
apiVersion: v1
kind: Pod
metadata:
name: calico-cni
namespace: kube-system
labels:
k8s-app: calico-cni
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "node.alpha.kubernetes.io/role"
operator: "Equal"
value: "master"
effect: "NoSchedule"
- key: "CriticalAddonsOnly"
operator: "Exists"
hostNetwork: true
containers:
- name: install-cni
image: {{ .CalicoCniImage.RepoWithTag }}
imagePullPolicy: Always
command: ["/install-cni.sh"]
env:
- name: ETCD_ENDPOINTS
value: "#ETCD_ENDPOINTS#"
- name: CNI_NETWORK_CONFIG
value: |-
{
"name": "calico",
"type": "flannel",
"delegate": {
"type": "calico",
"etcd_endpoints": "__ETCD_ENDPOINTS__",
"etcd_key_file": "__ETCD_KEY_FILE__",
"etcd_cert_file": "__ETCD_CERT_FILE__",
"etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
"log_level": "info",
"policy": {
"type": "k8s",
"k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
"k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
},
"kubernetes": {
"kubeconfig": "__KUBECONFIG_FILEPATH__"
}
}
}
- name: CNI_NET_DIR
value: "/etc/kubernetes/cni/net.d"
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /calico-secrets
name: etcd-certs
volumes:
- name: lib-modules
hostPath:
path: /lib/modules
- name: var-run-calico
hostPath:
path: /var/run/calico
- name: cni-bin-dir
hostPath:
path: /opt/cni/bin
- name: cni-net-dir
hostPath:
path: /etc/kubernetes/cni/net.d
# a mirror pod can't mount from a k8s secret. So copy required files beforehand and mount the host path
- name: etcd-certs
hostPath:
path: /etc/calico/certs
- name: dns
hostPath:
path: /etc/resolv.conf

# http://docs.projectcalico.org/v2.0/usage/configuration/
- path: /etc/modules-load.d/nf.conf
content: |
1 change: 1 addition & 0 deletions core/nodepool/config/deployment.go
@@ -95,6 +95,7 @@ func (c DeploymentSettings) WithDefaultsFrom(main cfg.DeploymentSettings) Deploy
c.HyperkubeImage.Tag = c.K8sVer
c.AWSCliImage.MergeIfEmpty(main.AWSCliImage)
c.CalicoCtlImage.MergeIfEmpty(main.CalicoCtlImage)
c.CalicoCniImage.MergeIfEmpty(main.CalicoCniImage)
c.PauseImage.MergeIfEmpty(main.PauseImage)
c.FlannelImage.MergeIfEmpty(main.FlannelImage)

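The one-line `deployment.go` change makes node pools inherit the main cluster's `CalicoCniImage` when they don't override it. A sketch of the `MergeIfEmpty` semantics this relies on — the `Image` struct and the repo/tag values here are simplified stand-ins, not kube-aws's actual types:

```go
package main

import "fmt"

// Image is a hypothetical stand-in for kube-aws's image settings struct.
type Image struct {
	Repo string
	Tag  string
}

// MergeIfEmpty fills in unset fields from the main cluster's defaults,
// leaving any value the node pool set explicitly untouched.
func (i *Image) MergeIfEmpty(main Image) {
	if i.Repo == "" {
		i.Repo = main.Repo
	}
	if i.Tag == "" {
		i.Tag = main.Tag
	}
}

func main() {
	pool := Image{} // node pool left the CNI image unset
	defaults := Image{Repo: "quay.io/calico/cni", Tag: "v1.6.2"}
	pool.MergeIfEmpty(defaults)
	fmt.Println(pool.Repo, pool.Tag) // prints: quay.io/calico/cni v1.6.2
}
```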
