This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Bump to Kubernetes v1.6.1 #492

Merged
mumoshu merged 18 commits into kubernetes-retired:master on Apr 6, 2017

Conversation

mumoshu
Contributor

@mumoshu mumoshu commented Apr 4, 2017

As etcd3 support was already introduced via #417, this was (ideally) just a matter of running bump-version and then running E2E tests against a newly created kube-aws cluster with k8s 1.6.1.

@k8s-ci-robot k8s-ci-robot added the "cncf-cla: yes" label (indicates the PR's author has signed the CNCF CLA) on Apr 4, 2017
@codecov-io

codecov-io commented Apr 4, 2017

Codecov Report

Merging #492 into master will increase coverage by 0.02%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #492      +/-   ##
==========================================
+ Coverage    38.1%   38.12%   +0.02%     
==========================================
  Files          44       44              
  Lines        3110     3108       -2     
==========================================
  Hits         1185     1185              
  Misses       1729     1729              
+ Partials      196      194       -2
| Impacted Files | Coverage Δ |
| --- | --- |
| model/etcd.go | 0% <0%> (ø) ⬆️ |
| core/controlplane/config/config.go | 55.47% <100%> (ø) ⬆️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16168b7...3cff7f7. Read the comment docs.

@mumoshu
Contributor Author

mumoshu commented Apr 4, 2017

Self-hosted Calico + cni is not working. See #494

@mumoshu
Contributor Author

mumoshu commented Apr 4, 2017

Even flannel + cni is not working. See #495

mumoshu added 2 commits April 4, 2017 17:29
This is just the result of running the following:

```
$ contrib/bump-version v1.6.1_coreos.0
Updating contrib/bump-version
Updating core/controlplane/config/config.go
Updating core/controlplane/config/templates/cluster.yaml
Updating e2e/kubernetes/Dockerfile
Updating e2e/kubernetes/Makefile
Updating vendor/github.com/aws/aws-sdk-go/CHANGELOG.md
$ git checkout -p -- vendor
```

As etcd3 support was already introduced via kubernetes-retired#417, after this change it is (ideally) just a matter of running E2E against a newly created kube-aws cluster with k8s 1.6.1.
Also fixes the issue where we ended up with an incomplete etcd version number when `etcd.version` is set to just `3` rather than e.g. `3.1.5`.
Closes kubernetes-retired#495

E2E passed after this change:

```
$ ETCD_VERSION=3 ETCD_SNAPSHOT_AUTOMATED=1 ETCD_DISASTER_RECOVERY_AUTOMATED=1 ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeaws2 ./run all
*snip*
Ran 151 of 588 Specs in 3492.050 seconds
SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped PASS

Ginkgo ran 1 suite in 58m12.359210255s
Test Suite Passed
2017/04/04 09:35:29 util.go:127: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 58m12.683100213s
2017/04/04 09:35:29 e2e.go:80: Done
```
Closes kubernetes-retired#494

Note: without changing the way we run the `install-cni` containers of `calico-node` pods, E2E fails because some pods stall in CrashLoopBackOff:

```
+ kubectl get po --all-namespaces
NAMESPACE     NAME                                                                   READY     STATUS             RESTARTS   AGE
kube-system   calico-node-060zz                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-node-bs0b9                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-node-k2h0j                                                      1/2       CrashLoopBackOff   42         3h
kube-system   calico-node-lt4fd                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-node-sqk63                                                      1/2       CrashLoopBackOff   43         3h
kube-system   calico-policy-controller-279105993-jkqjh                               1/1       Running            18         3h
kube-system   heapster-v1.3.0-124807402-7mj11                                        2/2       Running            0          3h
kube-system   kube-apiserver-ip-10-0-0-53.ap-northeast-1.compute.internal            1/1       Running            0          3h
kube-system   kube-apiserver-ip-10-0-0-96.ap-northeast-1.compute.internal            1/1       Running            0          3h
kube-system   kube-controller-manager-ip-10-0-0-53.ap-northeast-1.compute.internal   1/1       Running            0          3h
kube-system   kube-controller-manager-ip-10-0-0-96.ap-northeast-1.compute.internal   1/1       Running            0          3h
kube-system   kube-dns-782804071-5mb11                                               2/4       CrashLoopBackOff   110        3h
kube-system   kube-dns-782804071-xrgdl                                               2/4       CrashLoopBackOff   110        3h
kube-system   kube-dns-autoscaler-2813114833-vzk5x                                   1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-190.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-209.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-239.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-247.ap-northeast-1.compute.internal               1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-53.ap-northeast-1.compute.internal                1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-9.ap-northeast-1.compute.internal                 1/1       Running            0          3h
kube-system   kube-proxy-ip-10-0-0-96.ap-northeast-1.compute.internal                1/1       Running            0          3h
kube-system   kube-scheduler-ip-10-0-0-53.ap-northeast-1.compute.internal            1/1       Running            0          3h
kube-system   kube-scheduler-ip-10-0-0-96.ap-northeast-1.compute.internal            1/1       Running            1          3h
kube-system   kubernetes-dashboard-v1.5.1-67s47                                      0/1       CrashLoopBackOff   47         3h
```

calico-node fails because it cannot communicate with etcd:

```
+ kubectl logs calico-node-060zz --namespace kube-system calico-node
time="2017-04-04T12:12:45Z" level=info msg="Creating Calico client"
time="2017-04-04T12:12:45Z" level=info msg="Loading config from environment"
time="2017-04-04T12:12:45Z" level=info msg="Ensuring datastore is initialized"
time="2017-04-04T12:13:15Z" level=info msg="Unhandled error: client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout"
time="2017-04-04T12:13:15Z" level=warning msg="Failed to set ready flag" error="client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout"
panic: Error initializing datastore: client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout

goroutine 1 [running]:
panic(0x12fc2e0, 0xc420364260)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
main.main()
	/go/src/github.com/projectcalico/calico-containers/calico_node/startup.go:47 +0x439
Calico node failed to start
```
mumoshu added 8 commits April 5, 2017 09:43
```
  13s		13s		1	kubelet, ip-10-0-0-224.ap-northeast-1.compute.internal			Warning		FailedSync	Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-6z186_kube-system\" network: open /etc/kubernetes/cni/net.d/calico-tls/etcd-cert: no such file or directory"
```
This fixes the error below:

```
  13s		12s		2	kubelet, ip-10-0-0-16.ap-northeast-1.compute.internal			Warning		FailedSync	Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-t46zn_kube-system(37c1a478-19fd-11e7-a57c-061cc33b51ff)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-t46zn_kube-system(37c1a478-19fd-11e7-a57c-061cc33b51ff)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-t46zn_kube-system\" network: Get https://kubeaw6.cwtest.info/api/v1/namespaces/kube-system/pods/kube-dns-3816048056-t46zn: x509: certificate signed by unknown authority"
```
…o-node pod is up

This is intended to fix the following error, which keeps occurring unless we manually recreate the kube-dns pod:

```
  4m	4s	21	kubelet, ip-10-0-0-238.ap-northeast-1.compute.internal		Warning	FailedSync	Error syncing pod, skipping: failed to "KillPodSandbox" for "db8941be-1a02-11e7-bc1e-06966d4cb749" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"kube-dns-3816048056-mzbsj_kube-system\" network: open /var/lib/cni/flannel/47aca6c4a931562d5b8452ddecbf054721cacf99e25232033db4aacc735baa4c: no such file or directory"
```
@camilb
Contributor

camilb commented Apr 5, 2017

Hi @mumoshu, I think these changes should also be made so that the minimal roles are created when RBAC is enabled.
Edit: already created PR #504

diff --git a/core/controlplane/config/templates/cloud-config-controller b/core/controlplane/config/templates/cloud-config-controller
index 1c57c4d..f2c3a20 100644
--- a/core/controlplane/config/templates/cloud-config-controller
+++ b/core/controlplane/config/templates/cloud-config-controller
@@ -491,7 +491,7 @@ write_files:
       #!/bin/bash -e
 
       kubectl() {
-          /usr/bin/docker run --rm --net=host -v /srv/kubernetes/manifests:/srv/kubernetes/manifests {{.HyperkubeImage.RepoWithTag}} /hyperkube kubectl "$@"
+          /usr/bin/docker run --rm --net=host -v /srv/kubernetes:/srv/kubernetes {{.HyperkubeImage.RepoWithTag}} /hyperkube kubectl "$@"
       }
 
       mfdir=/srv/kubernetes/manifests
@@ -845,7 +845,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-roles/cluster-admin.yaml
     content: |
         kind: ClusterRole
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
             name: cluster-admin
         rules:
@@ -860,7 +860,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-roles/bootstrapped-node.yaml
     content: |
         kind: ClusterRole
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
             name: bootstrapped-node
         rules:
@@ -892,7 +892,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-role-bindings/kube-admin.yaml
     content: |
         kind: ClusterRoleBinding
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
           name: kube-admin
         subjects:
@@ -906,7 +906,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-role-bindings/kube-worker.yaml
     content: |
         kind: ClusterRoleBinding
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
           name: kube-worker
         subjects:
@@ -921,7 +921,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-role-bindings/system-worker.yaml
     content: |
         kind: ClusterRoleBinding
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
           name: system-worker
         subjects:
@@ -938,7 +938,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-role-bindings/bootstrapped-node.yaml
     content: |
         kind: ClusterRoleBinding
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
           name: bootstrapped-node
         subjects:
@@ -954,7 +954,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-roles/kubelet-bootstrap.yaml
     content: |
         kind: ClusterRole
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
           name: kubelet-bootstrap
         rules:
@@ -971,7 +971,7 @@ write_files:
   - path: /srv/kubernetes/rbac/cluster-role-bindings/kubelet-bootstrap.yaml
     content: |
         kind: ClusterRoleBinding
-        apiVersion: rbac.authorization.k8s.io/v1alpha1
+        apiVersion: rbac.authorization.k8s.io/v1beta1
         metadata:
           name: kubelet-bootstrap
         subjects:
@@ -1076,7 +1076,7 @@ write_files:
           - --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
           - --client-ca-file=/etc/kubernetes/ssl/ca.pem
           - --service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem
-          - --runtime-config=extensions/v1beta1/networkpolicies=true,batch/v2alpha1{{if .Experimental.Plugins.Rbac.Enabled}},rbac.authorization.k8s.io/v1alpha1=true{{ end }}{{if .Experimental.Admission.PodSecurityPolicy.Enabled}},extensions/v1beta1/podsecuritypolicy=true{{ end }}
+          - --runtime-config=extensions/v1beta1/networkpolicies=true,batch/v2alpha1{{if .Experimental.Plugins.Rbac.Enabled}},rbac.authorization.k8s.io/v1beta1=true{{ end }}{{if .Experimental.Admission.PodSecurityPolicy.Enabled}},extensions/v1beta1/podsecuritypolicy=true{{ end }}
           - --cloud-provider=aws
           livenessProbe:
             httpGet:
@@ -1184,7 +1184,7 @@ write_files:
         hostNetwork: true
         volumes:
         {{if .Experimental.DisableSecurityGroupIngress}}
-        - hostPath: 
+        - hostPath:
             path: /etc/kubernetes/additional-configs
           name: additional-configs
         {{end}}

model/etcd.go (outdated)

func (e Etcd) Version() EtcdVersion {
  if e.Cluster.Version == "3" {
    return "3.1.5"
  }
  if e.Cluster.Version != "" {
Contributor

why first check for "3"?

  if e.Cluster.Version != "" {
    return e.Cluster.Version
  }
  return "3.1.5"

Contributor Author

The intention here is to translate:

etcd:
  version: 3

to

etcd:
  version: 3.1.5

for users who don't care much about minor/patch versions.
However, since kube-aws defaults to etcd3 from now on, we can just default to "3.1.5" when the version is omitted, as you've suggested 👍
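
For reference, a minimal self-contained sketch of the simplified defaulting agreed on above; the type definitions here are assumptions modeled on the snippets in this thread, not the actual ones in model/etcd.go:

```
package main

import "fmt"

// Sketch only: these types approximate the ones in model/etcd.go and
// follow the review snippet above; the real definitions may differ.
type EtcdVersion string

type EtcdCluster struct {
	Version EtcdVersion
}

type Etcd struct {
	Cluster EtcdCluster
}

// Version returns the etcd version to deploy, defaulting to "3.1.5"
// whenever etcd.version is omitted in cluster.yaml.
func (e Etcd) Version() EtcdVersion {
	if e.Cluster.Version != "" {
		return e.Cluster.Version
	}
	return "3.1.5"
}

func main() {
	fmt.Println(Etcd{}.Version())                                        // 3.1.5
	fmt.Println(Etcd{Cluster: EtcdCluster{Version: "3.0.17"}}.Version()) // 3.0.17
}
```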

…i v0.5.1

which includes "plugins/meta/flannel: If net config is missing do not return err on DEL" to actually fix the error:

 ```
      13s           13s             1       kubelet, ip-10-0-0-224.ap-northeast-1.compute.internal                  Warning         FailedSync      Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-6z186_kube-system\" network: open /etc/kubernetes/cni/net.d/calico-tls/etcd-cert: no such file or directory"
 ```

https://github.com/projectcalico/cni-plugin/releases/tag/v1.6.2
https://github.com/containernetworking/cni/releases/tag/v0.5.1
mumoshu added 5 commits April 6, 2017 09:04
```
Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-cwx62_kube-system\" network: failed to find plugin \"loopback\" in path [/opt/loopback/bin /opt/cni/bin]"
```
@mumoshu
Contributor Author

mumoshu commented Apr 6, 2017

E2E is passing against clusters both with and without Calico.

@mumoshu mumoshu merged commit b0614a8 into kubernetes-retired:master Apr 6, 2017
@mumoshu mumoshu changed the title from "WIP: Bump to Kubernetes v1.6.1" to "Bump to Kubernetes v1.6.1" on Apr 6, 2017
camilb added a commit to camilb/kube-aws that referenced this pull request Apr 18, 2017
* kubernetes-incubator/master:
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready Resolves kubernetes-retired#467
  Make the validation error message when KMS failed more friendly Now, when e.g. AWS_ACCESS_KEY_ID is missing, the error looks like: ``` $ kube-aws validate --s3-uri ... *snip* Error: Failed to initialize cluster driver: failed to read/create TLS assets: UnrecognizedClientException: The security token included in the request is invalid. ```
  Fix a race between systemd services: cfn-etc-environment and etcdadm-reconfigure
  Fix API endpoint from HA controllers
  Make AMI fetching even more reliable Resolves kubernetes-retired#474
  etcd unit should unconditionally depend on cfn-etcd-environment
  Rescheduler to 0.3.0 which uses k8s 1.6
  WIP: Bump to Kubernetes v1.6.1 (kubernetes-retired#492)
  Improve auth tokens / TLS bootstrapping UX (kubernetes-retired#489)
  Fix RBAC in Kubernetes 1.6. Fix etcdadm when terminated instances still exist.
  Retry userdata download
  Perform docker post-start check
  Bump to calico 2.1.1
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
* Bump to Kubernetes v1.6.1

This change was just the result of running the following commands:

```
$ contrib/bump-version v1.6.1_coreos.0
Updating contrib/bump-version
Updating core/controlplane/config/config.go
Updating core/controlplane/config/templates/cluster.yaml
Updating e2e/kubernetes/Dockerfile
Updating e2e/kubernetes/Makefile
Updating vendor/github.com/aws/aws-sdk-go/CHANGELOG.md
$ git checkout -p -- vendor
```

As etcd3 support was already introduced via kubernetes-retired#417, after this change it was (ideally) supposed to be just a matter of running E2E against a newly created kube-aws cluster with k8s 1.6.1, which turned out not to be true, hence the subsequent changes.

* Use etcd3 by default

etcd2 support will be dropped soon, as the etcd3 storage driver is already the default since k8s v1.6.0.

* Bump to calico-cni v1.6.2, which is an even newer release than the one included in the latest calico v2.1.2, to deal with kubernetes/kubernetes#43488

* Set up /etc/kubernetes/cni/net.d on our own rather than via calico-cni, to deal with kubernetes/kubernetes#43014

* Set up /opt/cni/bin using docker rather than a k8s static pod to prevent temporary "failed to find plugin * in path" errors from cni

These errors were emitted when pods were scheduled before /opt/cni/bin had been populated:

```
Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-cwx62_kube-system\" network: failed to find plugin \"loopback\" in path [/opt/loopback/bin /opt/cni/bin]"
```

* Fix a bug that caused etcd-member.service to use the default version number 3.0.x regardless of what is specified via `etcd.version` in cluster.yaml. The bug was reported in kubernetes-retired#497 (comment)

* Simplify EtcdVersion func

According to the review comment kubernetes-retired#492 (review)

* Fix permanent "failed to find plugin * in path" errors from cni, which were breaking cni + flannel/calico in k8s 1.6, by specifying the `--cni-bin-dir=/opt/cni/bin` flag for kubelets

The default dir had been accidentally changed at least in k8s 1.6.0 and 1.6.1.

Resolves kubernetes-retired#494
Resolves kubernetes-retired#495

E2E against a cluster with flannel passed after this change:

```
$ ETCD_VERSION=3 ETCD_SNAPSHOT_AUTOMATED=1 ETCD_DISASTER_RECOVERY_AUTOMATED=1 ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeaws2 ./run all
*snip*
Ran 151 of 588 Specs in 3492.050 seconds
SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped PASS

Ginkgo ran 1 suite in 58m12.359210255s
Test Suite Passed
2017/04/04 09:35:29 util.go:127: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 58m12.683100213s
2017/04/04 09:35:29 e2e.go:80: Done
```

Also passed against a cluster with calico:

```
Ran 151 of 588 Specs in 3381.108 seconds
SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped PASS

Ginkgo ran 1 suite in 56m21.415087252s
Test Suite Passed
2017/04/06 03:58:20 util.go:131: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 56m21.76726736s
2017/04/06 03:58:20 e2e.go:80: Done
```