add script to support launch k8s v1.12 #195

onlymellb · 2018-11-23T08:17:29Z

This PR updates the script dind-cluster-v1.10.sh to keep it consistent with upstream and adds a script, dind-cluster-v1.12.sh, to launch k8s v1.12. We will migrate the CI environment to k8s v1.12 later. Resolves #183.

onlymellb · 2018-11-23T11:20:28Z

/run-e2e-tests

gregwebs · 2018-11-23T20:19:44Z

I tested this out. First run ./manifests/local-dind/dind-cluster-v1.10.sh clean if you have used DinD before.

When I run up, I see this error. Here is the output when running it a second time:

./manifests/local-dind/dind-cluster-v1.12.sh up
WARNING: No swap limit support
WARNING: No swap limit support
WARNING: No swap limit support
WARNING: No swap limit support
* Making sure DIND image is up to date 
v1.12: Pulling from mirantis/kubeadm-dind-cluster
Digest: sha256:308180b08091d6b19e52ecff0d22a3334df287322b7091b9e037930b294e2d29
Status: Image is up to date for mirantis/kubeadm-dind-cluster:v1.12
* Removing container: 916e1a1e65ff
916e1a1e65ff
* Starting DIND container: kube-master
* Running kubeadm: init --config /etc/kubeadm.conf --ignore-preflight-errors=all
Initializing machine ID from random generator.
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
docker failed to start. Diagnostics below:
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2018-11-23 20:15:58 UTC; 13ms ago
     Docs: https://docs.docker.com
  Process: 99 ExecStart=/usr/local/bin/rundocker (code=exited, status=1/FAILURE)
 Main PID: 99 (code=exited, status=1/FAILURE)
      CPU: 58ms

Nov 23 20:15:58 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 23 20:15:58 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.

I do see the master running with docker ps:

916e1a1e65ff        mirantis/kubeadm-dind-cluster:v1.12   "/sbin/dind_init sys…"   28 seconds ago      Up 26 seconds       127.0.0.1:8080->8080/tcp, 127.0.0.1:5000->5001/tcp                                                                                                             kube-master

onlymellb · 2018-11-26T05:40:06Z

@gregwebs I haven't encountered this problem in my own testing. Can you check the log of the Docker startup failure? docker exec -ti kube-master bash; journalctl -a -u docker

gregwebs · 2018-11-26T19:07:49Z

The complaint is: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable

I googled that error and didn't come up with anything. Any ideas?

root@kube-master:/# journalctl -a -u docker
WARNING: terminal is not fully functional
-- Logs begin at Mon 2018-11-26 17:19:45 UTC, end at Mon 2018-11-26 19:04:14 UTC. --
Nov 26 17:19:47 kube-master systemd[1]: Starting Docker Application Container Engine...
Nov 26 17:19:47 kube-master rundocker[96]: Trying to load overlay module (this may fail)
Nov 26 17:19:47 kube-master rundocker[96]: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable
Nov 26 17:19:47 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:19:47 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:47 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:47 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 26 17:19:49 kube-master systemd[1]: Stopped Docker Application Container Engine.
Nov 26 17:19:49 kube-master systemd[1]: Starting Docker Application Container Engine...
Nov 26 17:19:49 kube-master rundocker[147]: Trying to load overlay module (this may fail)
Nov 26 17:19:49 kube-master rundocker[147]: /dev/nvme0n1p3 /var/lib/kubelet/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:49 kube-master rundocker[147]: /dev/nvme0n1p3 /var/log/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:49 kube-master rundocker[147]: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:19:49 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 26 17:19:51 kube-master systemd[1]: Stopped Docker Application Container Engine.
Nov 26 17:19:51 kube-master systemd[1]: Starting Docker Application Container Engine...
Nov 26 17:19:51 kube-master rundocker[158]: Trying to load overlay module (this may fail)
Nov 26 17:19:51 kube-master rundocker[158]: /dev/nvme0n1p3 /var/lib/kubelet/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:51 kube-master rundocker[158]: /dev/nvme0n1p3 /var/log/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:51 kube-master rundocker[158]: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:19:51 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 26 17:19:54 kube-master systemd[1]: Stopped Docker Application Container Engine.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Start request repeated too quickly.
Nov 26 17:19:54 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
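
For context, "unbound variable" is the message bash prints when a script running with nounset (set -u) expands a variable that was never set. How rundocker enables that option is an assumption here; the sketch below only reproduces the same class of error and shows a default-value expansion that avoids it. The fallback value docker is illustrative and not taken from the upstream script.

```shell
#!/usr/bin/env bash
# Make sure the variable is really unset for this demonstration.
unset DIND_CRI

# Expand the unset variable under nounset inside a subshell, so the
# resulting error ends the subshell instead of this script.
if (set -o nounset; : "$DIND_CRI") 2>/dev/null; then
  nounset_result="ok"
else
  nounset_result="failed"
fi
echo "plain expansion under nounset: ${nounset_result}"

# With a fallback, the same expansion is safe even under nounset.
# "docker" is an assumed default chosen only for this example.
set -o nounset
cri="${DIND_CRI:-docker}"
echo "cri=${cri}"
```

A caller that must work with an image expecting the variable could also export it before starting the container, which avoids patching the script inside the image.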

gregwebs · 2018-11-26T20:39:32Z

BTW, I am running Linux.

onlymellb · 2018-11-27T07:55:02Z

@gregwebs The problem is caused by an update to the upstream image. I have updated the startup script and pinned the image to a specific version. Please try again.
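
As background on why a fixed version helps: a tag such as :v1.12 can be re-pushed upstream, while a reference by digest (@sha256:...) always resolves to the same image content. The sketch below distinguishes the two forms with plain string matching. is_pinned is a hypothetical helper, and the digest is the one that appears in this PR's diff for the v1.10 script.

```shell
#!/usr/bin/env bash
# is_pinned is a hypothetical helper: it succeeds only when the image
# reference ends in a sha256 digest rather than a mutable tag.
is_pinned() {
  [[ "$1" =~ @sha256:[0-9a-f]+$ ]]
}

for image in \
  "mirantis/kubeadm-dind-cluster:v1.12" \
  "mirantis/kubeadm-dind-cluster@sha256:f7c6b21a9a0a55c4bc79678d5b339dea02a6f3aaa3307c0c120c6a9b2cf0f4fc"
do
  if is_pinned "$image"; then
    echo "pinned:   $image"
  else
    echo "floating: $image"
  fi
done
```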

tennix · 2018-11-27T10:01:30Z

I'm also getting an error running the DinD v1.12 cluster. I'm using NixOS. The error says Docker can't be started because a dependency failed to start. After some diagnosis, I found that the new version runs the containerd service as a dependency of docker. The containerd systemd service file is as follows:

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
KillMode=process
Delegate=yes
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target

It failed to start because the pre-start command /sbin/modprobe overlay failed. The overlay kernel module was actually already loaded, but the command somehow still failed. I've also noticed that the docker daemon is started by a custom script. That script also tries to load the overlay module but ignores a loading failure:
https://github.com/kubernetes-sigs/kubeadm-dind-cluster/blob/cd87c3dd6608bc565aa73103a4bb4634e4d01694/image/rundocker#L64-L65

Based on this, I've injected the following command, which comments out the ExecStartPre line, after the node container is created.

docker exec ${container_id} sed -i '/ExecStartPre/s/^/#/g' /lib/systemd/system/containerd.service

After that, the DinD 1.12 cluster can be started correctly.

tennix · 2018-11-27T10:16:32Z

Also, the systemd service file provided by the containerd project uses ExecStartPre=-/sbin/modprobe overlay, where the leading - makes systemd ignore a modprobe failure.
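
The two options discussed here can be compared on a scratch copy of the unit file. This is a sketch only: it works on a temporary file holding just the relevant lines, not on /lib/systemd/system/containerd.service inside the node container. The first sed expression is the one from the comment above (without the redundant g flag); the second rewrites the line to the tolerant ExecStartPre=- form instead of disabling it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch file standing in for the containerd unit (relevant lines only).
unit="$(mktemp)"
cat > "$unit" <<'EOF'
[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
EOF

# Option 1: comment out the pre-start line, as done in this PR.
commented="$(sed '/ExecStartPre/s/^/#/' "$unit")"

# Option 2: keep the line but add the "-" prefix so a failing
# modprobe no longer blocks the service from starting.
tolerant="$(sed 's|^ExecStartPre=/sbin/modprobe|ExecStartPre=-/sbin/modprobe|' "$unit")"

echo "--- commented out ---"
echo "$commented"
echo "--- failure tolerated ---"
echo "$tolerant"

rm -f "$unit"
```

Option 2 still attempts to load the module on hosts where it is missing, which may be preferable to skipping the step entirely.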

tennix

LGTM

gregwebs · 2018-11-28T01:44:28Z

It works now for me! I will test out deploying the operator & cluster tomorrow and let you know if I come across any issues.

tennix · 2018-11-28T02:32:07Z

Some configuration of the extended scheduler needs to be adjusted; otherwise the operator may not function correctly. I think we should merge this and fix the scheduler configuration in a later PR.

tennix · 2018-11-28T13:24:14Z

manifests/local-dind/dind-cluster-v1.10.sh

@@ -51,253 +52,457 @@ if [[ $(uname) == Linux && -z ${DOCKER_HOST:-} ]]; then
    using_local_linuxdocker=1
 fi

-EMBEDDED_CONFIG=y;DIND_IMAGE=mirantis/kubeadm-dind-cluster:v1.10
+EMBEDDED_CONFIG=y;DIND_IMAGE=mirantis/kubeadm-dind-cluster@sha256:f7c6b21a9a0a55c4bc79678d5b339dea02a6f3aaa3307c0c120c6a9b2cf0f4fc


This is an old version which is incompatible with this new script. The pod network is broken.

gregwebs · 2018-11-28T22:54:30Z

I verified I can bring up TiDB on master.

* add script to support launch k8s v1.12 * fix unbound variable DIND_CRI * fix the problem that containerd failed to start

Signed-off-by: liubo02 <liubo02@pingcap.com>

add script to support launch k8s v1.12

9ab5d79

Merge branch 'master' into onlymellb/dind-support-k8s-v1.12

90f9954

luolibin added 2 commits November 27, 2018 15:50

fix unbound variable DIND_CRI

8d8b304

Merge branch 'onlymellb/dind-support-k8s-v1.12' of github.com:onlymel…

45f9a42

…lb/tidb-operator into onlymellb/dind-support-k8s-v1.12

fix the problem that containerd failed to start

393f920

tennix approved these changes Nov 27, 2018

weekface approved these changes Nov 28, 2018

tennix merged commit acf99fd into pingcap:master Nov 28, 2018

tennix reviewed Nov 28, 2018

queenliuxx pushed a commit to queenliuxx/tidb-operator that referenced this pull request Dec 19, 2018

add script to support launch k8s v1.12 (pingcap#195)

dcca03f

* add script to support launch k8s v1.12 * fix unbound variable DIND_CRI * fix the problem that containerd failed to start

fgksgf pushed a commit to fgksgf/tidb-operator that referenced this pull request Dec 23, 2024

fix(lint): fix whitespace lint (pingcap#195)

722fa9f

Signed-off-by: liubo02 <liubo02@pingcap.com>
