add script to support launch k8s v1.12 #195

Merged
merged 5 commits into from
Nov 28, 2018

onlymellb
Contributor

@onlymellb onlymellb commented Nov 23, 2018

This PR updates the script dind-cluster-v1.10.sh to make it consistent with upstream and adds a script, dind-cluster-v1.12.sh, to launch k8s v1.12. We will migrate the CI environment to k8s v1.12 later. Resolves #183.

@onlymellb
Contributor Author

/run-e2e-tests

@gregwebs
Contributor

I tested this out. First run ./manifests/local-dind/dind-cluster-v1.10.sh clean if you have used DinD before.

When I run the up command, I see the error below. This is the output from running it a second time:

./manifests/local-dind/dind-cluster-v1.12.sh up
WARNING: No swap limit support
WARNING: No swap limit support
WARNING: No swap limit support
WARNING: No swap limit support
* Making sure DIND image is up to date 
v1.12: Pulling from mirantis/kubeadm-dind-cluster
Digest: sha256:308180b08091d6b19e52ecff0d22a3334df287322b7091b9e037930b294e2d29
Status: Image is up to date for mirantis/kubeadm-dind-cluster:v1.12
* Removing container: 916e1a1e65ff
916e1a1e65ff
* Starting DIND container: kube-master
* Running kubeadm: init --config /etc/kubeadm.conf --ignore-preflight-errors=all
Initializing machine ID from random generator.
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
docker failed to start. Diagnostics below:
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2018-11-23 20:15:58 UTC; 13ms ago
     Docs: https://docs.docker.com
  Process: 99 ExecStart=/usr/local/bin/rundocker (code=exited, status=1/FAILURE)
 Main PID: 99 (code=exited, status=1/FAILURE)
      CPU: 58ms

Nov 23 20:15:58 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 23 20:15:58 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.

I do see the master running with docker ps:

916e1a1e65ff        mirantis/kubeadm-dind-cluster:v1.12   "/sbin/dind_init sys…"   28 seconds ago      Up 26 seconds       127.0.0.1:8080->8080/tcp, 127.0.0.1:5000->5001/tcp                                                                                                             kube-master

@onlymellb
Contributor Author

@gregwebs I haven't encountered this problem in my own tests. Can you check the log of the Docker startup failure? docker exec -ti kube-master bash; journalctl -a -u docker

@gregwebs
Contributor

The complaint is: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable

I googled that error and didn't find anything. Any ideas?

root@kube-master:/#  journalctl -a -u docker                                                                                                                                                                      
WARNING: terminal is not fully functional
-- Logs begin at Mon 2018-11-26 17:19:45 UTC, end at Mon 2018-11-26 19:04:14 UTC. --
Nov 26 17:19:47 kube-master systemd[1]: Starting Docker Application Container Engine...
Nov 26 17:19:47 kube-master rundocker[96]: Trying to load overlay module (this may fail)
Nov 26 17:19:47 kube-master rundocker[96]: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable
Nov 26 17:19:47 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:19:47 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:47 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:47 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 26 17:19:49 kube-master systemd[1]: Stopped Docker Application Container Engine.
Nov 26 17:19:49 kube-master systemd[1]: Starting Docker Application Container Engine...
Nov 26 17:19:49 kube-master rundocker[147]: Trying to load overlay module (this may fail)
Nov 26 17:19:49 kube-master rundocker[147]: /dev/nvme0n1p3 /var/lib/kubelet/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:49 kube-master rundocker[147]: /dev/nvme0n1p3 /var/log/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:49 kube-master rundocker[147]: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:19:49 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:49 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 26 17:19:51 kube-master systemd[1]: Stopped Docker Application Container Engine.
Nov 26 17:19:51 kube-master systemd[1]: Starting Docker Application Container Engine...
Nov 26 17:19:51 kube-master rundocker[158]: Trying to load overlay module (this may fail)
Nov 26 17:19:51 kube-master rundocker[158]: /dev/nvme0n1p3 /var/lib/kubelet/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:51 kube-master rundocker[158]: /dev/nvme0n1p3 /var/log/pods ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
Nov 26 17:19:51 kube-master rundocker[158]: /usr/local/bin/rundocker: line 94: DIND_CRI: unbound variable
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:19:51 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:51 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 26 17:19:54 kube-master systemd[1]: Stopped Docker Application Container Engine.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Start request repeated too quickly.
Nov 26 17:19:54 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Unit entered failed state.
Nov 26 17:19:54 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
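
For context on the message in the log above: the shell prints "unbound variable" when a script running with set -u (nounset) expands a variable that was never set. The sketch below is not the upstream rundocker code. It only illustrates the failure mode and the usual guard, a default expansion. The helper names resolve_cri and cri_is_unbound, and the fallback value docker, are assumptions made for illustration.

```shell
# Print the CRI name, falling back to a default when DIND_CRI is unset
# or empty. The fallback "docker" is an assumption, not taken from upstream.
resolve_cri() {
  printf '%s\n' "${DIND_CRI:-docker}"
}

# Return 0 if expanding DIND_CRI under nounset would abort the script.
# The expansion runs in a subshell so the calling shell survives the error.
cri_is_unbound() {
  ! ( set -u; : "$DIND_CRI" ) 2>/dev/null
}
```

With DIND_CRI unset, cri_is_unbound succeeds while resolve_cri still prints a value, which is the difference between a script that aborts the way line 94 of rundocker did and one that tolerates the missing variable.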

@gregwebs
Contributor

BTW, I am running Linux.

luolibin added 2 commits November 27, 2018 15:50
@onlymellb
Contributor Author

@gregwebs This problem occurs because the upstream image was updated. I have updated the startup script and pinned the image to a specific version. You can try again.
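
A tag such as v1.12 can be re-pushed upstream, whereas a reference of the form name@sha256:... always resolves to the same image content. As a minimal sketch of how a launch script could check that it is using a pinned reference (the function name is_digest_pinned is hypothetical and is not part of the scripts in this PR):

```shell
# Return 0 when the image reference is pinned by a sha256 digest,
# 1 when it only carries a mutable tag (or no tag at all).
is_digest_pinned() {
  case "$1" in
    *@sha256:*) return 0 ;;
    *) return 1 ;;
  esac
}

# To look up the digest of an image that has already been pulled:
#   docker inspect --format '{{index .RepoDigests 0}}' mirantis/kubeadm-dind-cluster:v1.12
```

For example, is_digest_pinned "$DIND_IMAGE" || echo "warning: DIND_IMAGE is not pinned" would flag a tag-only reference before the cluster is started.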

@tennix
Member

tennix commented Nov 27, 2018

I also get an error when running the DinD v1.12 cluster. I'm using NixOS. The error says that Docker can't be started because a dependency failed to start. After some diagnosis, I found that the new version runs the containerd service as a dependency of Docker. The containerd systemd service file is as follows:

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
KillMode=process
Delegate=yes
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target

It failed to start because the pre-start command /sbin/modprobe overlay failed. The overlay kernel module was in fact already loaded, but the command still failed for some reason. I also noticed that the Docker daemon is started by a custom script. That script also tries to load the overlay module but ignores a failure to load it:
https://github.com/kubernetes-sigs/kubeadm-dind-cluster/blob/cd87c3dd6608bc565aa73103a4bb4634e4d01694/image/rundocker#L64-L65
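
One way to confirm that overlay is usable without calling modprobe is to look for it in the kernel's list of registered filesystems. This is only a sketch: the function name overlay_available is hypothetical, and reading /proc/filesystems is an assumption about how to check, not something the DinD scripts do.

```shell
# Return 0 if "overlay" appears as a registered filesystem type.
# The file defaults to /proc/filesystems; a different path can be
# passed as the first argument, for example to test against a saved copy.
overlay_available() {
  awk '$NF == "overlay" { found = 1 } END { exit !found }' \
    "${1:-/proc/filesystems}"
}
```

Running overlay_available && echo "overlay is registered" inside the node container would show whether the failing modprobe call actually matters on a given host.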

Based on this, I injected the following command, which comments out the ExecStartPre line, after the node container is created.

docker exec ${container_id} sed -i '/ExecStartPre/s/^/#/g' /lib/systemd/system/containerd.service

After that, the DinD 1.12 cluster can be started correctly.

@tennix
Member

tennix commented Nov 27, 2018

Also, the systemd service file provided by the containerd project uses ExecStartPre=-/sbin/modprobe overlay, which ignores a modprobe failure.
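
In a systemd unit, a leading "-" on the command path means a non-zero exit is recorded but treated as success. As an alternative to commenting the line out, a unit file could be rewritten to use that prefix. The filter below is a sketch under that assumption; the function name tolerate_modprobe_failure is hypothetical, and it works on stdin and stdout rather than editing the file in the container.

```shell
# Read a systemd unit file on stdin and write it to stdout with the
# modprobe pre-start command prefixed by "-", so that its failure is
# ignored. Lines that already carry the prefix are left unchanged.
tolerate_modprobe_failure() {
  sed 's|^ExecStartPre=/sbin/modprobe |ExecStartPre=-/sbin/modprobe |'
}
```

For example, tolerate_modprobe_failure < containerd.service > containerd.service.new produces a copy that can be compared with the original before it replaces it.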

Member

@tennix tennix left a comment

LGTM

@gregwebs
Contributor

It works now for me! I will test out deploying the operator & cluster tomorrow and let you know if I come across any issues.

@tennix
Member

tennix commented Nov 28, 2018

Some of the extended scheduler's configuration needs to be adjusted; otherwise the operator may not function correctly. I think we should merge this and fix the scheduler configuration in a later PR.

@tennix tennix merged commit acf99fd into pingcap:master Nov 28, 2018
@@ -51,253 +52,457 @@ if [[ $(uname) == Linux && -z ${DOCKER_HOST:-} ]]; then
using_local_linuxdocker=1
fi

-EMBEDDED_CONFIG=y;DIND_IMAGE=mirantis/kubeadm-dind-cluster:v1.10
+EMBEDDED_CONFIG=y;DIND_IMAGE=mirantis/kubeadm-dind-cluster@sha256:f7c6b21a9a0a55c4bc79678d5b339dea02a6f3aaa3307c0c120c6a9b2cf0f4fc
Member

This is an old version which is incompatible with this new script. The pod network is broken.

@gregwebs
Contributor

I verified I can bring up TiDB on master.

queenliuxx pushed a commit to queenliuxx/tidb-operator that referenced this pull request Dec 19, 2018
* add script to support launch k8s v1.12

* fix unbound variable DIND_CRI

* fix the problem that containerd failed to start
fgksgf pushed a commit to fgksgf/tidb-operator that referenced this pull request Dec 23, 2024
Signed-off-by: liubo02 <liubo02@pingcap.com>
Development

Successfully merging this pull request may close these issues.

Upgrading K8s to v1.12
4 participants