Support Rootless Docker #1727

Closed
AkihiroSuda wants to merge 3 commits

AkihiroSuda (Member) commented Jul 13, 2020

This PR adds support for running kind with the Rootless Docker provider.
Requires a cgroup v2 host.

Commits in this PR

1. "podman: unlock rootless"

Turn off the "podman provider does not work properly in rootless mode" error and print a warning message instead.
However, the Podman provider still doesn't work (see the bottom of this PR description).

2. "base: ignore EACCES from mount -o remount,ro /sys"

mount -o remount,ro /sys fails with permission denied on rootless Docker and on rootless Podman, but the error is negligible.

3. "containerd: add /etc/containerd/config-rootless.toml"

/etc/containerd/config-rootless.toml is the config for running kind in rootless Docker/Podman.

  • ociwrapper script is used to remove .linux.resources.devices from config.json, because .linux.resources.devices is meaningless on rootless and yet produces errors. Workaround until we get proper fixes in containerd and runc.
  • restrict_oom_score_adj is set to true to ignore oom_score_adj errors
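The first workaround can be sketched as a small shell helper. This is only a sketch and not the PR's actual ociwrapper script: the function name `strip_device_rules` and its bundle-directory argument are assumptions made for illustration, and the use of jq is inferred from the base image diff in this PR, which adds the jq package. Only the removed key, `.linux.resources.devices`, comes from the description above.

```shell
# Hypothetical helper (name and argument are illustrative, not from the PR):
# drop .linux.resources.devices from the OCI spec in the given bundle directory.
strip_device_rules() {
  local bundle="$1"
  local tmp
  tmp="$(mktemp)" || return 1
  # del() removes only the named key and leaves the rest of the spec untouched.
  if jq 'del(.linux.resources.devices)' "${bundle}/config.json" > "${tmp}"; then
    # Overwrite in place so the file keeps its original owner and mode.
    cat "${tmp}" > "${bundle}/config.json"
    rm -f "${tmp}"
  else
    rm -f "${tmp}"
    return 1
  fi
}
```

A wrapper built on this would presumably call the helper on the bundle directory and then exec the real runtime with the original arguments; that part is not shown here because the PR text does not describe it.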

The entrypoint overrides /etc/containerd/config.toml with config-rootless.toml when running in rootless Docker/Podman.
The rootless-ness is detected by comparing /proc/1/uid_map with 0 0 4294967295.
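The detection described above can be sketched in shell as follows. This is a minimal sketch, not the entrypoint's actual code: the function name `is_rootless` and the optional file argument (added so the check can be exercised against a sample file) are assumptions. The kernel pads the columns of `uid_map` with spaces, so the sketch splits the first line into fields before comparing.

```shell
# Hypothetical helper: succeeds (exit 0) when the uid_map is NOT the full
# identity mapping "0 0 4294967295", i.e. when we are in a user namespace.
is_rootless() {
  local uid_map_file="${1:-/proc/1/uid_map}"
  local inside outside count
  # read splits on whitespace, which discards the kernel's column padding.
  # Only the first line matters: the identity mapping is a single line.
  read -r inside outside count < "${uid_map_file}" || true
  [ "${inside} ${outside} ${count}" != "0 0 4294967295" ]
}
```

In an entrypoint this could be used as `if is_rootless; then ...; fi`, with the result saved once and reused by later steps.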

How to test

Images

Base

$ docker build -t kind-base ./images/base

Available on Docker Hub as akihirosuda/tmp-kind-base:g554d2e07.
Built from https://github.com/AkihiroSuda/kind/commits/554d2e076b1ea0fb55fcdff5cf8d972933bb78df .

Node

Needs PR kubernetes/kubernetes#93012 and PR kubernetes/kubernetes#92863.
The "kubelet: new feature gate: Rootless" commit is not necessary for kind, because Rootless Docker itself sets up the cgroup fs.

$ kind build node-image --base-image kind-base

Available on Docker Hub as akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52
Built from https://github.com/AkihiroSuda/kind/commits/554d2e076b1ea0fb55fcdff5cf8d972933bb78df + https://github.com/AkihiroSuda/kubernetes/commits/3c1dda52bb3a931acb4810e34fbfa1afee949ec5

NOTE

Rootful Docker is still required for kind build node-image.

Rootless Docker

  • Boot Ubuntu 20.04 host with systemd.unified_cgroup_hierarchy=1
  • Install Moby from its master branch (Docker 20.0X). Binaries are available at https://github.com/AkihiroSuda/moby-snapshot
  • Run dockerd-rootless.sh
  • export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock, and make sure docker info shows "rootless" as a security option.
  • Run kind create cluster --image akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52
  • Run ps auxw on the host, and make sure the kind processes are running as unprivileged users
  • Make sure kubectl get pods -A shows all pods as Running

$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock

$ docker info
...
 Cgroup Driver: systemd
 Cgroup Version: 2
...
 Security Options:
  seccomp
   Profile: default
  rootless
  cgroupns
...

$ kind create cluster --image akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52
Creating cluster "kind" ...
 ✓ Ensuring node image (akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:     
                                                  
kubectl cluster-info --context kind-kind
                                                  
Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/

$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE
kube-system          coredns-f9fd979d6-r2ts9                      1/1     Running   0          5m23s
kube-system          coredns-f9fd979d6-t8nnc                      1/1     Running   0          5m23s
kube-system          etcd-kind-control-plane                      1/1     Running   0          5m28s
kube-system          kindnet-pxnxs                                1/1     Running   0          5m23s
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0          5m28s
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   0          5m28s
kube-system          kube-proxy-v9txg                             1/1     Running   0          5m23s
kube-system          kube-scheduler-kind-control-plane            1/1     Running   0          5m28s
local-path-storage   local-path-provisioner-7994557747-ggnt4      1/1     Running   0          5m23s
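The `docker info` check from the steps above can also be scripted. A minimal sketch, assuming the human-readable output format shown in the transcript (one security option per line); the function name `docker_info_is_rootless` is hypothetical, and the function reads the `docker info` text on stdin so it can be tried without a running daemon.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds when
# a line consisting only of "rootless" is present (as under Security Options).
docker_info_is_rootless() {
  grep -q '^[[:space:]]*rootless[[:space:]]*$'
}
```

Typical use would be `docker info 2>/dev/null | docker_info_is_rootless || echo "not rootless" >&2` before running kind create cluster.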

Rootless Podman (doesn't work yet)

$ KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster --image akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "kind" ...
support for rootless mode is experimental, some features may not work
 ✓ Ensuring node image (akihirosuda/tmp-kind-node:g554d2e07-g3c1dda52) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✗ Starting control-plane 🕹️ 
ERROR: failed to create cluster: failed to init node with kubeadm: command "podman exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0713 12:29:24.380961      49 initconfiguration.go:200] loading configuration from "/kind/kubeadm.conf"
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration
I0713 12:29:24.391266      49 interface.go:400] Looking for default routes with IPv4 addresses
I0713 12:29:24.391279      49 interface.go:405] Default route transits interface "tap0"
I0713 12:29:24.391411      49 interface.go:208] Interface tap0 is up
I0713 12:29:24.391987      49 interface.go:256] Interface "tap0" has 2 addresses :[10.0.2.100/24 fe80::d07b:4cff:fe69:ba48/64].
I0713 12:29:24.392058      49 interface.go:223] Checking addr  10.0.2.100/24.
I0713 12:29:24.392063      49 interface.go:230] IP found 10.0.2.100
I0713 12:29:24.392068      49 interface.go:262] Found valid IPv4 address 10.0.2.100 for interface "tap0".
I0713 12:29:24.392071      49 interface.go:411] Found active IP 10.0.2.100 
hostport :6443: host '' must be a valid IP address or a valid RFC-1123 DNS subdomain
k8s.io/kubernetes/cmd/kubeadm/app/util.ParseHostPort
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/endpoint.go:113
k8s.io/kubernetes/cmd/kubeadm/app/util/config.SetClusterDynamicDefaults
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/config/initconfiguration.go:157
k8s.io/kubernetes/cmd/kubeadm/app/util/config.SetInitDynamicDefaults
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/config/initconfiguration.go:56
k8s.io/kubernetes/cmd/kubeadm/app/util/config.documentMapToInitConfiguration
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/config/initconfiguration.go:305
k8s.io/kubernetes/cmd/kubeadm/app/util/config.BytesToInitConfiguration
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/config/initconfiguration.go:235
k8s.io/kubernetes/cmd/kubeadm/app/util/config.LoadInitConfigurationFromFile
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/config/initconfiguration.go:207
k8s.io/kubernetes/cmd/kubeadm/app/util/config.LoadOrDefaultInitConfiguration
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/config/initconfiguration.go:219
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newInitData
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:333
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func3
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:193
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).InitData
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:183
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:141
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:842
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:950
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:887
k8s.io/kubernetes/cmd/kubeadm/app.Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373
Error: exec session exited with non-zero exit code 1: OCI runtime error

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
`mount -o remount,ro /sys` fails with `permission denied` on rootless
Docker and on rootless Podman, but the error is negligible.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
`/etc/containerd/config-rootless.toml` is the config for running kind in rootless Docker/Podman.

* `ociwrapper` script is used to remove `.linux.resources.devices` from `config.json`,
  because `.linux.resources.devices` is meaningless on rootless and yet produces errors.
  Workaround until we get proper fixes in containerd and runc.

* restrict_oom_score_adj is set to true to ignore oom_score_adj errors

The entrypoint overrides `/etc/containerd/config.toml` with `config-rootless.toml` when running in rootless Docker/Podman.
The rootless-ness is detected by comparing `/proc/1/uid_map` with `0 0 4294967295`.

Note that Kubernetes needs to be patched as well (see the PR description text)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) Jul 13, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AkihiroSuda
To complete the pull request process, please assign bentheelder
You can assign the PR to them by writing /assign @bentheelder in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @AkihiroSuda!

It looks like this is your first PR to kubernetes-sigs/kind 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kind has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test.) Jul 13, 2020
@k8s-ci-robot
Contributor

Hi @AkihiroSuda. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the "size/M" label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Jul 13, 2020
@AkihiroSuda
Member Author

cc @giuseppe

@giuseppe
Member

great achievement.

I'll look at the issue with Podman

@aojea
Contributor

aojea commented Jul 13, 2020

/ok-to-test
this is pretty cool, thanks

@k8s-ci-robot added the "ok-to-test" label (Indicates a non-member PR verified by an org member that is safe to test.) and removed the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test.) Jul 13, 2020
if os.Geteuid() != 0 {
p.logger.Errorf("podman provider does not work properly in rootless mode")
Contributor

I don't think we should stop failing until this works. Actually, this was the previous state, but it was confusing for users.

@k8s-ci-robot
Contributor

@AkihiroSuda: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kind-e2e-kubernetes-1-18 | 554d2e0 | link | /test pull-kind-e2e-kubernetes-1-18 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@aojea
Contributor

aojea commented Jul 13, 2020

> great achievement.
>
> I'll look at the issue with Podman

@giuseppe I think that podman is failing because it is using the slirp4netns network

I0713 12:29:24.392068      49 interface.go:262] Found valid IPv4 address 10.0.2.100 for interface "tap0".
I0713 12:29:24.392071      49 interface.go:411] Found active IP 10.0.2.100 
hostport :6443: host '' must be a valid IP address or a valid RFC-1123 DNS subdomain

@@ -70,6 +70,7 @@ RUN echo "Ensuring scripts are executable ..." \
       libseccomp2 pigz \
       bash ca-certificates curl rsync \
       nfs-common \
+      jq \
Member

how big is this?

Member Author

about 1MB including deps
https://packages.ubuntu.com/focal/jq

Member

thanks, paying 1MB for rootless seems worthwhile :-)

@@ -38,7 +38,11 @@ fix_mount() {
# https://systemd.io/CONTAINER_INTERFACE/
# however, we need other things from `docker run --privileged` ...
# and this flag also happens to make /sys rw, amongst other things
#
# EACCES on rootless is negligible.
set +o errexit
Member

you already detect if we're in rootless or not below, instead detect that early on and save it, and switch on it here?
toggling errexit in scripts leads to bugs, it has unintuitive behavior.
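One way to tolerate the failure without toggling errexit is to run the command as an `if` condition, where errexit does not apply, and log when it fails. This is a minimal sketch of that catch-and-log idea rather than the reviewer's exact proposal (which is to branch on a saved rootless flag); the helper name `run_or_warn` is hypothetical.

```shell
# Hypothetical helper: run a command, log a warning if it fails, and always
# return success so that `set -o errexit` in the caller is never triggered.
run_or_warn() {
  # A command used as an `if` condition is exempt from errexit.
  if ! "$@"; then
    echo "WARN: command failed (ignored): $*" >&2
  fi
}
```

In fix_mount this would read `run_or_warn mount -o remount,ro /sys`. To keep failures fatal on rootful hosts, the call could additionally be gated on a rootless flag detected earlier in the script.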

@@ -196,6 +197,18 @@ func (c *buildContext) buildImage(dir string) error {
if err := createFile(cmder, containerdConfigPath, containerdConfig); err != nil {
return err
}
containerdRootlessConfig, err := getContainerdConfig(containerdConfigTemplateData{
BenTheElder (Member) commented Jul 13, 2020

we should be able to do this without building a special node-image.
the entrypoint can rewrite this at runtime instead.

Member Author

How can we edit TOML in the entrypoint? Is sed robust enough?

Member

it should be, since the entrypoint is tied to the config, and at this point user patches have not yet been applied, so we know what the config looks like.

we can sed on default_runtime_name =.* right?
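The sed approach discussed here could look roughly like the sketch below. It assumes the config contains a single `default_runtime_name = ...` line that user patches have not yet changed, as stated above. The function name `set_default_runtime` and the runtime name used in the usage note are hypothetical; the PR text does not name the rootless runtime entry.

```shell
# Hypothetical helper: rewrite the value of default_runtime_name in a
# containerd config file, keeping the line's original indentation.
set_default_runtime() {
  local config="$1"
  local runtime="$2"
  local tmp
  tmp="$(mktemp)" || return 1
  # Assumes the runtime name contains no characters special to sed ('/', '&', '\').
  if sed -e "s/^\([[:space:]]*default_runtime_name[[:space:]]*=\).*/\1 \"${runtime}\"/" \
      "${config}" > "${tmp}"; then
    # Overwrite in place so the file keeps its original owner and mode.
    cat "${tmp}" > "${config}"
    rm -f "${tmp}"
  else
    rm -f "${tmp}"
    return 1
  fi
}
```

For example, `set_default_runtime /etc/containerd/config.toml rootless-runc`, where `rootless-runc` is a placeholder name. A temporary file is used instead of `sed -i` because the in-place flag behaves differently across sed implementations.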

@giuseppe
Member

> @giuseppe I think that podman is failing because it is using the slirp4netns network

yes, and I am not sure yet how to address it. Containers must be able to contact each other but at the same time be in different network namespaces

@BenTheElder added this to the v0.9.0 milestone Jul 14, 2020
 if os.Geteuid() != 0 {
-	p.logger.Errorf("podman provider does not work properly in rootless mode")
-	os.Exit(1)
+	p.logger.Warn("support for rootless mode is experimental, some features may not work")
Member

the PR body suggests that it doesn't work, if that's the case then this new message seems misleading.

@BenTheElder
Member

> ociwrapper script is used to remove .linux.resources.devices from config.json, because .linux.resources.devices is meaningless on rootless and yet produces errors. Workaround until we get proper fixes in containerd and runc.

is there somewhere we can track this?

@BenTheElder
Member

checked up on the dependent PRs:

I've updated us again to the latest containerd changes, and I am WIP on redoing the containerd config in the image, kind on ZFS needs a similar automatic "if in this mode modify the containerd config" change. #1719

@AkihiroSuda
Member Author

> is there somewhere we can track this?

opencontainers/runc#2522 is the most relevant one, but maybe we need more

> I am WIP on redoing the containerd config in the image, kind on ZFS needs a similar automatic "if in this mode modify the containerd config" change.

👍

Could you open a PR?

@BenTheElder
Member

sorry I got behind on all of this. working to catch up but the ZFS PR will be a little lower on the stack,

we actually shouldn't need most of that anyhow now that opencontainers/runc#2522 is in?

@BenTheElder modified the milestones: v0.9.0, v0.10.0 Aug 20, 2020
@BenTheElder
Member

Still blocked on upstream.

3) should not be necessary now AIUI, which reduces the complexity a lot, except for restrict_oom_score_adj, though we can perhaps do that in a simpler way, similar to #1818 ("rework containerd config").

I will work on a PR related to 2) with a slightly better approach: instead of disabling errexit we should catch and log this without failing, and we should disable the systemd mount / udev without depending on this. xref: #1474

1) seems a little dubious, per review comments.

@k8s-ci-robot
Contributor

@AkihiroSuda: PR needs rebase.

@AkihiroSuda
Member Author

Opened a new PR: #1935

The new version works with vanilla Kubernetes (1.20.0-beta.2).
However, the new version has dirty hacks that fake sysctl keys to avoid patching Kubernetes (kubernetes/kubernetes#92863).

Labels
  • area/provider/docker (Issues or PRs related to docker)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)

5 participants