coredns pods sometimes fail to start due to trying to bind privileged ports as non-root user #11366

Open
spantaleev opened this issue Jul 8, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.


@spantaleev
Contributor

What happened?

On some of my nodes, coredns Pods (currently using the v1.11.1 container image) fail to start with the error:

Listen: listen tcp :53: bind: permission denied

On others, it runs fine.

As far as I could tell, all my nodes are identical (same OS, same kernel version, same containerd version, same sysctl parameter for net.ipv4.ip_unprivileged_port_start = 1024).
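To double-check that the nodes really match, something like the following could be run on a working node and a failing node and the output compared. This is only a minimal sketch: the helper names `port_needs_privilege` and `node_report` are hypothetical, and it reads the host's value of the sysctl. That sysctl is kept per network namespace, so the value a container sees may differ from the host's.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when binding PORT would require
# CAP_NET_BIND_SERVICE, given the unprivileged-port threshold START.
port_needs_privilege() {
    port=$1
    start=$2
    [ "$port" -lt "$start" ]
}

# Print the facts worth diffing between a working and a failing node.
node_report() {
    echo "kernel: $(uname -r)"
    # Host-side threshold only; a container's network namespace has its own copy.
    if [ -r /proc/sys/net/ipv4/ip_unprivileged_port_start ]; then
        start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start)
        echo "ip_unprivileged_port_start: $start"
        if port_needs_privilege 53 "$start"; then
            echo "port 53: privileged"
        else
            echo "port 53: unprivileged"
        fi
    fi
    # containerd may not be on PATH everywhere; skip quietly if it is absent.
    if command -v containerd >/dev/null 2>&1; then
        echo "containerd: $(containerd --version)"
    fi
}

node_report
```

With a threshold of 1024, port 53 counts as privileged; with a threshold of 0, no port does.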

I am not sure why binding to privileged ports as a non-root user works on some nodes and not on others.

What did you expect to happen?

I would expect that coredns would reliably run on all my cluster's nodes.

How can we reproduce it (as minimally and precisely as possible)?

Since my Kubespray config yields both working and non-working nodes, I tried to reproduce the issue another way.

I used the following Corefile (inspired by the coredns config map, but with the kubernetes plugin disabled):

.:53 {
    errors {
    }
    health {
        lameduck 5s
    }
    ready

    # Disable Kubernetes plugin, as we'll run in a non-Kubernetes context for testing purposes.
    #kubernetes cluster.local in-addr.arpa ip6.arpa {
    #  pods insecure
    #  fallthrough in-addr.arpa ip6.arpa
    #}

    prometheus :9153
    forward . 8.8.8.8 8.8.4.4 {
        prefer_udp
        max_concurrent 1000
    }
    cache 30

    loop
    reload
    loadbalance
}

and I try to run it with:

nerdctl run \
-it \
--rm \
--network=none \
--mount type=bind,src=$(pwd)/Corefile,dst=/etc/coredns/Corefile,ro \
--cap-add=NET_BIND_SERVICE \
registry.k8s.io/coredns/coredns:v1.11.1 \
-conf /etc/coredns/Corefile

On some nodes it works, on others I get the aforementioned error.

It appears that --cap-add=NET_BIND_SERVICE has no effect.
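One way to check whether the capability actually reaches the process is to look at the `Cap*` lines in `/proc/<pid>/status`. The sketch below decodes those hex masks. CAP_NET_BIND_SERVICE is capability number 10, i.e. bit `0x400`. The helper names `has_net_bind_service` and `report_caps` are hypothetical. The assumption here is that, for a non-root process, the capability needs to appear in the effective set (`CapEff`) for the bind to succeed; appearing only in the bounding set (`CapBnd`) is not enough.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the hex capability mask (as printed in
# /proc/<pid>/status) includes CAP_NET_BIND_SERVICE (capability 10, bit 0x400).
has_net_bind_service() {
    [ $(( 0x$1 & 0x400 )) -ne 0 ]
}

# Report each capability set found in a status file.
# Defaults to the current process; pass /proc/<pid>/status to inspect another one.
report_caps() {
    status_file=${1:-/proc/self/status}
    for name in CapPrm CapEff CapBnd CapAmb; do
        mask=$(awk -v key="$name:" '$1 == key { print $2 }' "$status_file")
        # Older kernels do not print every set; skip the missing ones.
        [ -n "$mask" ] || continue
        if has_net_bind_service "$mask"; then
            echo "$name=$mask includes NET_BIND_SERVICE"
        else
            echo "$name=$mask lacks NET_BIND_SERVICE"
        fi
    done
}

report_caps
```

To inspect the coredns process itself, `report_caps /proc/<pid>/status` could be run on the node with the host-side PID of the container's process.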

Workarounds:

  • adding --sysctl net.ipv4.ip_unprivileged_port_start=0 to the nerdctl run command
    • I cannot apply a similar workaround to the Deployment, because Kubespray does not let me override the coredns Deployment to add this under securityContext.sysctls
  • adding --user=0:0 to the nerdctl run command
    • I cannot apply a similar workaround to the Deployment, because Kubespray does not let me override the coredns Deployment to add this under securityContext
  • adjusting the Corefile configuration to use a port higher than 1023
  • using an older version of coredns (older than v1.11.0), like v1.10.1
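For reference, the first workaround applied to the reproduction command would look roughly like this. It is a sketch only: the `run_coredns` wrapper and the `DRY_RUN` switch are hypothetical additions so that the command can be printed and inspected before being executed, and every flag is taken from the commands quoted in this report.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction command with the sysctl
# workaround added. Prints the command by default; set DRY_RUN=0 to run it.
run_coredns() {
    set -- nerdctl run -it --rm --network=none \
        --mount "type=bind,src=$(pwd)/Corefile,dst=/etc/coredns/Corefile,ro" \
        --cap-add=NET_BIND_SERVICE \
        --sysctl net.ipv4.ip_unprivileged_port_start=0 \
        registry.k8s.io/coredns/coredns:v1.11.1 \
        -conf /etc/coredns/Corefile
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_coredns
```

Swapping the `--sysctl` line for `--user=0:0` gives the second workaround.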

As this comment states, coredns has run as a non-root user since v1.11.0.

It appears that Kubespray sets up the coredns Deployment to run as the default user and does not explicitly adjust sysctl for net.ipv4.ip_unprivileged_port_start. It also doesn't provide much control of the securityContext, so applying any of these workarounds is difficult.

It would probably be good if one of these workarounds were applied by default.

OS

Linux 5.15.0-113-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

Irrelevant

Version of Python

Irrelevant

Version of Kubespray (commit)

v2.25.0

Network plugin used

cilium

Full inventory with variables

My configuration is not customized much - using the containerd runtime, etc.

Command used to invoke ansible

Irrelevant

Output of ansible run

Ansible run is all good

Anything else we need to know

No response

@spantaleev
Contributor Author

For now, I work around this issue by pinning coredns to an older version (older than v1.11.0, which introduced running as non-root in coredns/coredns#5969).

These older coredns versions still run as root by default, so binding to privileged ports works reliably on all my nodes.

coredns_version: v1.10.1
