Run aws-node as unprivileged pod #796

Closed · mogren opened this issue Jan 17, 2020 · 20 comments
Labels: 2.x CNI plugin (Features and issues to address in 2.x CNI plugin), enhancement

mogren (Contributor) commented Jan 17, 2020

We currently set the aws-node pod to be privileged, but that might not be needed.

Check if CAP_NET_ADMIN and CAP_DAC_OVERRIDE are enough to set the RPF check to loose and to copy the binary and config file:

// If node port support is enabled, configure the kernel's reverse path filter check on eth0 for "loose"
// filtering. This is required because
// - NodePorts are exposed on eth0
// - The kernel's RPF check happens after incoming packets to NodePorts are DNATted to the pod IP.
// - For pods assigned to secondary ENIs, the routing table includes source-based routing. When the kernel does
// the RPF check, it looks up the route using the pod IP as the source.
// - Thus, it finds the source-based route that leaves via the secondary ENI.
// - In "strict" mode, the RPF check fails because the return path uses a different interface to the incoming
// packet. In "loose" mode, the check passes because some route was found.
primaryIntfRPFilter := "/proc/sys/net/ipv4/conf/" + primaryIntf + "/rp_filter"
const rpFilterLoose = "2"
log.Debugf("Setting RPF for primary interface: %s", primaryIntfRPFilter)
err = n.setProcSys(primaryIntfRPFilter, rpFilterLoose)
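
For reference, setProcSys isn't shown in the snippet above. A minimal sketch of what such a helper might look like (an assumption for illustration, not the actual implementation in this repo):

package networkutils

import "os"

// setProcSys writes a value to a /proc/sys entry. Sketch only: the write
// fails with "read-only file system" when /proc/sys is not mounted
// writable, which is exactly the failure mode hit below.
func setProcSys(path, value string) error {
	return os.WriteFile(path, []byte(value), 0644)
}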

mogren (Contributor, Author) commented Jan 20, 2020

Unfortunately, having the capabilities limited to ["NET_ADMIN", "DAC_OVERRIDE"] is not enough to write to /proc/sys/net/*. From the logs:

[DEBUG] Setting RPF for primary interface: /proc/sys/net/ipv4/conf/eth0/rp_filter
[ERROR] Failed to set up host networkfailed to configure eth0 RPF check: open /proc/sys/net/ipv4/conf/eth0/rp_filter: read-only file system
[ERROR] Initialization failure: ipamd init: failed to set up host network: failed to configure eth0 RPF check: open /proc/sys/net/ipv4/conf/eth0/rp_filter: read-only file system

mogren closed this as completed Jan 20, 2020
SaranBalaji90 changed the title from "Try to limit permissions on the ipamd pod" to "Move rp_filter setting outside aws-node and run aws-node as unprivileged pod" Mar 23, 2020
SaranBalaji90 changed the title from "Move rp_filter setting outside aws-node and run aws-node as unprivileged pod" to "Run aws-node as unprivileged pod" Mar 23, 2020
SaranBalaji90 (Contributor) commented Mar 23, 2020

It looks like /proc gets mounted with write access only for privileged pods.

There are a few ways to remove the privileged permission from aws-node:

  1. Set rp_filter through an init container:
    initContainers:
      - command:
        - sh
        - -c
        - sysctl net.ipv4.conf.eth0.rp_filter=2
        image: golang:1.13-stretch
        name: rp_filter_setting
        securityContext:
          privileged: true
  2. Set rp_filter when the EC2 instance starts up. The problem with this approach is that anyone who builds a custom AMI has to make sure the change is added to their AMI build scripts.
  3. Enable unsafe sysctls - https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#enabling-unsafe-sysctls - but this doesn't help us, because the kubelet rejects the pod allocation when a net.* sysctl is used together with host networking: https://github.com/kubernetes/kubernetes/blob/7f23a743e8c23ac6489340bbb34fa6f1d392db9d/pkg/kubelet/sysctl/whitelist.go#L89

This leaves us with option 1 as the most suitable for our use case.

Proposed solution:

  1. Add a flag that tells the aws-node pod whether to perform or skip setting the rp_filter value for eth0 (see the sketch after this list).
  2. Add an init container as described above and remove "privileged" mode for aws-node.
  3. At a later time, deprecate the flag and remove it completely from aws-node.
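
A minimal sketch of how the flag in step 1 could gate the existing rp_filter code (the DISABLE_RP_FILTER_UPDATE name is hypothetical, not an existing aws-node setting):

package networkutils

import "os"

// skipRPFilterUpdate reports whether the (hypothetical) environment
// variable DISABLE_RP_FILTER_UPDATE asks aws-node to leave rp_filter
// alone, e.g. because an init container has already configured it.
func skipRPFilterUpdate() bool {
	return os.Getenv("DISABLE_RP_FILTER_UPDATE") == "true"
}

The existing condition would then become if n.nodePortSupportEnabled && !skipRPFilterUpdate() { ... } around the RPF setup.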

SaranBalaji90 reopened this Mar 23, 2020
M00nF1sh (Contributor) commented Mar 24, 2020

I feel like we should use src_valid_mark instead: torvalds/linux@28f6aee

BTW, it's indeed possible to avoid any sysctl setting, e.g. by mangling traffic from eth0 with a custom TOS value and adding a routing policy rule for that TOS value that uses the main table, but it's a bit tricky.

https://github.com/torvalds/linux/blob/v4.14/net/ipv4/route.c#L1879
https://github.com/torvalds/linux/blob/v4.14/net/ipv4/route.c#L1660
https://github.com/torvalds/linux/blob/v4.14/net/ipv4/route.c#L1679
https://github.com/torvalds/linux/blob/v4.14/net/ipv4/fib_frontend.c#L315

mogren (Contributor, Author) commented Mar 25, 2020

PR #130, where this code was added, has some good comments.

SaranBalaji90 (Contributor) commented Mar 25, 2020

Nice deep dive @M00nF1sh. I tested both of your suggestions and both seem to work fine:

# with TOS
sudo iptables -A PREROUTING -t mangle -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j TOS --set-tos 0x08
sudo ip rule add pref 1025 tos 0x08 table main

# with MARK
sudo iptables -A PREROUTING -t mangle -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j MARK --set-xmark 0x80/0x80
sudo sysctl net.ipv4.conf.eth0.src_valid_mark=1

Removed the 'if' block from

if n.nodePortSupportEnabled {

and was able to get aws-node running with the following security context:

securityContext:
  capabilities:
    add:
    - NET_ADMIN

anguslees (Contributor) commented Mar 25, 2020

We should continue the above investigation, but for context I note that we read-write mount the host's /var/log (can rewrite logs) and /var/run/docker{,shim}.sock (host-root equivalent) into the aws-node container. We will also need to drop or reduce these hostPath volumes somehow before privileged=false has any meaning.

SaranBalaji90 (Contributor) commented Apr 1, 2020

For now, I'm thinking of checking whether we have write access to the net.ipv4.conf.eth0.rp_filter file, and updating rp_filter only if we do. With this we don't have to introduce another env variable for users to decide whether to perform this operation. This would simplify the user experience with respect to updates (it helps both variants of update that users perform: just editing the aws-node DaemonSet version number, as well as applying the manifest completely).
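
A rough sketch of that check, assuming unix.Access from golang.org/x/sys is used to probe writability (the function and constant names here are illustrative):

package networkutils

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

const primaryIntfRPFilter = "/proc/sys/net/ipv4/conf/eth0/rp_filter"

// updateRPFilterIfWritable sets rp_filter to "loose" (2) only when the
// proc entry is writable, so an unprivileged aws-node with a read-only
// /proc/sys skips the update instead of failing initialization.
func updateRPFilterIfWritable() error {
	if err := unix.Access(primaryIntfRPFilter, unix.W_OK); err != nil {
		log.Printf("%s is not writable, skipping rp_filter update: %v", primaryIntfRPFilter, err)
		return nil
	}
	return os.WriteFile(primaryIntfRPFilter, []byte("2"), 0644)
}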

mogren (Contributor, Author) commented Jun 10, 2020

Resolved by adding the init container in #955

mogren closed this as completed Jun 10, 2020
anguslees (Contributor) commented Aug 18, 2020

This was closed prematurely, so I'm reopening it so we don't lose the remaining action item raised earlier.

#955 moved the literal privileged=true Kubernetes option to an earlier init container, but we still expose the CRI socket to the aws-node persistent container. This allows the aws-node container to trivially (e.g.) start a new privileged container, so the aws-node pod remains "privileged" in every practical sense.

Remaining action item:

  • Remove CRI socket from aws-node container (or equivalent docker/containerd socket)

(For tracking: Write access to all of /var/log was removed in #987, and the docker socket was removed in #1075)

TBBle commented Dec 5, 2020

Per aws/containers-roadmap#1048 (comment), the aws-node Pod needs NET_RAW as well as NET_ADMIN. It's currently undeclared in the DaemonSet, but because it's one of the default capabilities added by the Docker runtime, its absence is not noticed until a PodSecurityPolicy tries to take it away.

The symptom observed is:

daemonset pods get scheduled but fail. On top of that the aws-k8s-agent fails out silently.

jayanthvn self-assigned this Dec 16, 2020
jayanthvn (Contributor) commented:

#1352 will remove CRI socket read from aws-node. This will be merged to v1.8 release.

jayanthvn added this to the v1.8.0 milestone Jan 27, 2021
jayanthvn removed this from the v1.8.0 milestone Apr 21, 2021
jayanthvn assigned haouc and unassigned jayanthvn and couralex6 Jun 10, 2021
csuzhangxc commented Nov 9, 2021

#1352 will remove CRI socket read from aws-node. This will be merged to v1.8 release.

This PR is still not merged, even though v1.9.3 has been released.

jayanthvn added this to the v1.11.0 milestone Feb 25, 2022
jayanthvn assigned M00nF1sh and unassigned haouc Feb 25, 2022
jayanthvn removed this from the v1.11.0 milestone Apr 18, 2022
github-actions (bot) commented:

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the "stale" label Jun 18, 2022
TBBle commented Jun 19, 2022

#1352 was closed unmerged

Closing this PR since this will be handled as part of v1.11.0 release.

but I don't see any change removing the CRI socket read (i.e. bumping checkpointMigrationPhase to 2) in that release. I do see #1958, which looks like it was preparing to support not reading from the CRI socket, but it did not itself make that change.

Since 1.11.0 added to the data being stored in the datastore file, I guess this can't be advanced until at least 1.12.0: users will need to upgrade to a 1.11.x version first, so the right data is in the file before upgrading to a phase-2 release, in order to preserve the no-reboot upgrade path.

github-actions bot removed the "stale" label Jun 20, 2022
github-actions (bot) commented:

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the "stale" label Sep 21, 2022
TBBle commented Sep 22, 2022

#1352 lives again.

github-actions bot removed the "stale" label Sep 23, 2022
jayanthvn (Contributor) commented:

Moved to checkpoint migration phase 2 and switched to using a state file instead of the CRI socket for IP allocation pool restore: #2110. This is part of the v1.12.0 release - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.12.0.

github-actions (bot) commented:

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the "stale" label Jan 11, 2023
M00nF1sh removed the "stale" label Jan 12, 2023
jdn5126 (Contributor) commented Jan 12, 2023

Closing this, as the init container must run with privileged access, while the main container does not.

jdn5126 closed this as completed Jan 12, 2023
github-actions (bot) commented:

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
