README.md Updated. #2021

Merged 1 commit on Jul 5, 2022
68 changes: 34 additions & 34 deletions README.md
@@ -11,7 +11,7 @@ Networking plugin for pod networking in [Kubernetes](https://kubernetes.io/) usi

## Setup

Download the latest version of the [yaml](./config/) and apply it the cluster.
Download the latest version of the [yaml](./config/) and apply it to the cluster.

```
kubectl apply -f aws-k8s-cni.yaml
@@ -61,7 +61,7 @@ When a worker node first joins the cluster, there is only 1 ENI along with all o
configuration, ipamd always tries to keep one extra ENI.

When the number of pods running on the node exceeds the number of addresses on a single ENI, the CNI backend starts allocating
a new ENI using following allocation scheme:
a new ENI using the following allocation scheme:

* If the number of current running Pods is between 0 and 29, ipamd will allocate one more eni. And Warm-Pool size is 2 eni * (30 -1) = 58
* If the number of current running Pods is between 30 and 58, ipamd will allocate 2 more eni. And Warm-Pool size is 3 eni * (30 -1) = 87
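
The arithmetic in the bullets above can be sketched as follows. This is a minimal illustration, not ipamd's actual code: the function name `warm_pool_size` is hypothetical, the 30 addresses per ENI come from the `(30 -1)` in the example, and it assumes ipamd keeps exactly one extra ENI warm.

```python
import math


def warm_pool_size(running_pods: int, ips_per_eni: int = 30) -> int:
    """Total pod IPs held by the node under the scheme described above.

    Hypothetical helper: assumes one address per ENI is reserved as the
    ENI's primary IP and that one extra ENI is always kept warm.
    """
    usable_per_eni = ips_per_eni - 1
    # At least one ENI is always attached, even with zero pods.
    enis_in_use = max(1, math.ceil(running_pods / usable_per_eni))
    total_enis = enis_in_use + 1  # the extra warm ENI
    return total_enis * usable_per_eni


if __name__ == "__main__":
    for pods in (0, 29, 30, 58):
        print(pods, warm_pool_size(pods))
```

Under these assumptions the function returns 58 for 0 to 29 pods and 87 for 30 to 58 pods, matching the two bullets.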
@@ -85,7 +85,7 @@ Type: Boolean as a String
Default: `true`

Specifies whether `NodePort` services are enabled on a worker node's primary network interface\. This requires additional
`iptables` rules, and that the kernel's reverse path filter on the primary interface is set to `loose`.
`iptables` rules, and the kernel's reverse path filter on the primary interface is set to `loose`.

---

@@ -182,7 +182,7 @@ Type: String

Default: empty

Specify a comma separated list of IPv4 CIDRs to exclude from SNAT. For every item in the list an `iptables` rule and off\-VPC
Specify a comma-separated list of IPv4 CIDRs to exclude from SNAT. For every item in the list an `iptables` rule and off\-VPC
IP rule will be applied. If an item is not a valid ipv4 range it will be skipped. This should be used when `AWS_VPC_K8S_CNI_EXTERNALSNAT=false`.
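
A rough sketch of the "skip invalid items" behaviour described above, using Python's standard `ipaddress` module. The function name is hypothetical, and accepting CIDRs with host bits set (`strict=False`) is an assumption; the plugin's own parsing rules may differ.

```python
import ipaddress
from typing import List


def parse_exclude_snat_cidrs(value: str) -> List[str]:
    """Return the valid IPv4 CIDRs from a comma-separated list.

    Hypothetical helper: items that are not valid IPv4 ranges are
    skipped rather than treated as an error.
    """
    valid = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        try:
            # strict=False is an assumption: host bits are tolerated.
            network = ipaddress.IPv4Network(item, strict=False)
        except ValueError:
            continue  # not a valid IPv4 range, skip it
        valid.append(str(network))
    return valid


if __name__ == "__main__":
    print(parse_exclude_snat_cidrs("10.0.0.0/8, not-a-cidr,192.168.1.0/24,fd00::/8"))
```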

---
@@ -212,7 +212,7 @@ Type: Integer

Default: None

Specifies the number of free IP addresses that the `ipamd` daemon should attempt to keep available for pod assignment on the node. Setting this to a non-positive value is same as setting this to 0 or not setting the variable.
Specifies the number of free IP addresses that the `ipamd` daemon should attempt to keep available for pod assignment on the node. Setting this to a non-positive value is the same as setting this to 0 or not setting the variable.
With `ENABLE_PREFIX_DELEGATION` set to `true` then `ipamd` daemon will check if the existing (/28) prefixes are enough to maintain the
`WARM_IP_TARGET` if it is not sufficient then more prefixes will be attached.

@@ -221,7 +221,7 @@ For example,
1. if `WARM_IP_TARGET` is set to 5, then `ipamd` attempts to keep 5 free IP addresses available at all times. If the
elastic network interfaces on the node are unable to provide these free addresses, `ipamd` attempts to allocate more interfaces
until `WARM_IP_TARGET` free IP addresses are available.
2. `ENABLE_PREFIX_DELEGATION` set to `true` and `WARM_IP_TARGET` is 16. Initially 1 (/28) prefix is sufficient but once a single pod is assigned IP then
2. `ENABLE_PREFIX_DELEGATION` set to `true` and `WARM_IP_TARGET` is 16. Initially, 1 (/28) prefix is sufficient but once a single pod is assigned IP then
remaining free IPs are 15 hence IPAMD will allocate 1 more prefix to achieve 16 `WARM_IP_TARGET`
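
The second example can be checked with a small calculation. This is a simplified model, not the ipamd implementation: `prefixes_needed` is a hypothetical name, and the sketch ignores `MINIMUM_IP_TARGET` and per-instance attachment limits.

```python
import math

PREFIX_SIZE = 16  # a /28 IPv4 prefix holds 16 addresses


def prefixes_needed(assigned_ips: int, warm_ip_target: int) -> int:
    """Number of /28 prefixes needed to cover assigned plus warm IPs.

    Hypothetical helper: the node must hold enough prefixes for the IPs
    already assigned to pods plus WARM_IP_TARGET free ones.
    """
    return math.ceil((assigned_ips + warm_ip_target) / PREFIX_SIZE)


if __name__ == "__main__":
    print(prefixes_needed(assigned_ips=0, warm_ip_target=16))
    print(prefixes_needed(assigned_ips=1, warm_ip_target=16))
```

With no pods, one prefix covers the target of 16; once a single IP is assigned, 17 addresses are required, so a second prefix is needed.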

**NOTE!** Avoid this setting for large clusters, or if the cluster has high pod churn. Setting it will cause additional calls to the
@@ -254,7 +254,7 @@ a total of 60, accelerating IP exhaustion in the relevant subnets. If instead `M
`WARM_IP_TARGET` to 2, after the 30 pods are deployed the CNI would allocate an additional 2 IPs. This still provides
elasticity, but uses roughly half as many IPs as using WARM_IP_TARGET alone (32 IPs vs 60 IPs).
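
The 32 versus 60 comparison can be reproduced with a simplified model. The function name is hypothetical, and the `WARM_IP_TARGET` value of 30 used for the "alone" case is an assumption chosen to match the total of 60 quoted above.

```python
def ips_allocated(pods: int, warm_ip_target: int, minimum_ip_target: int = 0) -> int:
    """Approximate number of IPs the CNI holds for a node.

    Hypothetical model: the node keeps at least MINIMUM_IP_TARGET IPs
    and at least WARM_IP_TARGET free IPs beyond those used by pods.
    """
    return max(minimum_ip_target, pods + warm_ip_target)


if __name__ == "__main__":
    # WARM_IP_TARGET alone (assumed value: 30)
    print(ips_allocated(pods=30, warm_ip_target=30))
    # MINIMUM_IP_TARGET=30 combined with WARM_IP_TARGET=2
    print(ips_allocated(pods=30, warm_ip_target=2, minimum_ip_target=30))
```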

This also improves reliability of the EKS cluster by reducing the number of calls necessary to allocate or deallocate
This also improves the reliability of the EKS cluster by reducing the number of calls necessary to allocate or deallocate
private IPs, which may be throttled, especially at scaling-related times.

---
@@ -362,7 +362,7 @@ Type: String

Default: `eni`

Specifies the veth prefix used to generate the host-side veth device name for the CNI. The prefix can be at most 4 characters long. The prefixes `eth`, `vlan` and `lo` are reserved by the CNI plugin and cannot be specified. We recommend using prefix name not shared by any other network interfaces on the worker node instance.
Specifies the veth prefix used to generate the host-side veth device name for the CNI. The prefix can be at most 4 characters long. The prefixes `eth`, `vlan`, and `lo` are reserved by the CNI plugin and cannot be specified. We recommend using prefix name not shared by any other network interfaces on the worker node instance.
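
The constraints on the veth prefix could be checked before deploying with something like the sketch below. The function name is hypothetical, and treating the reserved prefixes as exact matches is an assumption; the plugin may apply a stricter rule.

```python
RESERVED_PREFIXES = {"eth", "vlan", "lo"}  # reserved by the CNI plugin
MAX_PREFIX_LEN = 4


def is_valid_veth_prefix(prefix: str) -> bool:
    """Check a candidate veth prefix against the documented limits.

    Hypothetical helper: non-empty, at most 4 characters, and not one
    of the reserved prefixes (exact match assumed).
    """
    if not 0 < len(prefix) <= MAX_PREFIX_LEN:
        return False
    return prefix not in RESERVED_PREFIXES


if __name__ == "__main__":
    for candidate in ("eni", "eth", "vlan", "lo", "toolong"):
        print(candidate, is_valid_veth_prefix(candidate))
```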

---

@@ -374,12 +374,12 @@ Default: `{}`

Example values: `{"tag_key": "tag_val"}`

Metadata applied to ENI help you categorize and organize your resources for billing or other purposes. Each tag consists of a
Metadata applied to ENI helps you categorize and organize your resources for billing or other purposes. Each tag consists of a
custom-defined key and an optional value. Tag keys can have a maximum character length of 128 characters. Tag values can have
a maximum length of 256 characters. These tags will be added to all ENIs on the host.

Important: Custom tags should not contain `k8s.amazonaws.com` prefix as it is reserved. If the tag has `k8s.amazonaws.com`
string, tag addition will ignored.
string, tag addition will be ignored.
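
A minimal sketch of pre-checking a tag map against the limits above. The function name is hypothetical, and two details are assumptions: the reserved string is looked for in the tag key, and over-length tags are dropped rather than rejected outright.

```python
import json
from typing import Dict

RESERVED_STRING = "k8s.amazonaws.com"
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def usable_eni_tags(raw_json: str) -> Dict[str, str]:
    """Return the tags from a JSON map that satisfy the documented limits.

    Hypothetical helper: tags whose key contains the reserved string or
    whose key or value exceeds the length limit are left out.
    """
    tags = json.loads(raw_json)
    kept = {}
    for key, value in tags.items():
        if RESERVED_STRING in key:
            continue  # reserved, the addition would be ignored
        if len(key) > MAX_KEY_LEN or len(str(value)) > MAX_VALUE_LEN:
            continue  # exceeds the documented length limits
        kept[key] = str(value)
    return kept


if __name__ == "__main__":
    print(usable_eni_tags('{"tag_key": "tag_val", "k8s.amazonaws.com/x": "y"}'))
```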

---

@@ -411,16 +411,16 @@ Default: `false`

To enable security groups for pods you need to have at least an EKS 1.17 eks.3 cluster.

Setting `ENABLE_POD_ENI` to `true` will allow IPAMD to add the `vpc.amazonaws.com/has-trunk-attached` label to the node if the instance has capacity to attach an additional ENI.
Setting `ENABLE_POD_ENI` to `true` will allow IPAMD to add the `vpc.amazonaws.com/has-trunk-attached` label to the node if the instance has the capacity to attach an additional ENI.

The label notifies vpc-resource-controller (https://github.com/aws/amazon-vpc-resource-controller-k8s) to attach a Trunk ENI to the instance. The label value is initially set to `false` and is marked to `true` by IPAMD when vpc-resource-controller attaches a Trunk ENI to the instance. However, there might be cases where the label value will remain `false` if the instance doesn't support ENI Trunking.

Once enabled the VPC resource controller will then advertise branch network interfaces as extended resources on these nodes in your cluster. Branch interface capacity is additive to existing instance type limits for secondary IP addresses and prefixes. For example, a c5.4xlarge can continue to have up to 234 secondary IP addresses or 234 /28 prefixes assigned to standard network interfaces and up to 54 branch network interfaces. Each branch network interface only receives a single primary IP address and this IP address will be allocated to pods with security group(branch ENI pods).
Once enabled the VPC resource controller will then advertise branch network interfaces as extended resources on these nodes in your cluster. Branch interface capacity is additive to existing instance type limits for secondary IP addresses and prefixes. For example, a c5.4xlarge can continue to have up to 234 secondary IP addresses or 234 /28 prefixes assigned to standard network interfaces and up to 54 branch network interfaces. Each branch network interface only receives a single primary IP address and this IP address will be allocated to pods with a security group(branch ENI pods).

Any of the WARM targets do not impact the scale of the branch ENI pods so you will have to set the WARM_{ENI/IP/PREFIX}_TARGET based on the number of non-branch ENI pods. If you are having the cluster mostly using pods with security group consider setting WARM_IP_TARGET to a very low value instead of default WARM_ENI_TARGET or WARM_PREFIX_TARGET to reduce wastage of IPs/ENIs.
Any of the WARM targets do not impact the scale of the branch ENI pods so you will have to set the WARM_{ENI/IP/PREFIX}_TARGET based on the number of non-branch ENI pods. If you are having the cluster mostly using pods with a security group consider setting WARM_IP_TARGET to a very low value instead of default WARM_ENI_TARGET or WARM_PREFIX_TARGET to reduce wastage of IPs/ENIs.


**NOTE!** Toggling `ENABLE_POD_ENI` from `true` to `false` will not detach the Trunk ENI from instance. To delete/detach the Trunk ENI from instance, you need recycle the instance.
**NOTE!** Toggling `ENABLE_POD_ENI` from `true` to `false` will not detach the Trunk ENI from an instance. To delete/detach the Trunk ENI from an instance, you need to recycle the instance.


---
@@ -433,19 +433,19 @@ Default: `strict`

Valid Values: `strict`, `standard`

Once `ENABLE_POD_ENI` is set to `true`, this value controls how the traffic of pods with security group behaves.
Once `ENABLE_POD_ENI` is set to `true`, this value controls how the traffic of pods with the security group behaves.

* `strict` mode: all inbound/outbound traffic from pod with security group will be enforced by security group rules. This is the **default** mode if POD_SECURITY_GROUP_ENFORCING_MODE is not set.

* `standard` mode: the traffic of pod with security group behaves same as pods without security group, except that each pod occupies a dedicated branch ENI.
* `standard` mode: the traffic of pod with security group behaves same as pods without a security group, except that each pod occupies a dedicated branch ENI.
* inbound traffic to pod with security group from another host will be enforced by security group rules.
* outbound traffic from pod with security group to another host in same VPC will be enforced by security group rules.
* inbound/outbound traffic from another pod on same host or another service on same host(such as kubelet/nodeLocalDNS) won't be enforced by security group rules.
* outbound traffic from pod with security group to another host in the same VPC will be enforced by security group rules.
* inbound/outbound traffic from another pod on the same host or another service on the same host(such as kubelet/nodeLocalDNS) won't be enforced by security group rules.
* outbound traffic from pod with security group to IP address outside VPC
* if externalSNAT enabled, traffic won't be SNATed, thus will be enforced by security group rules.
* if externalSNAT disabled, traffic will be SNATed via eth0, thus will only be enforced by security group associated with eth0.
* if externalSNAT disabled, traffic will be SNATed via eth0, thus will only be enforced by the security group associated with eth0.

**NOTE!**: To make new behavior be in effect after switching the mode, existing pods with security group must be recycled. Alternatively you can restart the nodes as well.
**NOTE!**: To make new behavior be in effect after switching the mode, existing pods with security group must be recycled. Alternatively, you can restart the nodes as well.

---

@@ -455,9 +455,9 @@ Type: Boolean as a String

Default: `false`

If `ENABLE_POD_ENI` is set to `true`, in order for the kubelet to connect via TCP (for liveness or readiness probes)
If `ENABLE_POD_ENI` is set to `true`, for the kubelet to connect via TCP (for liveness or readiness probes)
to pods that are using per pod security groups, `DISABLE_TCP_EARLY_DEMUX` should be set to `true` for `amazon-k8s-cni-init`
container under `initcontainers`. This will increase the local TCP connection latency slightly.
the container under `initcontainers`. This will increase the local TCP connection latency slightly.
Details on why this is needed can be found in this [#1212 comment](https://github.com/aws/amazon-vpc-cni-k8s/pull/1212#issuecomment-693540666).
To use this setting, a Linux kernel version of at least 4.6 is needed on the worker node.

@@ -479,7 +479,7 @@ To enable prefix delegation on nitro instances. Setting `ENABLE_PREFIX_DELEGATIO
and /80 for IPv6) instead of a secondary IP in the ENIs subnet. The total number of prefixes and private IP addresses will be less than the
limit on private IPs allowed by your instance. Setting or resetting of `ENABLE_PREFIX_DELEGATION` while pods are running or if ENIs are attached is supported and the new pods allocated will get IPs based on the mode of IPAMD but the max pods of kubelet should be updated which would need either kubelet restart or node recycle.

Setting ENABLE_PREFIX_DELEGATION to true will not increase the density of branch ENI pods. The limit on number of branch network interfaces per instance type will remain the same - https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html#supported-instance-types. Each branch network will be allocated a primary IP and this IP will be allocated for the branch ENI pods.
Setting ENABLE_PREFIX_DELEGATION to true will not increase the density of branch ENI pods. The limit on the number of branch network interfaces per instance type will remain the same - https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html#supported-instance-types. Each branch network will be allocated a primary IP and this IP will be allocated for the branch ENI pods.

Please refer to [VPC CNI Feature Matrix](https://github.com/aws/amazon-vpc-cni-k8s#vpc-cni-feature-matrix) section below for additional information around using Prefix delegation with Custom Networking and Security Groups Per Pod features.

@@ -494,7 +494,7 @@ Type: Integer
Default: None

Specifies the number of free IPv4(/28) prefixes that the `ipamd` daemon should attempt to keep available for pod assignment on the node. Setting to a non-positive value is same as setting this to 0 or not setting the variable.
This environment variable works when `ENABLE_PREFIX_DELEGATION` is set to `true` and is overriden when `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` are configured.
This environment variable works when `ENABLE_PREFIX_DELEGATION` is set to `true` and is overridden when `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` are configured.

---

@@ -504,7 +504,7 @@ Type: Boolean as a String

Default: `false`

Setting `DISABLE_NETWORK_RESOURCE_PROVISIONING` to `true` will make IPAMD to depend only on IMDS to get attached ENIs and IPs/prefixes.
Setting `DISABLE_NETWORK_RESOURCE_PROVISIONING` to `true` will make IPAMD depend only on IMDS to get attached ENIs and IPs/prefixes.

---

@@ -554,7 +554,7 @@ Default: `true`

VPC CNI can operate in either IPv4 or IPv6 mode. Setting `ENABLE_IPv4` to `true` will configure it in IPv4 mode (default mode).

**Note:** Dual stack mode isn't yet supported. So, enabling both IPv4 and IPv6 will be treated as invalid configuration.
**Note:** Dual-stack mode isn't yet supported. So, enabling both IPv4 and IPv6 will be treated as an invalid configuration.

---

@@ -565,7 +565,7 @@ Type: Boolean as a String
Default: `false`

VPC CNI can operate in either IPv4 or IPv6 mode. Setting `ENABLE_IPv6` to `true` (both under `aws-node` and `aws-vpc-cni-init` containers in the manifest)
will configure it in IPv6 mode. IPv6 is only supported in Prefix Delegation mode, so `ENABLE_PREFIX_DELEGATION` needs to set to `true` if VPC CNI is
will configure it in IPv6 mode. IPv6 is only supported in Prefix Delegation mode, so `ENABLE_PREFIX_DELEGATION` needs to be set to `true` if VPC CNI is
configured to operate in IPv6 mode. Prefix delegation is only supported on nitro instances.
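
The IPv4/IPv6 rules above can be expressed as a small pre-flight check on the container's environment variables. This is a sketch, not part of the plugin: the function name is hypothetical, and the `false` default used for `ENABLE_PREFIX_DELEGATION` is an assumption.

```python
from typing import Dict, List


def validate_ip_mode(env: Dict[str, str]) -> List[str]:
    """Return a list of problems with the IP mode settings.

    Hypothetical helper: values are booleans as strings, as in the
    manifest; an empty list means no documented rule is violated.
    """
    ipv4 = env.get("ENABLE_IPv4", "true") == "true"
    ipv6 = env.get("ENABLE_IPv6", "false") == "true"
    # Default for prefix delegation is assumed to be "false".
    prefix_delegation = env.get("ENABLE_PREFIX_DELEGATION", "false") == "true"

    problems = []
    if ipv4 and ipv6:
        problems.append("dual-stack mode is not supported")
    if ipv6 and not prefix_delegation:
        problems.append("IPv6 mode requires ENABLE_PREFIX_DELEGATION=true")
    return problems


if __name__ == "__main__":
    print(validate_ip_mode({"ENABLE_IPv4": "false", "ENABLE_IPv6": "true"}))
```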


@@ -615,7 +615,7 @@ and the kubelet respectively if you are making use of this tag.

### Container Runtime

Currently IPAMD uses dockershim socket to pull pod sandboxes information upon its starting. The runtime can be set to others.
Currently, IPAMD uses dockershim socket to pull pod sandboxes information upon its starting. The runtime can be set to others.
The mountPath should be changed to `/var/run/cri.sock` and hostPath should be pointed to the wanted socket, such as
`/var/run/containerd/containerd.sock` for containerd. If using helm chart, the flag `--set cri.hostPath.path=/var/run/containerd/containerd.sock`
can set the paths for you.
@@ -627,21 +627,21 @@ can set the paths for you.

### Notes

`L-IPAMD`(aws-node daemonSet) running on every worker node requires access to kubernetes API server. If it can **not** reach
kubernetes API server, ipamd will exit and CNI will not be able to get any IP address for Pods. Here is a way to confirm if
`aws-node` has access to the kubernetes API server.
`L-IPAMD`(aws-node daemonSet) running on every worker node requires access to the Kubernetes API server. If it can **not** reach
the Kubernetes API server, ipamd will exit and CNI will not be able to get any IP address for Pods. Here is a way to confirm if
`aws-node` has access to the Kubernetes API server.

```
# find out kubernetes service IP, e.g. 10.0.0.1
kubectl get svc kubernetes
# find out Kubernetes service IP, e.g. 10.0.0.1
kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 29d

# ssh into worker node, check if worker node can reach API server
telnet 10.0.0.1 443
Trying 10.0.0.1...
Connected to 10.0.0.1.
Escape character is '^]'. <-------- kubernetes API server is reachable
Escape character is '^]'. <-------- Kubernetes API server is reachable
```
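
Where `telnet` is not installed on the worker node, the same reachability check can be sketched with Python's standard `socket` module. The function name is hypothetical, and `10.0.0.1` is only the example service IP from the output above; substitute the cluster IP reported by `kubectl get svc kubernetes`.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: this only confirms TCP reachability, like the
    telnet check above; it does not perform a TLS handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example service IP from the output above; replace with your own.
    print(can_reach("10.0.0.1", 443))
```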

## Security disclosures