Fetch Region and CLUSTER_ID information from cni-metrics-helper env variables if available

removed unnecessary logs

Update failing test

Updated ClusterRole permissions

Rename mType to metricType
Fetch Region only if not available

Remove redundant logging

helm chart changes to use the new AWS_CLUSTER_ID env variable

Minor fixes to fetching region and cluster_id logic

Simplify logic to fetch cluster_id and region

Updated cni-metrics-helper Readme with instructions for using IRSA

Updated clusterRole template for cni-metrics-helper helm chart

Manifests and Readme updates (aws#1732)

* Manifests and Readme updates

* update manifest.jsonnet

Readme updates (aws#1735)

Updates to troubleshooting doc (aws#1737)

* Updates to troubleshooting doc

* updates to troubleshooting doc

imdsv2 changes (aws#1743)

fix flaky canary test (aws#1742)

add CODEOWNERS (aws#1747)

Snat tests: [agent is already updated] (aws#1513)

* resolved conflicts with go.sum

* Updated test agent image

* Removed redundant files

* Addressed PR comments

Fixed go.sum in root folder

Changed DescribeInstanceWithFilter to DescribeInstances
Moved GetPrimaryInstanceId from ec2 interface
Added GinkgoWriter

Updated Readme for Snat test

Rearranged snat_test logic
Updated Readme for test/e2e

* Minor change to logging

Updated Chart version for cni-metrics-helper
cgchinmay committed Nov 12, 2021
1 parent 6a15a84 commit f2f7709
Showing 38 changed files with 805 additions and 387 deletions.
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -0,0 +1 @@
* @aws/eks-networking
79 changes: 72 additions & 7 deletions README.md
@@ -27,7 +27,11 @@ scheduling that exceeds the IP address resources available to the kubelet.

The default manifest expects `--cni-conf-dir=/etc/cni/net.d` and `--cni-bin-dir=/opt/cni/bin`.

L-IPAM requires following [IAM policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html):
## IAM Policy

L-IPAM requires one of the following [IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) depending on the IP Family configured:

**IPv4 Mode:**

```
{
@@ -56,6 +60,31 @@ L-IPAM requires following [IAM policy](https://docs.aws.amazon.com/IAM/latest/Us
}
```

**IPv6 Mode:**

```
{
"Effect": "Allow",
"Action": [
"ec2:AssignIpv6Addresses",
"ec2:DescribeInstances",
"ec2:DescribeTags",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeInstanceTypes"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags"
],
"Resource": [
"arn:aws:ec2:*:*:network-interface/*"
]
}
```

Alternatively there is also a [Helm](https://helm.sh/) chart: [eks/aws-vpc-cni](https://github.com/aws/eks-charts/tree/master/stable/aws-vpc-cni)

## Building
@@ -474,14 +503,16 @@ Type: Boolean as a String

Default: `false`

To enable IPv4 prefix delegation on nitro instances. Setting `ENABLE_PREFIX_DELEGATION` to `true` will start allocating a /28 prefix
instead of a secondary IP in the ENIs subnet. The total number of prefixes and private IP addresses will be less than the
Enables prefix delegation on Nitro instances. Setting `ENABLE_PREFIX_DELEGATION` to `true` will start allocating a prefix (/28 for IPv4
and /80 for IPv6) instead of a secondary IP in the ENI's subnet. The total number of prefixes and private IP addresses will be less than the
limit on private IPs allowed by your instance. Setting or resetting `ENABLE_PREFIX_DELEGATION` while pods are running or while ENIs are attached is supported, and new pods will get IPs based on the current mode of IPAMD. The max pods value of kubelet should also be updated, which requires either a kubelet restart or a node recycle.

Custom networking and Security group per pods are supported with this feature.

Setting ENABLE_PREFIX_DELEGATION to true will not increase the density of branch ENI pods. The limit on number of branch network interfaces per instance type will remain the same - https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html#supported-instance-types. Each branch network will be allocated a primary IP and this IP will be allocated for the branch ENI pods.

Please refer to [VPC CNI Feature Matrix](https://github.com/aws/amazon-vpc-cni-k8s#vpc-cni-feature-matrix) section below for additional information around using Prefix delegation with Custom Networking and Security Groups Per Pod features.

**Note:** `ENABLE_PREFIX_DELEGATION` needs to be set to `true` when VPC CNI is configured to operate in IPv6 mode (supported in v1.10.0+).
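
To make the prefix sizes above concrete, here is a minimal Go sketch that computes how many addresses a /28 IPv4 prefix and a /80 IPv6 prefix cover. The helper `addressesInPrefix` is a hypothetical name used for illustration only and is not part of the VPC CNI code base.

```go
package main

import (
	"fmt"
	"math/big"
)

// addressesInPrefix returns how many addresses a prefix of length prefixLen
// covers in an address family that is totalBits wide (32 for IPv4, 128 for IPv6).
// Hypothetical helper for illustration; it assumes prefixLen <= totalBits.
func addressesInPrefix(totalBits, prefixLen uint) *big.Int {
	// 2^(totalBits - prefixLen), computed with big.Int so IPv6 sizes do not overflow.
	return new(big.Int).Lsh(big.NewInt(1), totalBits-prefixLen)
}

func main() {
	fmt.Println("IPv4 /28:", addressesInPrefix(32, 28))  // 2^4 addresses
	fmt.Println("IPv6 /80:", addressesInPrefix(128, 80)) // 2^48 addresses
}
```

A single /28 therefore covers 16 IPv4 addresses, while a single /80 covers 2^48 IPv6 addresses.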

---

#### `WARM_PREFIX_TARGET` (v1.9.0+)
@@ -522,10 +553,10 @@ Type: Boolean as a String
Default: `false`

Setting `ANNOTATE_POD_IP` to `true` will allow IPAMD to add an annotation `vpc.amazonaws.com/pod-ips` to the pod with pod IP.

There is a known [issue](https://github.com/kubernetes/kubernetes/issues/39113) with kubelet taking time to update `Pod.Status.PodIP` leading to calico being blocked on programming the policy. Setting `ANNOTATE_POD_IP` to `true` will enable AWS VPC CNI plugin to add Pod IP as an annotation to the pod spec to address this race condition.

To annotate the pod with pod IP, you will have to add "patch" permission for pods resource in aws-node clusterrole. You can use the below command -
To annotate the pod with pod IP, you will have to add "patch" permission for pods resource in aws-node clusterrole. You can use the below command -

```
cat << EOF > append.yaml
@@ -543,6 +574,40 @@ kubectl apply -f <(cat <(kubectl get clusterrole aws-node -o yaml) append.yaml)
```
---

#### `ENABLE_IPv4` (v1.10.0+)

Type: Boolean as a String

Default: `true`

VPC CNI can operate in either IPv4 or IPv6 mode. Setting `ENABLE_IPv4` to `true` will configure it in IPv4 mode (default mode).

**Note:** Dual stack mode isn't yet supported. So, enabling both IPv4 and IPv6 will be treated as invalid configuration.

---

#### `ENABLE_IPv6` (v1.10.0+)

Type: Boolean as a String

Default: `false`

VPC CNI can operate in either IPv4 or IPv6 mode. Setting `ENABLE_IPv6` to `true` (both under `aws-node` and `aws-vpc-cni-init` containers in the manifest)
will configure it in IPv6 mode. IPv6 is only supported in Prefix Delegation mode, so `ENABLE_PREFIX_DELEGATION` needs to be set to `true` if VPC CNI is
configured to operate in IPv6 mode. Prefix delegation is only supported on Nitro instances.


**Note:** Please make sure that the required IPv6 IAM policy is applied (Refer to [IAM Policy](https://github.com/aws/amazon-vpc-cni-k8s#iam-policy) section above). Dual stack mode isn't yet supported. So, enabling both IPv4 and IPv6 will be treated as invalid configuration. Please refer to the [VPC CNI Feature Matrix](https://github.com/aws/amazon-vpc-cni-k8s#vpc-cni-feature-matrix) section below for additional information.
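
The constraints described for `ENABLE_IPv4`, `ENABLE_IPv6` and `ENABLE_PREFIX_DELEGATION` can be sketched as a small validation routine. This is an illustrative sketch only: the `ipModeConfig` type and `validateIPMode` function are hypothetical names and do not reflect how IPAMD actually validates its settings. It checks only the two rules stated above (no dual stack, and IPv6 requires prefix delegation).

```go
package main

import (
	"errors"
	"fmt"
)

// ipModeConfig is a hypothetical holder for the three settings discussed above.
type ipModeConfig struct {
	EnableIPv4             bool
	EnableIPv6             bool
	EnablePrefixDelegation bool
}

// validateIPMode applies only the rules stated in the README text:
// dual stack is invalid, and IPv6 mode requires prefix delegation.
// Other combinations are not covered by the text and are accepted here.
func validateIPMode(c ipModeConfig) error {
	if c.EnableIPv4 && c.EnableIPv6 {
		return errors.New("dual stack is not supported: enable only one of ENABLE_IPv4 and ENABLE_IPv6")
	}
	if c.EnableIPv6 && !c.EnablePrefixDelegation {
		return errors.New("IPv6 mode requires ENABLE_PREFIX_DELEGATION to be true")
	}
	return nil
}

func main() {
	// Default configuration: IPv4 mode.
	fmt.Println(validateIPMode(ipModeConfig{EnableIPv4: true}))
	// IPv6 mode without prefix delegation is rejected.
	fmt.Println(validateIPMode(ipModeConfig{EnableIPv6: true}))
	// IPv6 mode with prefix delegation is accepted.
	fmt.Println(validateIPMode(ipModeConfig{EnableIPv6: true, EnablePrefixDelegation: true}))
}
```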

---

### VPC CNI Feature Matrix

IP Mode | Secondary IP Mode | Prefix Delegation | Security Groups Per Pod | WARM & MIN IP/Prefix Targets | External SNAT
------ | ------ | ------ | ------ | ------ | ------
`IPv4` | Yes | Yes | Yes | Yes | Yes
`IPv6` | No | Yes | No | No | No

### ENI tags related to Allocation

This plugin interacts with the following tags on ENIs:
4 changes: 2 additions & 2 deletions charts/aws-vpc-cni/Chart.yaml
@@ -1,7 +1,7 @@
apiVersion: v1
name: aws-vpc-cni
version: 1.1.10
appVersion: "v1.9.3"
version: 1.1.11
appVersion: "v1.10.0"
description: A Helm chart for the AWS VPC CNI
icon: https://raw.githubusercontent.com/aws/eks-charts/master/docs/logo/aws.png
home: https://github.com/aws/amazon-vpc-cni-k8s
5 changes: 3 additions & 2 deletions charts/aws-vpc-cni/values.yaml
@@ -8,7 +8,7 @@ nameOverride: aws-node

init:
image:
tag: v1.9.3
tag: v1.10.0
region: us-west-2
account: "602401143452"
pullPolicy: Always
@@ -17,12 +17,13 @@ init:
# override: "repo/org/image:tag"
env:
DISABLE_TCP_EARLY_DEMUX: "false"
ENABLE_IPv6: "false"
securityContext:
privileged: true

image:
region: us-west-2
tag: v1.9.3
tag: v1.10.0
account: "602401143452"
domain: "amazonaws.com"
pullPolicy: Always
4 changes: 2 additions & 2 deletions charts/cni-metrics-helper/Chart.yaml
@@ -15,9 +15,9 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.4
version: 0.1.6

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: v1.9.3
appVersion: v1.10.1
30 changes: 1 addition & 29 deletions charts/cni-metrics-helper/templates/clusterrole.yaml
@@ -5,34 +5,6 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
3 changes: 2 additions & 1 deletion charts/cni-metrics-helper/values.yaml
@@ -4,14 +4,15 @@ nameOverride: cni-metrics-helper

image:
region: us-west-2
tag: v1.9.3
tag: v1.10.1
account: "602401143452"
domain: "amazonaws.com"
# Set to use custom image
# override: "repo/org/image:tag"

env:
USE_CLOUDWATCH: "true"
AWS_CLUSTER_ID: ""

fullnameOverride: "cni-metrics-helper"

88 changes: 88 additions & 0 deletions cmd/cni-metrics-helper/README.md
@@ -13,6 +13,94 @@ The following diagram shows how `cni-metrics-helper` works in a cluster:

![](../../docs/images/cni-metrics-helper.png)

### Using IRSA
As per [AWS EKS Security Best Practice](https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html), if you are using IRSA for pods, the following requirements must be satisfied to successfully publish metrics to CloudWatch:

1. The IAM Role for your SA must have the following policy attached:

```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricData"
],
"Resource": "*"
}
]
}
```

2. You should have the following ClusterRole and ClusterRoleBinding for the IRSA:

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cni-metrics-helper
rules:
- apiGroups: [""]
resources:
- pods
- pods/proxy
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cni-metrics-helper
labels:
app.kubernetes.io/name: cni-metrics-helper
app.kubernetes.io/instance: cni-metrics-helper
app.kubernetes.io/version: "v1.9.3"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cni-metrics-helper
subjects:
- kind: ServiceAccount
name: <IRSA name>
namespace: kube-system
```

3. Specify this IRSA in the cni-metrics-helper deployment spec along with CLUSTER_ID as the metric dimension:

```
kind: Deployment
apiVersion: apps/v1
metadata:
name: cni-metrics-helper
namespace: kube-system
labels:
k8s-app: cni-metrics-helper
spec:
selector:
matchLabels:
k8s-app: cni-metrics-helper
template:
metadata:
labels:
k8s-app: cni-metrics-helper
spec:
containers:
- env:
- name: USE_CLOUDWATCH
value: "true"
- name: AWS_CLUSTER_ID
value: "demo-cluster"
name: cni-metrics-helper
image: <image>
serviceAccountName: <IRSA name>
```
With IRSA, the `AWS_REGION` env variable is auto-injected into the above deployment spec and is used to determine the Region.

Possible scenarios for the above configuration:
1. If you are not using IRSA, Region and CLUSTER_ID are fetched using IMDS (the pod must have IMDS access).
2. If you are using IRSA but have not specified CLUSTER_ID, it is still fetched from IMDS, provided IMDS access is not blocked.
3. If you have blocked IMDS access, you must specify a value for CLUSTER_ID (the metric dimension) in the deployment spec.
4. If you have not blocked IMDS access but have specified a CLUSTER_ID value, the specified value is used.
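
The four scenarios reduce to one rule: prefer the value from the environment, and fall back to IMDS only when it is missing. The Go sketch below illustrates that rule under stated assumptions. `resolveSetting`, `metadataFetcher`, `blockedIMDS` and `staticIMDS` are hypothetical names, and the IMDS lookup is stubbed out rather than calling any real AWS SDK function. The env variable name `AWS_CLUSTER_ID` is the one read in `main.go` in this commit.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// metadataFetcher is a hypothetical stand-in for an IMDS lookup.
type metadataFetcher func() (string, error)

// blockedIMDS simulates a pod whose IMDS access is blocked.
func blockedIMDS() (string, error) {
	return "", errors.New("IMDS access blocked")
}

// staticIMDS simulates a reachable IMDS that returns a fixed value.
func staticIMDS(v string) metadataFetcher {
	return func() (string, error) { return v, nil }
}

// resolveSetting returns the env variable's value when it is set and non-empty,
// and otherwise falls back to the metadata lookup.
func resolveSetting(envName string, fallback metadataFetcher) (string, error) {
	if v, ok := os.LookupEnv(envName); ok && v != "" {
		return v, nil
	}
	v, err := fallback()
	if err != nil {
		return "", fmt.Errorf("%s is not set and the IMDS fallback failed: %w", envName, err)
	}
	return v, nil
}

func main() {
	// Scenario 3: IMDS is blocked, so the value must come from the env variable.
	os.Setenv("AWS_CLUSTER_ID", "demo-cluster")
	id, err := resolveSetting("AWS_CLUSTER_ID", blockedIMDS)
	fmt.Println(id, err)

	// Scenario 2: the env variable is missing but IMDS is reachable.
	os.Unsetenv("AWS_CLUSTER_ID")
	id, err = resolveSetting("AWS_CLUSTER_ID", staticIMDS("cluster-from-imds"))
	fmt.Println(id, err)
}
```

Passing the fallback as a function keeps the precedence rule testable without network access. A real implementation would replace the stub with an actual IMDS client.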

### Installing the cni-metrics-helper
```
kubectl apply -f v1.6/cni-metrics-helper.yaml
16 changes: 15 additions & 1 deletion cmd/cni-metrics-helper/main.go
@@ -80,9 +80,23 @@ func main() {
}
}

// Fetch the region. With IRSA it is auto-injected as an env variable in the pod spec.
// If it is not set, the value is empty and we fall back to fetching it from IMDS (existing approach).
// An empty value can also mean the customer is not using IRSA, so we shouldn't enforce an IRSA requirement.
region, _ := os.LookupEnv("AWS_REGION")

// should be name/identifier for the cluster if specified
clusterID, _ := os.LookupEnv("AWS_CLUSTER_ID")

log.Infof("Using REGION=%s and CLUSTER_ID=%s", region, clusterID)

log.Infof("Starting CNIMetricsHelper. Sending metrics to CloudWatch: %v, LogLevel %s", options.submitCW, logConfig.LogLevel)

clientSet, err := k8sapi.GetKubeClientSet()
if err != nil {
log.Fatalf("Error Fetching Kubernetes Client: %s", err)
os.Exit(1)
}

rawK8SClient, err := k8sapi.CreateKubeClient()
if err != nil {
@@ -98,7 +112,7 @@ func main() {
var cw publisher.Publisher

if options.submitCW {
cw, err = publisher.New(ctx)
cw, err = publisher.New(ctx, region, clusterID)
if err != nil {
log.Fatalf("Failed to create publisher: %v", err)
}
10 changes: 5 additions & 5 deletions cmd/cni-metrics-helper/metrics/metrics.go
@@ -238,11 +238,11 @@ func postProcessingHistogram(convert metricsConvert, log logger.Logger) bool {
func processMetric(family *dto.MetricFamily, convert metricsConvert, log logger.Logger) (bool, error) {
resetDetected := false

mType := family.GetType()
metricType := family.GetType()
for _, metric := range family.GetMetric() {
for _, act := range convert.actions {
if act.matchFunc(metric) {
switch mType {
switch metricType {
case dto.MetricType_GAUGE:
processGauge(metric, &act)
case dto.MetricType_HISTOGRAM:
@@ -256,7 +256,7 @@ func processMetric(family *dto.MetricFamily, convert metricsConvert, log logger.
}
}

switch mType {
switch metricType {
case dto.MetricType_COUNTER:
curResetDetected := postProcessingCounter(convert, log)
if curResetDetected {
@@ -316,9 +316,9 @@ func filterMetrics(originalMetrics map[string]*dto.MetricFamily,
func produceCloudWatchMetrics(t metricsTarget, families map[string]*dto.MetricFamily, convertDef map[string]metricsConvert, cw publisher.Publisher) {
for key, family := range families {
convertMetrics := convertDef[key]
mType := family.GetType()
metricType := family.GetType()
for _, action := range convertMetrics.actions {
switch mType {
switch metricType {
case dto.MetricType_COUNTER:
if t.submitCloudWatch() {
dataPoint := &cloudwatch.MetricDatum{