update CNI metrics helper README (#2231)
jdn5126 authored Jan 30, 2023
1 parent e8a4481 commit 8fa8356
Showing 2 changed files with 55 additions and 67 deletions.
122 changes: 55 additions & 67 deletions cmd/cni-metrics-helper/README.md
@@ -6,16 +6,35 @@ publish metrics:
```
"cloudwatch:PutMetricData"
```

By default ipamd will publish prometheus metrics on `:61678/metrics`.
By default, IPAM will publish prometheus metrics on `:61678/metrics`.
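
As a rough illustration of what a consumer of the `:61678/metrics` endpoint has to do, the sketch below parses a small block of Prometheus text exposition output into a dictionary. The sample payload and its metric names are hypothetical placeholders, not actual ipamd output, and the parser deliberately ignores labeled series to stay short.

```python
from typing import Dict

# Hypothetical sample in Prometheus text exposition format.
# The metric names and values are placeholders, not real ipamd output.
SAMPLE = """\
# HELP example_eni_allocated Placeholder gauge
# TYPE example_eni_allocated gauge
example_eni_allocated 3
example_total_ip_addresses 45
"""


def parse_unlabeled_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' lines, skipping comments, blanks, and labeled series."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, HELP/TYPE comments, and series that carry labels.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        metrics[parts[0]] = float(parts[1])
    return metrics


if __name__ == "__main__":
    print(parse_unlabeled_metrics(SAMPLE))
```

In a cluster, the text would come from the endpoint rather than from a constant; a full Prometheus client library would be the better choice for anything beyond a quick check.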

The following diagram shows how `cni-metrics-helper` works in a cluster:

![](../../docs/images/cni-metrics-helper.png)

As you can see in the diagram, the `cni-metrics-helper` connects to the API Server over https (`tcp/443`), and another connection is created from the API Server to the worker node over http (tcp/61678). If you deploy Amazon EKS with recommended [Restricting cluster traffic](https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html#security-group-restricting-cluster-traffic), then make sure that a security group is in place that allows the inbound connection from the API Server to the worker nodes over `tcp/61678`.
As you can see in the diagram, the `cni-metrics-helper` connects to the API Server over https (`tcp/443`), and another connection is created from the API Server to the worker node over http (`tcp/61678`). If you deploy Amazon EKS with the recommended security groups from [Restricting cluster traffic](https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html#security-group-restricting-cluster-traffic), then make sure that a security group is in place that allows the inbound connection from the API Server to the worker nodes over `tcp/61678`.
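
The inbound rule described above could be expressed as follows. This is a minimal sketch that only builds the rule in the `IpPermissions` shape used by the EC2 `AuthorizeSecurityGroupIngress` API; the function name and the security group ID are hypothetical, and the sketch assumes the API Server traffic originates from a control plane security group.

```python
from typing import Any, Dict, List

IPAMD_METRICS_PORT = 61678  # port named in the paragraph above


def metrics_ingress_rule(control_plane_sg_id: str) -> List[Dict[str, Any]]:
    """Build an IpPermissions list allowing tcp/61678 from the given security group.

    control_plane_sg_id is a placeholder for the security group attached to
    the API Server side of the connection (an assumption of this sketch).
    """
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": IPAMD_METRICS_PORT,
            "ToPort": IPAMD_METRICS_PORT,
            "UserIdGroupPairs": [
                {
                    "GroupId": control_plane_sg_id,
                    "Description": "API Server to ipamd metrics",
                }
            ],
        }
    ]


if __name__ == "__main__":
    # "sg-0123456789abcdef0" is a made-up ID used only for illustration.
    print(metrics_ingress_rule("sg-0123456789abcdef0"))
```

The returned list can be passed as `IpPermissions` to boto3's `authorize_security_group_ingress`, with `GroupId` set to the worker node security group.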

### Using IRSA
Adding the CNI metrics helper will publish the following metrics to CloudWatch:
```
"addReqCount",
"assignIPAddresses",
"awsAPIErr",
"awsAPILatency",
"awsUtilErr",
"delReqCount",
"eniAllocated",
"eniMaxAvailable",
"ipamdActionInProgress",
"ipamdErr",
"maxIPAddresses",
"podENIErr",
"reconcileCount",
"totalIPAddresses",
"totalIPv4Prefixes",
"totalAssignedIPv4sPerCidr"
```

## Using IRSA
As per [AWS EKS Security Best Practice](https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html), if you are using IRSA for pods, then the following requirements must be satisfied to successfully publish metrics to CloudWatch:

1. The IAM Role for your SA [(IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) must have following policy attached
@@ -35,47 +54,15 @@ As per [AWS EKS Security Best Practice](https://docs.aws.amazon.com/eks/latest/u
}
```

2. You should have similar ClusterRole and ClusterRoleBinding for the IRSA
2. Specify the IRSA name in the cni-metrics-helper deployment spec along with the AWS_CLUSTER_ID (as described below). The value that you specify here will show up under the dimension 'CLUSTER_ID' for your published metrics. Specifying a value for this field is mandatory only if you are blocking IMDS access.

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cni-metrics-helper
rules:
- apiGroups: [""]
resources:
- pods
- pods/proxy
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cni-metrics-helper
labels:
app.kubernetes.io/name: cni-metrics-helper
app.kubernetes.io/instance: cni-metrics-helper
app.kubernetes.io/version: "v1.10.2"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cni-metrics-helper
subjects:
- kind: ServiceAccount
name: <IRSA name>
namespace: kube-system
```

3. Specify the IRSA name in the cni-metrics-helper deployment spec alongwith the AWS_CLUSTER_ID (as described below). The value that you specify here will show up under the dimension 'CLUSTER_ID' for your published metrics. Specifying value for this field is mandatory only if you are blocking IMDS access

#### `AWS_CLUSTER_ID`
### `AWS_CLUSTER_ID`

Type: String

Default: `""`

An Identifier for your Cluster which will be used as the dimension for published metrics. Ideally it should be ClusterName or ClusterID.
An identifier for your Cluster which will be used as the dimension for published metrics. Ideally it should be ClusterName or ClusterID.
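
To make the role of the dimension concrete, here is a minimal sketch of a single datum in the shape accepted by CloudWatch `PutMetricData`, with the cluster identifier attached as the `CLUSTER_ID` dimension. The function name and the `Count` unit are assumptions for illustration, and the helper may attach further dimensions that are not shown here.

```python
from typing import Any, Dict


def build_metric_datum(metric_name: str, value: float, cluster_id: str) -> Dict[str, Any]:
    """Return one MetricData entry carrying the CLUSTER_ID dimension."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "CLUSTER_ID", "Value": cluster_id}],
        "Value": value,
        "Unit": "Count",  # assumed unit, chosen only for this sketch
    }


if __name__ == "__main__":
    # "my-cluster" is a placeholder for the AWS_CLUSTER_ID value.
    print(build_metric_datum("eniAllocated", 3, "my-cluster"))
```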

```
kind: Deployment
@@ -96,47 +83,47 @@ spec:
spec:
containers:
- env:
- name: USE_CLOUDWATCH
value: "true"
- name: AWS_CLUSTER_ID
value: ""
- name: USE_CLOUDWATCH
value: "true"
name: cni-metrics-helper
image: <image>
serviceAccountName: <IRSA name>
```
With IRSA, the above deployment spec will be auto-injected with AWS_REGION parameter and it will be used to fetch Region information when we publish metrics.
Possible Scenarios for above configuration
With IRSA, the above deployment spec will be auto-injected with AWS_REGION parameter, and it will be used to fetch region information when we publish metrics.
Possible scenarios for above configuration:
1. If you are not using IRSA, then Region and CLUSTER_ID information will be fetched using IMDS (should have access)
2. If you are using IRSA but have not specified AWS_CLUSTER_ID, we will fetch the value for CLUSTER_ID if IMDS access is not blocked
3. If you have blocked IMDS access, then you must specify a value for AWS_CLUSTER_ID in the deployment spec
4. If you have not blocked IMDS access but have specified AWS_CLUSTER_ID value, then this value will be used.
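
The four scenarios reduce to a simple precedence rule, sketched below. This is an illustration of the decision logic as described in the list, not the helper's actual implementation; `resolve_cluster_id` and the `imds_lookup` callable are hypothetical names, and a blocked IMDS is modeled as a lookup that returns `None`.

```python
from typing import Callable, Optional


def resolve_cluster_id(env_cluster_id: str, imds_lookup: Callable[[], Optional[str]]) -> str:
    """Pick the CLUSTER_ID value following the scenarios listed above.

    env_cluster_id: the AWS_CLUSTER_ID value from the deployment spec ("" if unset).
    imds_lookup: hypothetical callable returning the ID from IMDS, or None
                 when IMDS access is blocked.
    """
    # Scenarios 3 and 4: an explicit AWS_CLUSTER_ID always wins.
    if env_cluster_id:
        return env_cluster_id
    # Scenarios 1 and 2: fall back to IMDS when nothing was specified.
    from_imds = imds_lookup()
    if from_imds:
        return from_imds
    # IMDS blocked and no explicit value: the configuration is incomplete.
    raise ValueError("AWS_CLUSTER_ID must be set when IMDS access is blocked")


if __name__ == "__main__":
    print(resolve_cluster_id("my-cluster", lambda: "from-imds"))
    print(resolve_cluster_id("", lambda: "from-imds"))
```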

### Installing the cni-metrics-helper
```
kubectl apply -f v1.6/cni-metrics-helper.yaml
```
## Installing the cni-metrics-helper

Adding the CNI metrics helper will publish the following metrics to CloudWatch:
```
"addReqCount",
"assignIPAddresses",
"awsAPIErr",
"awsAPILatency",
"awsUtilErr",
"delReqCount",
"eniAllocated",
"eniMaxAvailable",
"ipamdActionInProgress",
"ipamdErr",
"maxIPAddresses",
"podENIErr",
"reconcileCount",
"totalIPAddresses",
"totalIPv4Prefixes",
"totalAssignedIPv4sPerCidr"
```
To install the CNI metrics helper, follow the installation instructions from the target version [release notes](https://github.com/aws/amazon-vpc-cni-k8s/releases).

## Creating a metrics dashboard

After you have deployed the CNI metrics helper, you can view the CNI metrics in the Amazon CloudWatch console.

**To create a CNI metrics dashboard**

### Get cni-metrics-helper logs
1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).
2. In the left navigation pane, choose **Metrics** and then select **All metrics**.
3. Choose the **Graphed metrics** tab.
4. Choose **Add metrics using browse or query**.
5. Make sure that under **Metrics**, you've selected the AWS Region for your cluster.
6. In the Search box, enter **Kubernetes** and then press **Enter**.
7. Select the metrics that you want to add to the dashboard.
8. At the upper right of the console, select **Actions**, and then **Add to dashboard**.
9. In the **Select a dashboard** section, choose **Create new**, enter a name for your dashboard, such as **EKS\-CNI\-metrics**, and then choose **Create**.
10. In the **Widget type** section, select **Number**.
11. In the **Customize widget title** section, enter a logical name for your dashboard title, such as **EKS CNI metrics**.
12. Choose **Add to dashboard** to finish. Now your CNI metrics are added to a dashboard that you can monitor. For more information about Amazon CloudWatch Logs metrics, see [Using Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) in the Amazon CloudWatch User Guide.

![EKS CNI metrics](../../docs/images/EKS_CNI_metrics.png)
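
The console steps above could also be approximated programmatically. The sketch below builds a CloudWatch dashboard body with a single Number-style widget (`"view": "singleValue"`). It assumes the metrics live in a namespace called `Kubernetes` (inferred from the search term in step 6) and carry only a `CLUSTER_ID` dimension; both are assumptions to verify against the metrics actually published in your account. The function name, region, and cluster ID are placeholders.

```python
import json
from typing import List

NAMESPACE = "Kubernetes"  # assumption, inferred from the console search term


def number_widget_dashboard(
    region: str,
    cluster_id: str,
    metric_names: List[str],
    title: str = "EKS CNI metrics",
) -> str:
    """Return a dashboard body (JSON string) with one Number widget."""
    widget = {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 24,
        "height": 6,
        "properties": {
            "view": "singleValue",  # rendered as the Number widget type
            "region": region,
            "title": title,
            # Each entry: [namespace, metric name, dimension name, dimension value]
            "metrics": [[NAMESPACE, name, "CLUSTER_ID", cluster_id] for name in metric_names],
        },
    }
    return json.dumps({"widgets": [widget]})


if __name__ == "__main__":
    print(number_widget_dashboard("us-west-2", "my-cluster", ["eniAllocated", "ipamdErr"]))
```

The resulting string can be supplied as `DashboardBody` to boto3's CloudWatch `put_dashboard` call, together with a `DashboardName` such as `EKS-CNI-metrics`.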

## Get cni-metrics-helper logs

```
kubectl get pod -n kube-system
@@ -151,6 +138,7 @@ kube-dns-75fddcb66f-48tzn 3/3 Running 0 1d
```
kubectl logs cni-metrics-helper-6dcff5ddf4-v5l6d -n kube-system
```

### cni-metrics-helper key log messages

Example of some aggregated metrics
Binary file added docs/images/EKS_CNI_metrics.png