Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes node label error handling #1892

Merged
merged 15 commits into from
Mar 8, 2022
Merged

Conversation

jayanthvn
Copy link
Contributor

@jayanthvn jayanthvn commented Feb 28, 2022

What type of PR is this?
bug

Which issue does this PR fix:
Fixes #1887

What does this PR do / Why do we need it:
Added error handling and node patch instead of update.

Using MergeFrom instead of StrategicMergeFrom, @M00nF1sh any preference here?

/ The difference between MergeFrom and StrategicMergeFrom lays in the handling of modified list fields.
// When using MergeFrom, existing lists will be completely replaced by new lists.
// When using StrategicMergeFrom, the list field's `patchStrategy` is respected if specified in the API type,

Ref: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/patch.go#L164

This will need patch permission for nodes. Will evaluate if we can remove update from nodes.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
#1887

Testing done on this change:

Custom networking -

[varavaj@dev-dsk-varavaj-2b-3c80898a amazon-vpc-cni-k8s]$ kubectl get pods -n kube-system -owide
NAME                       READY   STATUS    RESTARTS   AGE     IP              NODE                                          NOMINATED NODE   READINESS GATES
aws-node-524nl             1/1     Running   0          4h30m   192.168.76.72   ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>
coredns-6548845887-knt4k   1/1     Running   0          4h32m   100.64.91.181   ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>
coredns-6548845887-svnp6   1/1     Running   0          4h32m   100.64.95.229   ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>
kube-proxy-vnftj           1/1     Running   0          4h30m   192.168.76.72   ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>

[varavaj@dev-dsk-varavaj-2b-3c80898a amazon-vpc-cni-k8s]$ kubectl get pods -owide
NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE                                          NOMINATED NODE   READINESS GATES
nginx-deployment-66b6c48dd5-8skfv   1/1     Running   0          4h32m   100.64.90.119   ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>
nginx-deployment-66b6c48dd5-nn7d7   1/1     Running   0          4h32m   100.64.84.166   ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>
nginx-deployment-66b6c48dd5-wk8lk   1/1     Running   0          4h32m   100.64.93.93    ip-192-168-76-72.us-west-2.compute.internal   <none>           <none>
{"level":"info","ts":"2022-03-04T02:05:16.663Z","caller":"ipamd/ipamd.go:541","msg":"Get Node Info for: ip-192-168-76-72.us-west-2.compute.internal"}
{"level":"debug","ts":"2022-03-04T02:05:16.764Z","caller":"eniconfig/eniconfig.go:136","msg":"Using ENI_CONFIG_LABEL_DEF failure-domain.beta.kubernetes.io/zone"}
{"level":"debug","ts":"2022-03-04T02:05:16.764Z","caller":"retry/util.go:51","msg":"Node found \"ip-192-168-76-72.us-west-2.compute.internal\" - labels - '\\x12'"}
{"level":"debug","ts":"2022-03-04T02:05:16.779Z","caller":"retry/util.go:51","msg":"Updated node ip-192-168-76-72.us-west-2.compute.internal with label \"vpc.amazonaws.com/eniConfig\": \"us-west-2c\""}
{"level":"info","ts":"2022-03-04T02:05:19.281Z","caller":"eniconfig/eniconfig.go:73","msg":"Get Node Info for: ip-192-168-76-72.us-west-2.compute.internal"}
{"level":"debug","ts":"2022-03-04T02:05:19.281Z","caller":"eniconfig/eniconfig.go:136","msg":"Using ENI_CONFIG_LABEL_DEF failure-domain.beta.kubernetes.io/zone"}
{"level":"info","ts":"2022-03-04T02:05:19.281Z","caller":"ipamd/ipamd.go:837","msg":"Found ENI Config Name: us-west-2c"}
{"level":"info","ts":"2022-03-04T02:05:19.382Z","caller":"ipamd/ipamd.go:811","msg":"ipamd: using custom network config: [], subnet-06cac0e2cf3843ec7"}
{"level":"info","ts":"2022-03-04T02:05:19.382Z","caller":"awsutils/awsutils.go:730","msg":"Using a custom network config for the new ENI"}

Fixed the below log in next commit -

{"level":"warn","ts":"2022-03-04T02:05:19.382Z","caller":"awsutils/awsutils.go:730","msg":"No custom networking security group found, will use the node's primary ENI's SG: [%!s(*string=0xc000078400) %!s(*string=0xc000078410)]"}

API server calls -

Screen Shot 2022-03-03 at 11 38 33 PM

At 07:26:14.789Z upgraded from 1.7.10 to image with fix. API server calls are in sync with what we tested in 1.8.0

{"level":"debug","ts":"2022-03-04T07:26:07.036Z","caller":"sdk/informer-sync.go:88","msg":"Handle corev1.Node: ip-192-168-76-72.us-west-2.compute.internal, map[node.alpha.kubernetes.io/ttl:0 volumes.kubernetes.io/controller-managed-attach-detach:true], map[alpha.eksctl.io/cluster-name:dev-cluster alpha.eksctl.io/nodegroup-name:nodegroup beta.kubernetes.io/arch:amd64 beta.kubernetes.io/instance-type:t3.small beta.kubernetes.io/os:linux eks.amazonaws.com/capacityType:ON_DEMAND eks.amazonaws.com/nodegroup:nodegroup eks.amazonaws.com/nodegroup-image:ami-02861c26f5cf11d24 eks.amazonaws.com/sourceLaunchTemplateId:lt-0dd642d5c82ff5a73 eks.amazonaws.com/sourceLaunchTemplateVersion:1 failure-domain.beta.kubernetes.io/region:us-west-2 failure-domain.beta.kubernetes.io/zone:us-west-2c kubernetes.io/arch:amd64 kubernetes.io/hostname:ip-192-168-76-72.us-west-2.compute.internal kubernetes.io/os:linux node.kubernetes.io/instance-type:t3.small topology.kubernetes.io/region:us-west-2 topology.kubernetes.io/zone:us-west-2c vpc.amazonaws.com/eniConfig:us-west-2c]"}
{"level":"debug","ts":"2022-03-04T07:26:07.044Z","caller":"sdk/informer-sync.go:88","msg":"Handle ENIConfig Add/Update: us-west-2a, [], subnet-0eeb2ea845de1c8d3"}
{"level":"debug","ts":"2022-03-04T07:26:07.044Z","caller":"sdk/informer-sync.go:88","msg":"Handle ENIConfig Add/Update: us-west-2b, [], subnet-0747217e1d86e6510"}
{"level":"debug","ts":"2022-03-04T07:26:07.044Z","caller":"sdk/informer-sync.go:88","msg":"Handle ENIConfig Add/Update: us-west-2c, [], subnet-06cac0e2cf3843ec7"}
{"level":"info","ts":"2022-03-04T07:26:14.789Z","caller":"aws-k8s-agent/main.go:27","msg":"Starting L-IPAMD   ..."}

Screen Shot 2022-03-03 at 11 46 33 PM

SGPP -

[varavaj@dev-dsk-varavaj-2b-3c80898a amazon-vpc-cni-k8s]$ kubectl get nodes -oyaml | grep 'vpc.amazonaws.com/has-trunk-attached: "'
      vpc.amazonaws.com/has-trunk-attached: "true"
      vpc.amazonaws.com/has-trunk-attached: "true"

[varavaj@dev-dsk-varavaj-2b-3c80898a amazon-vpc-cni-k8s]$ kubectl get pods -owide
NAME                              READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
branch-eni-app-854559894b-496sx   1/1     Running   0          19m   192.168.78.198   ip-192-168-89-129.us-west-2.compute.internal   <none>           <none>
branch-eni-app-854559894b-6t8cm   1/1     Running   0          19m   192.168.90.233   ip-192-168-89-129.us-west-2.compute.internal   <none>           <none>
branch-eni-app-854559894b-7tdfk   1/1     Running   0          19m   192.168.69.42    ip-192-168-89-129.us-west-2.compute.internal   <none>           <none>
branch-eni-app-854559894b-lnzsx   1/1     Running   0          19m   192.168.89.77    ip-192-168-89-129.us-west-2.compute.internal   <none>           <none>
branch-eni-app-854559894b-mm5hr   1/1     Running   0          19m   192.168.64.154   ip-192-168-89-129.us-west-2.compute.internal   <none>           <none>
kubectl describe pod branch-eni-app-854559894b-496sx | grep -B1 eniId
              vpc.amazonaws.com/pod-eni:
                [{"eniId":"eni-0bbb02b3cbdfd5bf7","ifAddress":"0a:e5:6f:88:d3:97","privateIp":"192.168.78.198","vlanId":4,"subnetCidr":"192.168.64.0/19"}]

Automation added to e2e:

No

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No

Does this change require updates to the CNI daemonset config files to work?:

Yes, need to add patch permission for nodes.

Does this PR introduce any user-facing change?:

No

Patch permissions needs to be added with latest controller runtime, update is treated as patch.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jayanthvn jayanthvn requested a review from M00nF1sh February 28, 2022 21:10
@jayanthvn jayanthvn requested a review from a team as a code owner February 28, 2022 21:10
Copy link
Contributor

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i prefer strategicMergePatch over mergePatch given it saves payload and we don't need conflict check for this specific usecase.

@jayanthvn
Copy link
Contributor Author

Sure sounds good. I am updating the PR.

@jayanthvn
Copy link
Contributor Author

i prefer strategicMergePatch over mergePatch given it saves payload and we don't need conflict check for this specific usecase.

Updated with strategicMerge.

@jayanthvn jayanthvn requested a review from achevuru March 3, 2022 23:02
@jayanthvn
Copy link
Contributor Author

jayanthvn commented Mar 4, 2022

Reviewers - I'm looking into UT cases.

@jayanthvn
Copy link
Contributor Author

Reviewers - I'm looking into UT cases.

UTs updated.

pkg/ipamd/ipamd.go Outdated Show resolved Hide resolved
Copy link
Contributor

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix error handling in node label update path
2 participants