
Generate calico artifacts from helm #1541

Merged: 10 commits merged into aws:master on Aug 5, 2021

Conversation

@jayanthvn (Contributor) commented Jul 22, 2021

What type of PR is this?
Enhancement

Which issue does this PR fix:
#1517

What does this PR do / Why do we need it:
Ported the required manifests from https://github.com/projectcalico/calico/tree/master/_includes/charts. Going forward we won't have a config/ folder; instead, these artifacts will be built by a script and attached as release artifacts. The helm version we are using doesn't support --include-crds, and upgrading the helm version would need additional fixes in the other charts folders, so the CRDs have been moved into the templates folder for now. We will restructure this as a fast follow-up after the 1.9 release.
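
As a rough illustration of the generation flow described above (a minimal sketch only; the chart path, release name, and output location are assumptions, and the actual script in this PR may split or post-process the output differently):

# Hypothetical invocation (helm 3 syntax shown) - paths and names are assumptions, not the PR's script.
helm template aws-calico ./charts/aws-calico \
  --namespace kube-system \
  > build/cni-rel-yamls/calico-operator.yaml
# Newer helm versions support --include-crds for charts that keep CRDs in a crds/
# directory; the helm version pinned here predates that (per the description above),
# which is why the CRDs sit under templates/ and are rendered by the plain command.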

/cc @caseydavenport

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
n/a

Testing done on this change:

Yes

kubectl apply -f /local/home/varavaj/src/doc_update/amazon-vpc-cni-k8s/scripts/../build/cni-rel-yamls/v1.6.4-rc1-166-gd4e8b9aa-dirty/calico-operator.yaml
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/imagesets.operator.tigera.io created
customresourcedefinition.apiextensions.k8s.io/installations.operator.tigera.io created
customresourcedefinition.apiextensions.k8s.io/tigerastatuses.operator.tigera.io created
namespace/tigera-operator created
podsecuritypolicy.policy/tigera-operator created
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
clusterrole.rbac.authorization.k8s.io/tigera-operator created
serviceaccount/tigera-operator created
deployment.apps/tigera-operator created
dev-dsk-varavaj-2b-72f02457 % kgp -A
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE    IP               NODE                                           NOMINATED NODE   READINESS GATES
kube-system            aws-node-brq8r                              1/1     Running   0          36h    192.168.71.140   ip-192-168-71-140.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-ccvdv                              1/1     Running   0          36h    192.168.8.85     ip-192-168-8-85.us-west-2.compute.internal     <none>           <none>
kube-system            aws-node-ddnmx                              1/1     Running   0          36h    192.168.49.221   ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-lzlbc                              1/1     Running   0          36h    192.168.16.15    ip-192-168-16-15.us-west-2.compute.internal    <none>           <none>
kube-system            aws-node-pczvd                              1/1     Running   0          36h    192.168.36.204   ip-192-168-36-204.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-s6kz4                              1/1     Running   0          36h    192.168.90.141   ip-192-168-90-141.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-x7rc7                              1/1     Running   0          36h    192.168.50.223   ip-192-168-50-223.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-z9xqh                              1/1     Running   0          36h    192.168.74.218   ip-192-168-74-218.us-west-2.compute.internal   <none>           <none>
kube-system            cni-metrics-helper-6797dd4f5b-dfzgt         1/1     Running   0          9d     192.168.47.0     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            coredns-5946c5d67c-lnwfr                    1/1     Running   0          9d     192.168.47.5     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            coredns-5946c5d67c-mt7j5                    1/1     Running   0          9d     192.168.47.4     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-fsxpn                            1/1     Running   0          2d8h   192.168.50.223   ip-192-168-50-223.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-fvbfr                            1/1     Running   0          2d8h   192.168.71.140   ip-192-168-71-140.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-hwwz5                            1/1     Running   0          9d     192.168.74.218   ip-192-168-74-218.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-l9stq                            1/1     Running   0          2d8h   192.168.8.85     ip-192-168-8-85.us-west-2.compute.internal     <none>           <none>
kube-system            kube-proxy-m2flh                            1/1     Running   0          2d8h   192.168.16.15    ip-192-168-16-15.us-west-2.compute.internal    <none>           <none>
kube-system            kube-proxy-ns9wm                            1/1     Running   0          2d8h   192.168.90.141   ip-192-168-90-141.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-rthwh                            1/1     Running   0          2d8h   192.168.36.204   ip-192-168-36-204.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-scvk7                            1/1     Running   0          9d     192.168.49.221   ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            metrics-server-77946ff588-hcmb7             1/1     Running   0          9d     192.168.47.1     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-894c58c65-8zm7k   1/1     Running   0          9d     192.168.47.2     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-5756647dbb-lkx74       1/1     Running   0          9d     192.168.47.3     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
tigera-operator        tigera-operator-6db99fb878-brq4h            1/1     Running   0          6s     192.168.16.15    ip-192-168-16-15.us-west-2.compute.internal    <none>           <none>
dev-dsk-varavaj-2b-72f02457 % kubectl apply -f /local/home/varavaj/src/doc_update/amazon-vpc-cni-k8s/scripts/../build/cni-rel-yamls/v1.6.4-rc1-166-gd4e8b9aa-dirty/calico-crs.yaml
installation.operator.tigera.io/default created
kubectl get daemonset calico-node --namespace calico-system
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
calico-node   8         8         8       8            8           kubernetes.io/os=linux   32s
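
Condensed, the install order exercised in the log above looks like this (a minimal sketch; the kubectl wait guard is an assumption added here because the Installation custom resource in calico-crs.yaml can only be applied once its CRD from calico-operator.yaml is registered):

kubectl apply -f calico-operator.yaml            # CRDs, tigera-operator namespace, RBAC, deployment
kubectl wait --for condition=established --timeout=60s \
  crd/installations.operator.tigera.io           # assumed guard; not part of the original test run
kubectl apply -f calico-crs.yaml                 # Installation CR consumed by the operator
kubectl get daemonset calico-node -n calico-system   # operator rolls out calico-node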

Automation added to e2e:

Yes

Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tmjd (Contributor) commented Jul 23, 2021

You're not removing config/master in this PR yet, so it would be nice if you could place the generated calico-operator.yaml and calico-crs.yaml in config/master; that would be an easy way to verify that they are being generated correctly. Or maybe you have already done that and they are the same, so there is no diff.
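
A minimal sketch of that check (the generation script name is a placeholder, the build output path follows the test log above, and <version> stands in for the generated version string):

# Regenerate the artifacts, then diff them against what is committed in config/master.
./scripts/generate-cni-yaml.sh                      # placeholder name for the PR's generation script
diff config/master/calico-operator.yaml build/cni-rel-yamls/<version>/calico-operator.yaml
diff config/master/calico-crs.yaml      build/cni-rel-yamls/<version>/calico-crs.yaml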

@jayanthvn (Contributor, Author)

> You're not removing config/master in this PR yet, so it would be nice if you could place the generated calico-operator.yaml and calico-crs.yaml in config/master; that would be an easy way to verify that they are being generated correctly. Or maybe you have already done that and they are the same, so there is no diff.

Yeah, I haven't copied them to config/master. That makes sense; let me do that, and it will be easy to verify.

@jayanthvn (Contributor, Author)

@tmjd - I copied the generated calico-operator.yaml and calico-crs.yaml into config/master and I do see some differences. I just ported the required manifests from https://github.com/projectcalico/calico/tree/master/_includes/charts. Can you please check if it is fine to just keep the charts folder in sync between the VPC CNI and Calico?

@tmjd (Contributor) commented Aug 5, 2021

The differences in the generated manifests show up because you based your updates on the master branch; if the idea was to match what is currently in calico-operator.yaml, the changes should have been based on the release-v3.17 branch.
I'd suggest basing the changes on one of the release-* branches in projectcalico/calico so you're not picking up any WIP changes from master.

> Can you please check if it is fine to just keep the charts folder in sync between the VPC CNI and Calico?

I think keeping them in sync should work, but that sync should be done from release branches, and charts/aws-calico/values.yaml will also need to be updated to keep the tigeraOperator.tag field in sync with the release.

@jayanthvn (Contributor, Author)

> The differences in the generated manifests show up because you based your updates on the master branch; if the idea was to match what is currently in calico-operator.yaml, the changes should have been based on the release-v3.17 branch.
> I'd suggest basing the changes on one of the release-* branches in projectcalico/calico so you're not picking up any WIP changes from master.
>
> Can you please check if it is fine to just keep the charts folder in sync between the VPC CNI and Calico?
>
> I think keeping them in sync should work, but that sync should be done from release branches, and charts/aws-calico/values.yaml will also need to be updated to keep the tigeraOperator.tag field in sync with the release.

Agreed, I will update it to match the release-v3.17 branch.
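
A minimal sketch of the kind of sync check discussed above (the tigeraOperator.tag field name comes from the comment being quoted; the exact values.yaml layout and the yq invocation are assumptions):

# Confirm the operator tag in the chart matches the operator version for the release
# branch being ported (v1.13.8 corresponds to Calico v3.17.4 per the discussion further down).
grep -n -A2 'tigeraOperator' charts/aws-calico/values.yaml
# or, if yq is available:
yq '.tigeraOperator.tag' charts/aws-calico/values.yaml   # expected: v1.13.8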

@tmjd (Contributor) left a comment

I think the changes look good as far as Calico and the generated Calico manifests are concerned.

@jayanthvn (Contributor, Author)

Thanks @tmjd

@jayanthvn jayanthvn merged commit 6361ced into aws:master Aug 5, 2021
@jayanthvn jayanthvn deleted the manifests branch August 5, 2021 22:49
@stevehipwell (Contributor)

I might be mistaken, but it looks like you've taken a Calico version that isn't compatible with the aws-vpc-cni. Basically any version of Calico from v3.16.0 to v3.19.x is incompatible with any non-Calico CNI due to a known issue blocking all network traffic after pod termination. The latest v3.20.0 (Tigera operator v1.20.0) seems to have resolved the issue (see eks-charts for more details as they've made this very mistake).

@jayanthvn (Contributor, Author)

@tmjd can you please comment on this?

@stevehipwell (Contributor)

aws/eks-charts#565

@tmjd (Contributor) commented Aug 6, 2021

@stevehipwell I'm not sure what you mean by not compatible with aws-vpc-cni. I believe it is compatible and has been functioning with it, but yes, there is a known bug in the version.
This PR changed the operator version from v1.13.2 to v1.13.8, which corresponds to going from Calico v3.17.1 to v3.17.4.
I believe the issue being referred to is projectcalico/calico#4518, which existed both before and after this PR, so the presence of the bug has not changed with this PR.

I think it would be good for this project to upgrade to Calico v3.20 to pick up the fix for the bug, but I think it is an exaggeration to say this PR is pulling in an incompatible version. If there is something else you're referring to or I'm misunderstanding something, please let me know and provide a link to the issue.

If it is not possible to update to Calico v3.20, the fix for the termination bug has been backported to the v3.19, v3.18, and v3.17 branches, though I'm not sure when those releases will happen.

I'm hoping that with the changes @jayanthvn has just merged it will be easier to keep up with Calico updates, and it won't be much trouble to update to v3.20.
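
A minimal sketch of how the versions referenced above could be confirmed on a running cluster (resource names taken from the install output earlier in this thread; the jsonpath expressions and example image names are assumptions):

# Operator image/tag (e.g. a tigera/operator image tagged v1.13.8)
kubectl get deployment tigera-operator -n tigera-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
# Calico node image/tag rolled out by the operator (e.g. calico/node tagged v3.17.4)
kubectl get daemonset calico-node -n calico-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'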

@aquam8 commented Aug 6, 2021

Thanks for raising this @stevehipwell! This is indeed a problem, and only Calico 3.20 has the fix for EKS CNI compatibility. The Helm charts may well be unaffected, with only the appVersion changing; to be verified.
It's good to see progress on the Calico Helm chart in this repo though, thanks!

@tmjd (Contributor) commented Aug 6, 2021

@stevehipwell I think I see what you're talking about: the helm chart that is generated with this configuration. I didn't realize the helm chart this PR was updating was being used somewhere; I thought the purpose was just to generate the manifest that is used in the AWS documentation, namely calico-operator.yaml.

So you're correct that the helm chart does now include the bug that I referenced.

@jayanthvn (Contributor, Author)

We can upgrade it to v3.20.0. Happy to take PRs :)

@stevehipwell (Contributor)

The TL;DR version is that using any version of Calico after v3.15.x and before v3.20.0 with the aws-vpc-cni will introduce a serious defect with pod termination. This is especially problematic with out-of-tree CSI drivers, where any pods with PVCs will have to wait a significant time to restart after being on a terminated node.

The aws-calico chart in eks-charts was updated to a version with this issue, which caused us a significant amount of work to figure out what was wrong (there is still no response on that issue). I get that this project already had a broken version of Calico, but as the definitive way to install Calico in EKS, replacing the aws-calico chart, this should either be rolled back to a working v3.15.x version or rolled forward to v3.20.0, and not left broken.

@amitkatyal

@jayanthvn, @haouc, whenever we update Calico, the manifest file will be updated as well, right?
Since we deploy Calico using the manifest file via a CF template, that would make it easy for us to update an existing cluster.

jayanthvn added a commit to jayanthvn/amazon-vpc-cni-k8s that referenced this pull request Aug 23, 2021
* Generate calico manifests from helm

* Update CNI charts image version

* Testing release workflow<Do-not-merge>

* Testing workflow

* Updating workflow

* set matchlabels while generating manifests

* Copy generated manifests to master/config folder

* update to match the release branch

* update operator to v1.13.8

* update version
haouc pushed a commit to haouc/amazon-vpc-cni-k8s that referenced this pull request Aug 24, 2021
* Generate calico manifests from helm

* Update CNI charts image version

* Testing release workflow<Do-not-merge>

* Testing workflow

* Updating workflow

* set matchlabels while generating manifests

* Copy generated manifests to master/config folder

* update to match the release branch

* update operator to v1.13.8

* update version
cgchinmay added a commit to cgchinmay/amazon-vpc-cni-k8s that referenced this pull request Aug 30, 2021
Update documentation for PD (aws#1539)

Update SDK Go version (aws#1542)

* Update SDK Go version

* missed mod file

Bump helm.sh/helm/v3 from 3.2.0 to 3.6.1 in /test (aws#1545)

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.2.0 to 3.6.1.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](helm/helm@v3.2.0...v3.6.1)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update CNI charts image version (aws#1543)

* Update CNI charts image version

* updated the labels

* CRD update

* update charts version

* minor nits

* fix charts ver

* fix charts ver

Cherry-pick to master - (aws#1551) (aws#1552)

* Release - v1.9.0 (aws#1551)

* update markdown

Update prefix-and-ip-target.md (aws#1553)

Grammar fixes

Use codecov github action (aws#1550)

In particular, this avoids `curl | bash` (the bash script is simply
embedded in the github action image).

It also provides a trivial upgrade path to the new nodejs-based
uploader (`@v1` -> `@v2`), when we decide that is ready (perhaps now).

Added Multus artifacts to config folder (aws#1563)

Added Readme for Multus Installation

Updated configMap in daemonset to use aws-vpc-cni as default delegate instead of flannel

Co-authored-by: Chinmay Gadgil <cgadgil@amazon.com>

set requests/limits for initcontainer (aws#1559)

Documentation update (aws#1565)

* Doc update

* more updates

set multus log level to panic instead of debug (aws#1567)

Updated Readme for Multus logging info

Changed log level to error instead of debug

Co-authored-by: Chinmay Gadgil <cgadgil@amazon.com>

WARM targets can be set non-negative (aws#1568)

Generate calico artifacts from helm (aws#1541)

* Generate calico manifests from helm

* Update CNI charts image version

* Testing release workflow<Do-not-merge>

* Testing workflow

* Updating workflow

* set matchlabels while generating manifests

* Copy generated manifests to master/config folder

* update to match the release branch

* update operator to v1.13.8

* update version

Change github_token permission (aws#1577)

Bandwidth plugin support (aws#1560)

* bandwidth plugin support

* update netns

Update new instance types (aws#1576)

Updated multus ds manifest file for v3.7.2-eksbuild.2 (aws#1583)

* Updated multus ds manifest file for v3.7.2-eksbuild.2

Remove Node Affinity for amd64 from manifest so that it can run on arm64 as well

* revert log-level to error for multus

Co-authored-by: Chinmay Gadgil <cgadgil@amazon.com>

Modify integ test workflow (aws#1579)

* Change github_token permission

* - Modified permissions for github_token in cron and integ test workflow
- Modified integ test workflow to run on push to master and release branches

Upgrading controller-runtime is test dir (aws#1582)

Update Ginkgo command params example in the doc (aws#1589)

Update CONTRIBUTING.md (aws#1591)

Fix region/account for manifests generated from helm (aws#1592)

* Yamls generated from helm was missing region/account override

* fix domains

Updated snat rule test logic

Install iptables in the test agent image