Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate calico artifacts from helm #1541

Merged
merged 10 commits into from
Aug 5, 2021
Merged

Conversation

jayanthvn
Copy link
Contributor

@jayanthvn jayanthvn commented Jul 22, 2021

What type of PR is this?
Enhancement

Which issue does this PR fix:
#1517

What does this PR do / Why do we need it:
Ported the required manifests from https://github.com/projectcalico/calico/tree/master/_includes/charts. Going forward we won't have config/ folder instead, these artifacts will be built from a script and attached as release artifacts. The helm version we are using doesn't support --include-crds and upgrading the helm version would need additional fixes in the other charts folder. Hence moved the crds in template folder for now. Will restructure as a fast follow up post 1.9 release.

/cc @caseydavenport

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
n/a

Testing done on this change:

Yes

kubectl apply -f /local/home/varavaj/src/doc_update/amazon-vpc-cni-k8s/scripts/../build/cni-rel-yamls/v1.6.4-rc1-166-gd4e8b9aa-dirty/calico-operator.yaml
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/imagesets.operator.tigera.io created
customresourcedefinition.apiextensions.k8s.io/installations.operator.tigera.io created
customresourcedefinition.apiextensions.k8s.io/tigerastatuses.operator.tigera.io created
namespace/tigera-operator created
podsecuritypolicy.policy/tigera-operator created
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
clusterrole.rbac.authorization.k8s.io/tigera-operator created
serviceaccount/tigera-operator created
deployment.apps/tigera-operator created
dev-dsk-varavaj-2b-72f02457 % kgp -A
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE    IP               NODE                                           NOMINATED NODE   READINESS GATES
kube-system            aws-node-brq8r                              1/1     Running   0          36h    192.168.71.140   ip-192-168-71-140.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-ccvdv                              1/1     Running   0          36h    192.168.8.85     ip-192-168-8-85.us-west-2.compute.internal     <none>           <none>
kube-system            aws-node-ddnmx                              1/1     Running   0          36h    192.168.49.221   ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-lzlbc                              1/1     Running   0          36h    192.168.16.15    ip-192-168-16-15.us-west-2.compute.internal    <none>           <none>
kube-system            aws-node-pczvd                              1/1     Running   0          36h    192.168.36.204   ip-192-168-36-204.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-s6kz4                              1/1     Running   0          36h    192.168.90.141   ip-192-168-90-141.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-x7rc7                              1/1     Running   0          36h    192.168.50.223   ip-192-168-50-223.us-west-2.compute.internal   <none>           <none>
kube-system            aws-node-z9xqh                              1/1     Running   0          36h    192.168.74.218   ip-192-168-74-218.us-west-2.compute.internal   <none>           <none>
kube-system            cni-metrics-helper-6797dd4f5b-dfzgt         1/1     Running   0          9d     192.168.47.0     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            coredns-5946c5d67c-lnwfr                    1/1     Running   0          9d     192.168.47.5     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            coredns-5946c5d67c-mt7j5                    1/1     Running   0          9d     192.168.47.4     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-fsxpn                            1/1     Running   0          2d8h   192.168.50.223   ip-192-168-50-223.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-fvbfr                            1/1     Running   0          2d8h   192.168.71.140   ip-192-168-71-140.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-hwwz5                            1/1     Running   0          9d     192.168.74.218   ip-192-168-74-218.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-l9stq                            1/1     Running   0          2d8h   192.168.8.85     ip-192-168-8-85.us-west-2.compute.internal     <none>           <none>
kube-system            kube-proxy-m2flh                            1/1     Running   0          2d8h   192.168.16.15    ip-192-168-16-15.us-west-2.compute.internal    <none>           <none>
kube-system            kube-proxy-ns9wm                            1/1     Running   0          2d8h   192.168.90.141   ip-192-168-90-141.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-rthwh                            1/1     Running   0          2d8h   192.168.36.204   ip-192-168-36-204.us-west-2.compute.internal   <none>           <none>
kube-system            kube-proxy-scvk7                            1/1     Running   0          9d     192.168.49.221   ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kube-system            metrics-server-77946ff588-hcmb7             1/1     Running   0          9d     192.168.47.1     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-894c58c65-8zm7k   1/1     Running   0          9d     192.168.47.2     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-5756647dbb-lkx74       1/1     Running   0          9d     192.168.47.3     ip-192-168-49-221.us-west-2.compute.internal   <none>           <none>
tigera-operator        tigera-operator-6db99fb878-brq4h            1/1     Running   0          6s     192.168.16.15    ip-192-168-16-15.us-west-2.compute.internal    <none>           <none>
dev-dsk-varavaj-2b-72f02457 % kubectl apply -f /local/home/varavaj/src/doc_update/amazon-vpc-cni-k8s/scripts/../build/cni-rel-yamls/v1.6.4-rc1-166-gd4e8b9aa-dirty/calico-crs.yaml
installation.operator.tigera.io/default created
kubectl get daemonset calico-node --namespace calico-system
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
calico-node   8         8         8       8            8           kubernetes.io/os=linux   32s

Automation added to e2e:

yes

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jayanthvn jayanthvn added this to the v1.9.0 milestone Jul 22, 2021
@jayanthvn jayanthvn requested a review from M00nF1sh July 22, 2021 22:04
@jayanthvn jayanthvn removed this from the v1.9.0 milestone Jul 23, 2021
@tmjd
Copy link
Contributor

tmjd commented Jul 23, 2021

You're not removing config/master in this PR yet, it would be nice if you could place the generated calico-operator.yaml and calico-crs.yaml in config/master and that would be an easy way to verify that they are being generated correctly. Or maybe you have done that and they are the same so there is no diff.

@jayanthvn
Copy link
Contributor Author

You're not removing config/master in this PR yet, it would be nice if you could place the generated calico-operator.yaml and calico-crs.yaml in config/master and that would be an easy way to verify that they are being generated correctly. Or maybe you have done that and they are the same so there is no diff.

Yeah, I haven't copied it to master/config. That makes sense, let me do that and will be easy to verify.

@jayanthvn
Copy link
Contributor Author

@tmjd - I copied the generated calico-operator.yaml and calico-crs.yaml in config/master and I do see there are some differences. I just ported the required manifests from https://github.com/projectcalico/calico/tree/master/_includes/charts. Can you please check if it fine to just keep the charts folder in sync between vpc cni and calico?

@tmjd
Copy link
Contributor

tmjd commented Aug 5, 2021

The differences in the generated manifests are showing because you based your updates on the master branch but if the idea was to match what is currently in the calico-operator.yaml, the changes should have been based on the release-v3.17 branch.
I'd suggest basing the changes on one of the release-* branches in projectcalico/calico so you're not picking up any WIP changes from master.

Can you please check if it fine to just keep the charts folder in sync between vpc cni and calico?

I think keeping them in sync should work but that sync should be done on release branches and also the charts/aws-calico/values.yaml will need to also be updated to keep the tigeraOperator.tag field in sync with the release.

@jayanthvn
Copy link
Contributor Author

The differences in the generated manifests are showing because you based your updates on the master branch but if the idea was to match what is currently in the calico-operator.yaml, the changes should have been based on the release-v3.17 branch.
I'd suggest basing the changes on one of the release-* branches in projectcalico/calico so you're not picking up any WIP changes from master.

Can you please check if it fine to just keep the charts folder in sync between vpc cni and calico?

I think keeping them in sync should work but that sync should be done on release branches and also the charts/aws-calico/values.yaml will need to also be updated to keep the tigeraOperator.tag field in sync with the release.

Agreed, I will update it to match the release-v3.17 branch

Copy link
Contributor

@tmjd tmjd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes look good as far as Calico and the generated manifests for Calico.

@jayanthvn
Copy link
Contributor Author

Thanks @tmjd

@jayanthvn jayanthvn merged commit 6361ced into aws:master Aug 5, 2021
@jayanthvn jayanthvn deleted the manifests branch August 5, 2021 22:49
@stevehipwell
Copy link
Contributor

I might be mistaken, but it looks like you've taken a Calico version that isn't compatible with the aws-vpc-cni. Basically any version of Calico from v3.16.0 to v3.19.x is incompatible with any non-Calico CNI due to a known issue blocking all network traffic after pod termination. The latest v3.20.0 (Tigera operator v1.20.0) seems to have resolved the issue (see eks-charts for more details as they've made this very mistake).

@jayanthvn
Copy link
Contributor Author

@tmjd can you please comment on this?

@stevehipwell
Copy link
Contributor

aws/eks-charts#565

@tmjd
Copy link
Contributor

tmjd commented Aug 6, 2021

@stevehipwell I'm not sure what you mean by not compatible with aws-vpc-cni. I believe it is compatible and has been functioning with it, but yes there is a known bug with the version.
This PR changed the operator version from v1.13.2 to v1.13.8 which corresponds with going from Calico v3.17.1 to v3.17.4.
I believe the issue being referring to is projectcalico/calico#4518, which is an issue before this PR and after the PR, so the presence of the bug has not changed with this PR.

I think it would be good for this project to upgrade to Calico v3.20 to pick up the fix for the bug but I think it is an exaggeration to say this PR is pulling in an incompatible version. Please if there is something else you're referring to or I'm misunderstanding something please let me know and provide a link to the Issue.

If it is not possible to update to Calico v3.20, the fix for the termination bug has been backported to the v3.19, v3.18, and v3.17 branches though I'm not sure when those releases will happen.

I'm hoping with the changes that @jayanthvn has just merged that it will be easier to keep up with Calico updates and it won't be much trouble to update to v3.20.

@aquam8
Copy link

aquam8 commented Aug 6, 2021

Thanks for raising this @stevehipwell ! This is indeed a problem and only Calico 3.20 has the fix for EKS CNI compatibility. Helm charts may well be unaffected with only the appVersion changing.. to be verified.
It's good to see progress on the calico helm Chart in this repo tho, thanks!

@tmjd
Copy link
Contributor

tmjd commented Aug 6, 2021

@stevehipwell I think I see what you're talking about. You're talking about the helm chart that is generated with this configuration. I didn't realize somewhere the helm chart this was updating was being used, I thought the purpose was just to be able to generate the manifest that is used in the AWS documentation which is the calico-operator.yaml.

So you're correct that the helm chart does now include the bug that I referenced.

@jayanthvn
Copy link
Contributor Author

We can upgrade it to v3.20.0. Happy to take PRs :)

@stevehipwell
Copy link
Contributor

The TL;DR version is that using any version of Calico after v3.15.x and before v3.20.0 with the aws-vpc-cni will introduce a serious defect with pod termination. This is especially problematic with out of tree CSI drivers where any pods with PVCs will have to wait a significant time to restart after being on a terminated node.

The aws-calico chart in eks-charts was updated to a version with this issue and caused us a significant amount of work to figure out what was wrong (there is still no response on that issue). I get that this project already had a broken version of Calico, but as the definitive way to install Calico in EKS replacing the aws-calico chart this should either be rolled back to a working v3.15.x version or forward to v3.20.0 and not left broken.

@amitkatyal
Copy link

@jayanthvn, @haouc, Whenever we are updating the Calico, we will be updating the manifest file as well right?
As we are deploying the calico using the manifest file via CF template, it would be easy for us to update the existing cluster.

jayanthvn added a commit to jayanthvn/amazon-vpc-cni-k8s that referenced this pull request Aug 23, 2021
* Generate calico manifests from helm

* Update CNI charts image version

* Testing release workflow<Do-not-merge>

* Testing workflow

* Updating workflow

* set matchlabels while generating manifests

* Copy generated manifests to master/config folder

* update to match the release branch

* update operator to v1.13.8

* update version
haouc pushed a commit to haouc/amazon-vpc-cni-k8s that referenced this pull request Aug 24, 2021
* Generate calico manifests from helm

* Update CNI charts image version

* Testing release workflow<Do-not-merge>

* Testing workflow

* Updating workflow

* set matchlabels while generating manifests

* Copy generated manifests to master/config folder

* update to match the release branch

* update operator to v1.13.8

* update version
cgchinmay pushed a commit to cgchinmay/amazon-vpc-cni-k8s that referenced this pull request Aug 30, 2021
Update documentation for PD (aws#1539)

Update SDK Go version (aws#1542)

* Update SDK Go version

* missed mod file

Bump helm.sh/helm/v3 from 3.2.0 to 3.6.1 in /test (aws#1545)

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.2.0 to 3.6.1.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](helm/helm@v3.2.0...v3.6.1)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update CNI charts image version (aws#1543)

* Update CNI charts image version

* updated the labels

* CRD update

* update charts version

* minor nits

* fix charts ver

* fix charts ver

Cherry-pick to master - (aws#1551) (aws#1552)

* Release - v1.9.0 (aws#1551)

* update markdown

Update prefix-and-ip-target.md (aws#1553)

Grammar fixes

Use codecov github action (aws#1550)

In particular, this avoids `curl | bash` (the bash script is simply
embedded in the github action image).

It also provides a trivial upgrade path to the new nodejs-based
uploader (`@v1` -> `@v2`), when we decide that is ready (perhaps now).

Added Multus artifacts to config folder (aws#1563)

Added Readme for Multus Installation

Updated configMap in daemonset to use aws-vpc-cni as default delegate instead of flannel

Co-authored-by: Chinmay Gadgil <cgadgil@amazon.com>

set requests/limits for initcontainer (aws#1559)

Documentation update (aws#1565)

* Doc update

* more updates

set multus log level to panic instead of debug (aws#1567)

Updated Readme for Multus logging info

Changed log level to error instead of debug

Co-authored-by: Chinmay Gadgil <cgadgil@amazon.com>

WARM targets can be set non-negative (aws#1568)

Generate calico artifacts from helm (aws#1541)

* Generate calico manifests from helm

* Update CNI charts image version

* Testing release workflow<Do-not-merge>

* Testing workflow

* Updating workflow

* set matchlabels while generating manifests

* Copy generated manifests to master/config folder

* update to match the release branch

* update operator to v1.13.8

* update version

Change github_token permission (aws#1577)

Bandwidth plugin support (aws#1560)

* bandwidth plugin support

* update netns

Update new instance types (aws#1576)

Updated multus ds manifest file for v3.7.2-eksbuild.2 (aws#1583)

* Updated multus ds manifest file for v3.7.2-eksbuild.2

Remove Node Affinity for amd64 from manifest so that it can run on arm64 as well

* revert log-level to error for multus

Co-authored-by: Chinmay Gadgil <cgadgil@amazon.com>

Modify integ test workflow (aws#1579)

* Change github_token permission

* - Modified permissions for github_token in cron and integ test workflow
- Modified integ test workflow to run on push to master and release branches

Upgrading controller-runtime is test dir (aws#1582)

Update Ginkgo command params example in the doc (aws#1589)

Update CONTRIBUTING.md (aws#1591)

Fix region/account for manifests generated from helm (aws#1592)

* Yamls generated from helm was missing region/account override

* fix domains

Updated snat rule test logic

Install iptables in the test agent image
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants