Add missing rules when NodePort support is disabled #2026
I am happy to do more testing once maintainers validate the approach.
@antoninbas, thanks for the fix. Could you test the following?
|
@kishorj I did 2 upgrade tests, one with NodePort support enabled, and one with NodePort support disabled. Full iptables rules for nat and mangle tables are below. There is only one issue. Because I removed the unnecessary
I see 3 solutions; let me know which one you want:
Full iptables rules
NodePort support enabled (default)
Default build (
|
@kishorj friendly ping for this |
This option is acceptable to me. Option 1 is cleaner; it removes the old rule.
@@ -571,15 +573,15 @@ func (n *linuxNetwork) buildIptablesConnmarkRules(vpcCIDRs []string, ipt iptable
 	}

 	var iptableRules []iptablesRule
-	log.Debugf("Setup Host Network: iptables -t nat -A PREROUTING -i %s+ -m comment --comment \"AWS, outbound connections\" -m state --state NEW -j AWS-CONNMARK-CHAIN-0", n.vethPrefix)
+	log.Debugf("Setup Host Network: iptables -t nat -A PREROUTING -i %s+ -m comment --comment \"AWS, outbound connections\" -j AWS-CONNMARK-CHAIN-0", n.vethPrefix)
For the old rule with redundant state NEW, we can force delete. Append the old rule with shouldExist field set to false.
Tests:
- Add a new UT with old rule, verify the old one gets deleted
- Update the remaining UTs to refer to the new rule
- Upgrading from old ds to new, old rule should not be there
- once the new rules get added, restarting the DS should be fine - no errors in the logs
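The force-delete approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `iptablesRule` field names other than `shouldExist`, the in-memory `fakeIptables` client, and the `reconcile` and `connmarkRules` helpers are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// iptablesRule carries a shouldExist flag, as suggested in the review.
// The other field names are assumptions for illustration.
type iptablesRule struct {
	name        string
	shouldExist bool
	table       string
	chain       string
	rule        []string
}

// fakeIptables is a hypothetical in-memory stand-in for a real iptables client.
type fakeIptables struct {
	rules map[string]bool
}

func key(table, chain string, spec []string) string {
	return table + "/" + chain + "/" + strings.Join(spec, " ")
}

func (f *fakeIptables) Exists(table, chain string, spec ...string) bool {
	return f.rules[key(table, chain, spec)]
}

func (f *fakeIptables) Append(table, chain string, spec ...string) {
	f.rules[key(table, chain, spec)] = true
}

func (f *fakeIptables) Delete(table, chain string, spec ...string) {
	delete(f.rules, key(table, chain, spec))
}

// reconcile appends rules that should exist but are missing, and deletes
// rules that should not exist but are present. Running it twice is a no-op.
func reconcile(ipt *fakeIptables, rules []iptablesRule) {
	for _, r := range rules {
		exists := ipt.Exists(r.table, r.chain, r.rule...)
		switch {
		case r.shouldExist && !exists:
			ipt.Append(r.table, r.chain, r.rule...)
		case !r.shouldExist && exists:
			ipt.Delete(r.table, r.chain, r.rule...)
		}
	}
}

// connmarkRules lists the legacy rule (with the redundant state match) as
// shouldExist=false and the new rule as shouldExist=true.
func connmarkRules(vethPrefix string) []iptablesRule {
	base := []string{"-i", vethPrefix + "+", "-m", "comment", "--comment", "AWS, outbound connections"}
	legacy := append(append([]string{}, base...), "-m", "state", "--state", "NEW", "-j", "AWS-CONNMARK-CHAIN-0")
	current := append(append([]string{}, base...), "-j", "AWS-CONNMARK-CHAIN-0")
	return []iptablesRule{
		{name: "legacy connmark rule", shouldExist: false, table: "nat", chain: "PREROUTING", rule: legacy},
		{name: "connmark rule", shouldExist: true, table: "nat", chain: "PREROUTING", rule: current},
	}
}

func main() {
	ipt := &fakeIptables{rules: map[string]bool{}}
	rules := connmarkRules("eni")
	// Simulate a node that was set up by the old DaemonSet: only the legacy rule is present.
	ipt.Append("nat", "PREROUTING", rules[0].rule...)
	reconcile(ipt, rules)
	fmt.Println("legacy rule present:", ipt.Exists("nat", "PREROUTING", rules[0].rule...))
	fmt.Println("new rule present:", ipt.Exists("nat", "PREROUTING", rules[1].rule...))
}
```

Listing the legacy rule explicitly keeps the cleanup idempotent: once it has been deleted, later restarts find nothing to remove and leave the new rule untouched.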
networking.go:703 has the following log line - |
@kishorj I will update the PR with all your feedback some time this week |
* the rules that need to be installed for NodePort support and SNAT support are very similar. The same traffic mark is needed for both. As a result, rules that are currently installed only when NodePort support is enabled should also be installed when external SNAT is disabled, which is the case by default.
* remove "-m state --state NEW" from a rule in the nat table. This is always true for packets that traverse the nat table.
* fix typo in one rule's name (extra whitespace).

Fixes aws#2025

Co-authored-by: Quan Tian <qtian@vmware.com>
Signed-off-by: Antonin Bas <abas@vmware.com>
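The condition described in this commit message can be written as a small predicate. This is only a sketch: the function name `needsConnmarkRules` and the parameter names are hypothetical and are not taken from the PR's code.

```go
package main

import "fmt"

// needsConnmarkRules reports whether the shared traffic-mark rules should be
// installed: either NodePort support is enabled, or external SNAT is disabled
// (so the CNI itself performs SNAT). Names are hypothetical.
func needsConnmarkRules(nodePortSupportEnabled, useExternalSNAT bool) bool {
	return nodePortSupportEnabled || !useExternalSNAT
}

func main() {
	for _, nodePort := range []bool{true, false} {
		for _, externalSNAT := range []bool{true, false} {
			fmt.Printf("nodePort=%t externalSNAT=%t -> install=%t\n",
				nodePort, externalSNAT, needsConnmarkRules(nodePort, externalSNAT))
		}
	}
}
```

Under this reading, the only configuration that skips the rules is NodePort support disabled together with external SNAT enabled.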
Signed-off-by: Antonin Bas <abas@vmware.com>
Signed-off-by: Antonin Bas <abas@vmware.com>
* Delete legacy nat rule
* Fix an unrelated log message

Signed-off-by: Antonin Bas <abas@vmware.com>
@kishorj I updated this PR with the changes you requested (it's all in the most recent commit). I also ran the requested tests on an EKS cluster.
Here are the PREROUTING chains for the nat table:
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
Here are the PREROUTING chains for the nat table:
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
The rule has been updated correctly.
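A check like the one done by eye on these listings could be automated along the following lines. This is a hedged sketch: the helper names `hasLegacyStateMatch` and `findLegacyRules` are hypothetical, and the input is assumed to be rule lines in the `-A CHAIN ...` format shown in the listings.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule listings copied from the two PREROUTING dumps in this conversation.
const listingWithLegacyRule = `-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80`

const listingWithNewRule = `-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80`

// hasLegacyStateMatch reports whether a rule line jumps to AWS-CONNMARK-CHAIN-0
// and still carries the redundant "-m state --state NEW" match.
func hasLegacyStateMatch(line string) bool {
	return strings.Contains(line, "-j AWS-CONNMARK-CHAIN-0") &&
		strings.Contains(line, "-m state --state NEW")
}

// findLegacyRules returns every line of the listing that matches the legacy rule.
func findLegacyRules(listing string) []string {
	var found []string
	for _, line := range strings.Split(listing, "\n") {
		line = strings.TrimSpace(line)
		if hasLegacyStateMatch(line) {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	fmt.Println("legacy rules in first listing:", len(findLegacyRules(listingWithLegacyRule)))
	fmt.Println("legacy rules in second listing:", len(findLegacyRules(listingWithNewRule)))
}
```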
$ kubectl -n kube-system logs pod/aws-node-b88wv -c aws-node
{"level":"info","ts":"2022-08-19T22:36:08.333Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-08-19T22:36:08.334Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-08-19T22:36:08.350Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-08-19T22:36:08.352Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-08-19T22:36:10.363Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T22:36:12.370Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T22:36:12.391Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2022-08-19T22:36:12.395Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2022-08-19T22:36:12.397Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}
$ kubectl -n kube-system logs pod/aws-node-b88wv -c aws-vpc-cni-init
Copying CNI plugin binaries ...
Configure rp_filter loose...
net.ipv4.conf.eth0.rp_filter = 2
2
net.ipv4.tcp_early_demux = 1
CNI init container done
/lgtm
@kishorj I see that the CI tests are not running. I think you may have to approve that on your side since I am a first-time contributor to this repo.
@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.
Ok. For what it's worth, I have run the unit tests locally already (with |
Thank you :) But we will also have to run the integration tests. I will see if we can manually trigger them for now :)
Sorry for the delay, we had disabled the GitHub runners. They are now up and running, and I have submitted the integration tests: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3139125467 /cc @kishorj
Two attempts of the upstream conformance tests are failing: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3139125467. I can look into it next week.
I ran the conformance tests locally (after updating the branch), and can confirm that all the tests were passing.
|
Integration tests in progress: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3364855441/jobs/5579689661
The latest run of the integration tests has passed: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3364855441/jobs/5582884522
* create publisher with logger (#2119)
* Add missing rules when NodePort support is disabled (#2026)
  * Add missing rules when NodePort support is disabled
    * the rules that need to be installed for NodePort support and SNAT support are very similar. The same traffic mark is needed for both. As a result, rules that are currently installed only when NodePort support is enabled should also be installed when external SNAT is disabled, which is the case by default.
    * remove "-m state --state NEW" from a rule in the nat table. This is always true for packets that traverse the nat table.
    * fix typo in one rule's name (extra whitespace).
    Fixes #2025
    Co-authored-by: Quan Tian <qtian@vmware.com>
    Signed-off-by: Antonin Bas <abas@vmware.com>
  * Fix typos and unit tests
    Signed-off-by: Antonin Bas <abas@vmware.com>
  * Minor improvement to code comment
    Signed-off-by: Antonin Bas <abas@vmware.com>
  * Address review comments
    * Delete legacy nat rule
    * Fix an unrelated log message
    Signed-off-by: Antonin Bas <abas@vmware.com>
  Signed-off-by: Antonin Bas <abas@vmware.com>
  Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
  Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
* downgrade test go.mod to align with root go.mod (#2128)
* skip addon installation when addon info is not available (#2131)
* Merging test/Makefile and test/go.mod to the root Makefile and go.mod, adjust the .github/workflows and integration test instructions (#2129)
* update troubleshooting docs for CNI image (#2132)
  fix location where make command is run
* fix env name in test script (#2136)
* optionally allow CLUSTER_ENDPOINT to be used rather than the cluster-ip (#2138)
  * optionally allow CLUSTER_ENDPOINT to be used rather than the kubernetes cluster ip
  * remove check for kube-proxy
  * add version to readme
* Add resources config option to cni metrics helper (#2141)
  * Add resources config option to cni metrics helper
  * Remove default-empty resources block; replace with conditional
* Add metrics for ec2 api calls made by CNI and expose via prometheus (#2142)
  Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
* increase workflow role duration to 4 hours (#2148)
* Update golang 1.19.2 EKS-D (#2147)
  * Update golang
  * Move to EKS distro builds
* [HELM]: Move CRD resources to a separate folder as per helm standard (#2144)
  Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
* VPC-CNI minimal image builds (#2146)
  * VPC-CNI minimal image builds
  * update dependencies for ginkgo when running integration tests
  * address review comments and break up init main function
  * review comments for sysctl
  * Simplify binary installation, fix review comments
    Since init container is required to always run, let binary installation for external plugins happen in init container. This simplifies the main container entrypoint and the dockerfile for each image.
* when IPAMD connection fails, try to teardown pod network using prevResult (#2145)
* add env var to enable nftables (#2155)
* fix failing weekly cron tests (#2154)
* Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER and remove no-op setter (#2153)
  * Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER
  * update release version comments

Signed-off-by: Antonin Bas <abas@vmware.com>
Co-authored-by: Jeffrey Nelson <jdnelson@amazon.com>
Co-authored-by: Antonin Bas <antonin.bas@gmail.com>
Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
Co-authored-by: Jerry He <37866862+jerryhe1999@users.noreply.github.com>
Co-authored-by: Brandon Wagner <wagnerbm@amazon.com>
Co-authored-by: Jonathan Ogilvie <679297+jcogilvie@users.noreply.github.com>
Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
What type of PR is this?
bug
Which issue does this PR fix:
#2025
What does this PR do / Why do we need it:
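* the rules that need to be installed for NodePort support and SNAT
support are very similar. The same traffic mark is needed for both. As
a result, rules that are currently installed only when NodePort
support is enabled should also be installed when external SNAT is
disabled, which is the case by default.
* remove "-m state --state NEW" from a rule in the nat table. This is
always true for packets that traverse the nat table.
* fix typo in one rule's name (extra whitespace).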
Testing done on this change:
amazon-k8s-cni and amazon-k8s-cni-init
Automation added to e2e:
Will this PR introduce any new dependencies?:
No
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Should not break upgrades or downgrades (may leave some superfluous iptables rules). Not tested.
Does this change require updates to the CNI daemonset config files to work?:
No
Does this PR introduce any user-facing change?:
No, bug fix
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Fixes #2025
Co-authored-by: Quan Tian <qtian@vmware.com>
Signed-off-by: Antonin Bas <abas@vmware.com>