Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #5739: Store NetworkPolicy in filesystem as fallback data source #5777: Enable Pod network after realizing initial NetworkPolicies #5795: Support Local ExternalTrafficPolicy for Services with #5798: Fix unit test TestReconcile #5833: Enable IPv4/IPv6 forwarding on demand automatically #5860

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jan 10, 2024

Cherry pick of #5739 #5777 #5795 #5798 #5833 on release-1.13.

#5739: Store NetworkPolicy in filesystem as fallback data source
#5777: Enable Pod network after realizing initial NetworkPolicies
#5795: Support Local ExternalTrafficPolicy for Services with
#5798: Fix unit test TestReconcile
#5833: Enable IPv4/IPv6 forwarding on demand automatically

For details on the cherry pick process, see the cherry pick requests page.

@tnqn tnqn added the kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release label Jan 10, 2024
tnqn added 5 commits January 10, 2024 21:42
In the previous implementation, traffic from/to a Pod may bypass
NetworkPolicies applied to the Pod in a time window when the agent
restarts because realizing NetworkPolicies and enabling forwarding are
asynchronous.

This patch stores NetworkPolicy data in files when they are received,
and makes antre-agent fallback to use the files as data source if it
can't connect to antrea-controller on startup. This prevents security
regression: a NetworkPolicy that has been realized on a Node will
continue to work even if antrea-controller is not available after
antrea-agent restarts.

The benchmark results of the storage's operations are as below:

BenchmarkFileStoreAddNetworkPolicy-40              70383             16102 ns/op             520 B/op          9 allocs/op
BenchmarkFileStoreAddAppliedToGroup-40             45382             25880 ns/op            3019 B/op          9 allocs/op
BenchmarkFileStoreAddAddressGroup-40                7400            180000 ns/op           49538 B/op          9 allocs/op
BenchmarkFileStoreReplaceAll-40                       10         127088004 ns/op        17815943 B/op      33099 allocs/op

The disk usage when storing 1k NetworkPolicies, AddressGroups, and
AppliedToGroups created by BenchmarkFileStoreReplaceAll is as below:

16M     /var/run/antrea-test/file-store/address-groups
4.0M    /var/run/antrea-test/file-store/applied-to-groups
4.0M    /var/run/antrea-test/file-store/network-policies

Signed-off-by: Quan Tian <qtian@vmware.com>
Pod network should only be enabled after realizing initial
NetworkPolicies, otherwise traffic from/to Pods could bypass
NetworkPolicy when antrea-agent restarts.

After commit f9fc979 ("Store NetworkPolicy in filesystem as
fallback data source"), antrea-agent can realize either the latest
NetworkPolicies got from antrea-controller or the ones got from
filesystem as fallback. Therefore, waiting for NetworkPolicies to be
realized should not add marked delay or make antrea-controller a failure
point of Pod network.

This commit adds an implementation of wait group capable of waiting with
a timeout, and uses it to wait for common initialization and
NetworkPolicy realization before installing any flows for Pods. More
preconditions can be added via the wait group if needed in the future.

Signed-off-by: Quan Tian <qtian@vmware.com>
Since K8s 1.29, setting Local ExternalTrafficPolicy for ClusterIP
Services with ExternalIPs is supported.

Signed-off-by: Quan Tian <qtian@vmware.com>
cniServer.reconcile() now installs flows asynchorously.

Signed-off-by: Quan Tian <qtian@vmware.com>
Although it has been documented as a prerequisite in [1], there are
some platforms not enabling ip forwarding by default. kube-proxy ipvs
mode and some CNIs enable it by themselves to ensure Pod networking
work properly.

As Antrea needs IP forwarding to be enabled, there seems no reason to
not do it by itself, rather than expecting users or other components to
do it.

[1] https://kubernetes.io/docs/setup/production-environment/container-runtimes/#install-and-configure-prerequisites

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn tnqn force-pushed the automated-cherry-pick-of-#5739-#5777-#5795-#5798-#5833-upstream-release-1.13 branch from 42edc54 to f57e6f7 Compare January 10, 2024 13:43
@tnqn
Copy link
Member Author

tnqn commented Jan 11, 2024

/test-all

@tnqn tnqn requested review from antoninbas and luolanzone January 11, 2024 14:20
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I feel like it may be better in the future to have separate cherry-picks for unrelated PRs (such as #5833)

@tnqn
Copy link
Member Author

tnqn commented Jan 12, 2024

LGTM I feel like it may be better in the future to have separate cherry-picks for unrelated PRs (such as #5833)

sure, will do.

@tnqn tnqn merged commit b237f7c into antrea-io:release-1.13 Jan 12, 2024
44 of 51 checks passed
@tnqn tnqn deleted the automated-cherry-pick-of-#5739-#5777-#5795-#5798-#5833-upstream-release-1.13 branch January 12, 2024 03:49
@tnqn
Copy link
Member Author

tnqn commented Jan 12, 2024

LGTM I feel like it may be better in the future to have separate cherry-picks for unrelated PRs (such as #5833)

sure, will do.

BTW, I used "rebase and merge" to not mix them in a commit, if it's what you were concerned.

@antoninbas
Copy link
Contributor

LGTM I feel like it may be better in the future to have separate cherry-picks for unrelated PRs (such as #5833)

sure, will do.

BTW, I used "rebase and merge" to not mix them in a commit, if it's what you were concerned.

I was also wondering about release note generation, but I think our prepare-changelog.sh script already handles this case and will extract all individual PRs?

@tnqn
Copy link
Member Author

tnqn commented Jan 12, 2024

I was also wondering about release note generation, but I think our prepare-changelog.sh script already handles this case and will extract all individual PRs?

Yes, it has been taken into consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants