Return traffic can be denied for a short duration once the policies are reconciled on a new pod #345

Open
Pavani-Panakanti opened this issue Dec 6, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@Pavani-Panakanti
Contributor

Pavani-Panakanti commented Dec 6, 2024

What happened:
In standard mode, pods start in default allow: all traffic is allowed until network policies are reconciled, which takes 1-2 seconds on a new pod. Only once reconciliation has happened do we start tracking flows in the conntrack table. Return traffic is allowed only if a matching entry is present in the conntrack table. So if traffic left the pod before network policies were applied and its return traffic arrives after they were applied, the return traffic is denied, because the flow was never tracked in the conntrack table.
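The race described above can be illustrated with a toy model. This is only a sketch, not the agent's implementation: the class `PodFirewall`, its methods, and the flow tuple are hypothetical names, and it assumes the egress policy allows the flow while no ingress rule independently allows the remote peer.

```python
from dataclasses import dataclass, field


@dataclass
class PodFirewall:
    """Toy model of standard mode (hypothetical, for illustration only)."""
    reconciled: bool = False
    conntrack: set = field(default_factory=set)

    def reconcile(self) -> None:
        # Policies are now applied; flows are tracked from this point on.
        self.reconciled = True

    def egress(self, flow: tuple) -> bool:
        # Before reconciliation the packet simply passes and is NOT tracked.
        if self.reconciled:
            self.conntrack.add(flow)
        return True  # assumption: egress policy allows this flow

    def ingress_return(self, flow: tuple) -> bool:
        if not self.reconciled:
            return True  # default allow at pod startup
        # After reconciliation, return traffic needs a conntrack entry.
        return flow in self.conntrack


flow = ("10.0.0.5", 40000, "10.0.1.9", 443)

# Request leaves before reconciliation, reply arrives after it: denied.
early = PodFirewall()
early.egress(flow)
early.reconcile()
print("early request, reply allowed:", early.ingress_return(flow))

# Request leaves after reconciliation: tracked, so the reply is allowed.
late = PodFirewall()
late.reconcile()
late.egress(flow)
print("late request, reply allowed:", late.ingress_return(flow))
```

In this model the only flows at risk are those whose request and reply straddle the reconciliation moment.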

As a mitigation, a 2-5 second delay can be added at pod startup using an init container. Traffic then starts leaving the pod only after network policies have been applied, so return traffic is not denied.
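A minimal sketch of that mitigation, written as a Python helper that adds a sleeping init container to a pod manifest and prints it as JSON (which `kubectl apply -f` also accepts). The helper name `with_startup_delay`, the container name, the `busybox:1.36` image, and the example pod are all assumptions for illustration; pick a delay in the 2-5 second range mentioned above.

```python
import copy
import json


def with_startup_delay(pod: dict, seconds: int = 5,
                       image: str = "busybox:1.36") -> dict:
    """Return a copy of `pod` with a sleeping init container prepended."""
    patched = copy.deepcopy(pod)
    init = {
        "name": "wait-for-network-policy",  # hypothetical name
        "image": image,                     # assumption: any image with `sleep`
        "command": ["sleep", str(seconds)],
    }
    # Init containers run to completion before the app containers start.
    patched["spec"].setdefault("initContainers", []).insert(0, init)
    return patched


if __name__ == "__main__":
    # Hypothetical example pod; replace with your own workload.
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "demo"},
        "spec": {"containers": [{"name": "app", "image": "nginx:1.27"}]},
    }
    print(json.dumps(with_startup_delay(pod, seconds=5), indent=2))
```

The delay is a fixed guess rather than a readiness signal, so it trades a slower startup for a lower chance of hitting the race.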

We are actively working on a fix so that customers can use standard mode without adding a sleep at pod startup. The fix for this issue can be tracked here

Please note that this issue happens only in standard mode, not in strict mode.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • CNI Version
  • Network Policy Agent Version
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
@Pavani-Panakanti Pavani-Panakanti added the bug Something isn't working label Dec 6, 2024
@Pavani-Panakanti Pavani-Panakanti changed the title [Standard mode] Return traffic can be denied for a short duration once the policies are reconciled on a new pod Return traffic can be denied for a short duration once the policies are reconciled on a new pod Dec 7, 2024
@youwalther65

Network policies are always reconciled asynchronously. What makes standard mode so special here compared to strict mode? Can you please elaborate a bit on the technical side?

@m00lecule

m00lecule commented Dec 9, 2024

Secondly, I believe strict mode should also be reviewed in the context of a postponed startup: https://docs.aws.amazon.com/eks/latest/userguide/cni-network-policy-configure.html#cni-network-policy-configure-policy.

In strict mode, pods start in default deny mode (for 1-2s workloads cannot access anything), which means startup should also be postponed by 5s to ensure the network policies are reconciled. Before the initial reconciliation the pods are isolated from a networking perspective, which is not a useful state.

Eventually all strict mode users will consider postponing startup by a few seconds to ensure smooth operation. I believe we could do strict mode users a favor and delay workload startup for everybody. The goal is to ensure they are not obligated to introduce homegrown startup commands after trying EKS + vpc-cni with network policy enabled, which would lead to a much smoother experience for upcoming EKS users.

@Pavani-Panakanti
Contributor Author

@youwalther65 In strict mode, we apply default deny before policies are applied on the first pod, so no egress traffic leaves the pod before policies are applied. The issue above, where a response packet is denied because the conntrack table has no entry for traffic that left the pod before policies were applied, therefore does not happen.
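To make the difference between the two modes concrete, here is a small timeline sketch. It is a simplification under stated assumptions: the function `outcome` and its string results are hypothetical, timestamps are seconds since pod start, and the reconciled policies are assumed to allow the flow.

```python
def outcome(mode: str, egress_at: float, reconcile_at: float,
            return_at: float) -> str:
    """Classify what happens to one request/reply pair (toy model)."""
    if mode not in ("standard", "strict"):
        raise ValueError(f"unknown mode: {mode}")
    if egress_at < reconcile_at:
        if mode == "strict":
            # Default deny: the request never leaves the pod.
            return "egress dropped"
        # Standard mode: the request leaves but is not tracked.
        if return_at < reconcile_at:
            return "return allowed"  # still in default allow
        return "return denied"       # no conntrack entry for the flow
    # Request sent after reconciliation: tracked in either mode.
    return "return allowed"


for mode in ("standard", "strict"):
    print(mode, "->", outcome(mode, egress_at=0.5, reconcile_at=1.5,
                              return_at=2.0))
```

In strict mode the early request fails at send time, which the application sees as an ordinary connection failure it can retry, instead of a request that leaves the pod but never gets its reply.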

@Pavani-Panakanti
Contributor Author

@m00lecule We are looking into improving the user experience for strict mode. This is something we are prioritizing, and we will provide more details soon.

@janavenkat

Syncing here as well aws/amazon-vpc-cni-k8s#3206 (comment)

@m00lecule

@Pavani-Panakanti The issue is still present after upgrading vpc-cni to v1.19.3-eksbuild.1.

@Pavani-Panakanti
Contributor Author

Looking into this. Will add an update soon.
