Duplicate IP Addresses for pods w/o host network #711
Comments
Hi @uruddarraju, the CNI will not reuse IPs directly, but after a 1 minute cool down they will be available to be assigned to pods again. Do you use the default configuration? What kind of nodes are you running on, and how many pods per node? If you have a lot of churn, it helps to pre-allocate the IPs. Also, how come you use v1.5.2? I'd recommend upgrading to v1.5.3.
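For readers following along, the cool-down behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the CNI's actual implementation: the class `CoolingIPPool`, its methods, and the injectable `clock` parameter are all hypothetical names introduced for illustration, and the 60-second default simply mirrors the "1 minute" figure mentioned in the comment.

```python
import time


class CoolingIPPool:
    """Toy IP pool: a released IP is held back for a cool-down period before reuse.

    Hypothetical illustration only; it does not mirror the real ipamd data store.
    """

    def __init__(self, ips, cooldown_seconds=60.0, clock=time.monotonic):
        self._free = list(ips)        # IPs that can be handed out right now
        self._cooling = {}            # ip -> time at which it was released
        self._cooldown = cooldown_seconds
        self._clock = clock           # injectable so the behaviour can be tested

    def _promote_cooled(self):
        """Move IPs whose cool-down has elapsed back to the free list."""
        now = self._clock()
        for ip, released_at in list(self._cooling.items()):
            if now - released_at >= self._cooldown:
                del self._cooling[ip]
                self._free.append(ip)

    def assign(self):
        """Return a free IP, or None if every IP is in use or still cooling."""
        self._promote_cooled()
        if not self._free:
            return None
        return self._free.pop(0)

    def release(self, ip):
        """Mark an IP as released; it becomes assignable after the cool-down."""
        self._cooling[ip] = self._clock()
```

In this model a pod that is deleted and immediately recreated cannot get its old IP back until the cool-down has elapsed, which is why a larger pre-allocated pool helps under heavy churn.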
Thanks @mogren. I am not sure that is the problem though. As you can see in the above two examples I gave, the pods have been spun up at
@uruddarraju sorry for the delay in getting back to you on this! As @mogren mentioned, it would be good to upgrade to at least 1.5.3 and see if the issue with duplicate IPs goes away.
I'm working with @uruddarraju on this problem. Over the past few days we have consistently observed pods receiving duplicate IP addresses, usually correlated with the startup time of the L-IPAMD process. Here is the subset of log lines that show the outline of the problem scenario: On node Timeline:
Relevant subset of log:
First, it's unclear why Second, even though the code progresses to the point of
The related PR needs to be rebased, and still has some required changes.
We've seen this too.
Since #972, we read this directly from the CRI socket instead of using the watcher.
@uthark @uruddarraju Late update, but is this still an issue with v1.6.4 or v1.7.0?
v1.7.2-rc1 is a release candidate that includes a lot of fixes to resolve this type of issue.
@uruddarraju, @ataranto Since v1.5.x, we have moved away from using the informer and instead we query the CRI socket directly. These changes should prevent this issue from happening again.
We are running a 1.12.6 cluster provisioned with Kops.
Networking setup: aws-k8s-cni with calico for network policies
Using:
602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.2
We saw a few networking issues for pods created in our integration testing cluster. This cluster experiences a lot of churn: we consistently spin up hundreds of pods and delete them periodically. We observed a few pods experiencing network issues, and upon digging a little more, we found the following:
Checking whether these pods are using the host network:
And you can see both of those pods using the IP
10.20.72.196
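For anyone wanting to run the same check across a whole cluster, here is a minimal sketch of how duplicates could be detected. It assumes pod objects in the shape returned by the Kubernetes API (for example the `items` list from `kubectl get pods --all-namespaces -o json`); the function name `find_duplicate_pod_ips` is hypothetical. Host-network pods are skipped because they legitimately share the node's IP, and the pod phase is reported alongside each owner so that completed or failed pods are easy to spot.

```python
from collections import defaultdict


def find_duplicate_pod_ips(pods):
    """Return {ip: [(namespace, name, phase), ...]} for IPs held by more than one pod.

    `pods` is a list of pod objects as plain dicts. Pods with
    spec.hostNetwork set, or without a status.podIP, are ignored.
    """
    by_ip = defaultdict(list)
    for pod in pods:
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        ip = status.get("podIP")
        if not ip or spec.get("hostNetwork", False):
            continue  # no IP yet, or the pod shares the node's IP by design
        meta = pod.get("metadata", {})
        by_ip[ip].append((meta.get("namespace"), meta.get("name"), status.get("phase")))
    # Keep only IPs that are claimed by two or more pods.
    return {ip: owners for ip, owners in by_ip.items() if len(owners) > 1}
```

The input can be loaded with `json.load` from saved `kubectl` output and passed in as `data["items"]`.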
We cleaned up the pods experiencing this duplicate IP issue a few times, and that does not seem to solve the problem (which makes sense, as we still don't know the root cause).
One common scenario we observed in all the cases above: at least one pod has a restart count > 0 or is in the CrashLoopBackoff/Error/Completed state.
Ideally, the CNI should only be reclaiming IPs of pods that are not running and have a restartPolicy of Never. I am not sure if this is the behavior today.
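To make the proposed rule concrete, here is a small sketch of it as a predicate. This is one possible reading of the suggestion, not a description of what the CNI does: it interprets "not running" as the pod having reached a terminal phase (`Succeeded` or `Failed`), and the function name `ip_reclaimable` is hypothetical. The field names `status.phase` and `spec.restartPolicy` are the standard Kubernetes pod fields, and `Always` is the Kubernetes default when `restartPolicy` is unset.

```python
TERMINAL_PHASES = {"Succeeded", "Failed"}


def ip_reclaimable(pod):
    """Return True if the pod's IP could be reclaimed under the proposed rule.

    Assumption: a pod counts as "not running" only once it is in a terminal
    phase, and it must never be restarted (restartPolicy == "Never").
    """
    phase = pod.get("status", {}).get("phase")
    restart_policy = pod.get("spec", {}).get("restartPolicy", "Always")
    return phase in TERMINAL_PHASES and restart_policy == "Never"
```

Under this reading, a pod in CrashLoopBackoff with the default restart policy would keep its IP, since its containers will be started again in the same sandbox.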