fix: get container ID from kube rather than docker #371
Conversation
Force-pushed from a6ee2f9 to 827328e
Ran conformance tests and everything looks good. Thanks, this was a nice cleanup, just a small logging comment.
I'll update our dependencies in another PR; I guess we don't need the docker client import anymore.
This change needs to be carefully tested to make sure the same IP cannot be allocated to different Pods. To get the same info from kube, you need to make sure ipamD has learned all running Pod IPs before it starts accepting CNI-ADD.
Thanks @liwenwu-amazon, that is good to know. We won't merge this without some more testing.
I dumped the list of ENIs on a host running 5 pods, 3 of which were hostNetwork=true, and saw that 2 were allocated as expected. FWIW, I've been running a cluster off of this branch with several nodes and several dozen pods for about 10 days. A few of the CNI daemonset pods have restarts for various reasons, but I haven't seen any issues so far with duplicate IP allocation. If there's anything more specific I should run some tests for, please let me know. I'm curious to understand how getting the container ID from the kube API vs Docker has an impact on how IPs are allocated. It doesn't seem like the container ID is ever read again after the initial fetch.
@rudoi @mogren I think you should try to restart ipamD while Pods are being scaled up and down rapidly, and make sure there is no race condition when using the kube API. Here is one example:
With the Docker API, ipamD makes sure it finds all running Pods before it starts accepting CNI-ADD requests.
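The startup ordering @liwenwu-amazon describes can be sketched as a readiness gate: the daemon refuses CNI-ADD requests until it has seeded its datastore with the IPs of already-running pods. This is a minimal, hypothetical illustration (the `ipamContext` type and its fields here are simplifications, not the real ipamD structures):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ipamContext is a hypothetical, pared-down stand-in for ipamD's state.
type ipamContext struct {
	mu       sync.Mutex
	ready    bool
	assigned map[string]string // IP -> pod name
}

// learnExistingPods seeds the datastore with IPs of already-running pods
// (in the real daemon these would be discovered from the runtime or API
// server at startup) and only then opens the gate for CNI-ADD.
func (c *ipamContext) learnExistingPods(running map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for ip, pod := range running {
		c.assigned[ip] = pod
	}
	c.ready = true
}

// handleAdd rejects requests until startup discovery is done, closing the
// window where an IP already held by a running pod could be handed out again.
func (c *ipamContext) handleAdd(pod, ip string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.ready {
		return errors.New("ipamd not ready: still learning existing pod IPs")
	}
	if owner, inUse := c.assigned[ip]; inUse {
		return fmt.Errorf("IP %s already assigned to pod %s", ip, owner)
	}
	c.assigned[ip] = pod
	return nil
}

func main() {
	c := &ipamContext{assigned: map[string]string{}}

	// A CNI-ADD arriving before discovery completes must be rejected.
	fmt.Println(c.handleAdd("pod-b", "10.0.0.5"))

	// Discovery finds pod-a already holding 10.0.0.5.
	c.learnExistingPods(map[string]string{"10.0.0.5": "pod-a"})

	// The same IP can no longer be given to a second pod.
	fmt.Println(c.handleAdd("pod-b", "10.0.0.5"))
	fmt.Println(c.handleAdd("pod-b", "10.0.0.6")) // <nil>
}
```

The race to test is exactly the one in the comment above: if the daemon starts serving ADDs before discovery finishes, an in-use IP looks free.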
Understood, @liwenwu-amazon. Thanks for clarifying. I'll try my hardest to cause that race. 😄
Force-pushed from c387c66 to 574334a
@rudoi Hi again! I like this change, and I think we have some great integration tests to test this. I still bumped this change out to the v1.6 Milestone since I'm trying to get v1.5 out soon. Would you care to rebase this change against the latest master? |
Force-pushed from 574334a to 821467c
Force-pushed from f4e4ee2 to 4a30273
Hi @mogren! Rebased, made sure tests are passing, etc. Looking forward to seeing the integration tests! I did some stress testing last week but I wasn't totally satisfied with my scenarios. |
@rudoi - thanks for this PR. We are about to deploy some of our clusters with this fix - so hopefully we'll get some real world use case before it gets merged into 1.6. |
@dadux - awesome, looking forward to seeing the results! We've been running this for about 3 weeks in real clusters and have not run into anything yet. -fingers crossed- |
Fix tests
Hi @mogren - what is preventing this from getting merged? Happy to provide some help if needed!
Happy to help out as well! |
* fix: get container ID from kube rather than docker
* chore: add log statement when containerID is found
* fix: log containerID after assignment
* fix: update unit tests

(cherry picked from commit 14de538)
This reverts commit 14de538.
* Revert "fix: get container ID from kube rather than docker (#371)"

  This reverts commit 14de538.
* go mod tidy
* Add initial proof-of-concept implementation of CRI support.
* Update CRI socket path to /var/run/cri.sock
* Filter ready sandboxes and abort if there's a pod UID conflict.
* Revert "fix: get container ID from kube rather than docker (#371)"

  This reverts commit 14de538.
* Address review comments, refactor to use pod sandbox nomenclature consistently.
* Bail if we can't retrieve local pod sandboxes on startup.
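The "filter ready sandboxes and abort if there's a pod UID conflict" step from the CRI commit list above can be sketched as follows. The `sandbox` type here is a hypothetical, stripped-down version of the CRI PodSandbox, not the real API type:

```go
package main

import "fmt"

// sandbox is a hypothetical, pared-down version of a CRI PodSandbox:
// each running pod has a sandbox with a ready/not-ready state and the
// UID of the pod it belongs to.
type sandbox struct {
	ID     string
	PodUID string
	Ready  bool
}

// readySandboxesByUID keeps only ready sandboxes and bails out if two
// ready sandboxes claim the same pod UID, since that would make the
// pod-to-sandbox mapping ambiguous at startup.
func readySandboxesByUID(all []sandbox) (map[string]sandbox, error) {
	byUID := make(map[string]sandbox)
	for _, sb := range all {
		if !sb.Ready {
			continue // skip exited/not-ready sandboxes
		}
		if dup, ok := byUID[sb.PodUID]; ok {
			return nil, fmt.Errorf("pod UID %s claimed by sandboxes %s and %s",
				sb.PodUID, dup.ID, sb.ID)
		}
		byUID[sb.PodUID] = sb
	}
	return byUID, nil
}

func main() {
	sandboxes := []sandbox{
		{ID: "sb-1", PodUID: "uid-a", Ready: true},
		{ID: "sb-2", PodUID: "uid-a", Ready: false}, // old, exited sandbox: skipped
		{ID: "sb-3", PodUID: "uid-b", Ready: true},
	}
	m, err := readySandboxesByUID(sandboxes)
	fmt.Println(len(m), err) // 2 <nil>
}
```

Aborting on a UID conflict, rather than picking one sandbox arbitrarily, matches the "bail if we can't retrieve local pod sandboxes" posture: fail loudly rather than risk a wrong IP-to-pod association.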
Issue #, if available:
#365
Description of changes:
Gets container ID from Kubernetes rather than Docker. This makes the CNI runtime-agnostic.
When using containerd instead of Docker, the CNI essentially skips allocating addresses for all pods because it can't find containers to associate them with. The current retry logic doesn't fail hard when the CNI is unable to connect to the Docker socket: new pods appear to successfully obtain IP addresses, but when the CNI pod restarts, those IPs are all freed, because the CNI can't connect to Docker and therefore skips allocating IPs for every pod on the host.
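To illustrate why reading the container ID from the Kubernetes API is runtime-agnostic: the API server reports each container's ID in `status.containerStatuses[].containerID` as `<runtime>://<id>` (e.g. `docker://...` or `containerd://...`), so stripping the scheme yields a usable ID regardless of the runtime. A minimal sketch using only the standard library (the `podStatus` struct models just the fields needed here, not the full Pod object):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// podStatus models only the Pod fields this sketch needs;
// the real Kubernetes API object has many more.
type podStatus struct {
	Status struct {
		ContainerStatuses []struct {
			Name        string `json:"name"`
			ContainerID string `json:"containerID"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// containerID extracts the bare container ID from a Pod's status JSON.
// The API server formats IDs as "<runtime>://<id>", so trimming the
// scheme works the same for Docker, containerd, or any other runtime.
func containerID(podJSON []byte) (string, error) {
	var p podStatus
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return "", err
	}
	if len(p.Status.ContainerStatuses) == 0 {
		return "", fmt.Errorf("no container statuses reported yet")
	}
	full := p.Status.ContainerStatuses[0].ContainerID
	if i := strings.Index(full, "://"); i >= 0 {
		return full[i+3:], nil
	}
	return full, nil
}

func main() {
	pod := []byte(`{"status":{"containerStatuses":[
		{"name":"app","containerID":"containerd://f00dfeed"}]}}`)
	id, err := containerID(pod)
	fmt.Println(id, err) // f00dfeed <nil>
}
```

Note that `containerID` can be empty or absent while a container is still being created, so a caller would need to handle the "no statuses yet" case (here it is surfaced as an error).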
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.