
Handle delays tied to V6 interfaces #1631

Merged: 5 commits merged on Sep 22, 2021

achevuru (Contributor)

What type of PR is this?
bug

What does this PR do / Why do we need it:
IPv6 addresses assigned to an interface can take a while to transition from the tentative state to the stable state, because every address must go through Duplicate Address Detection (DAD), and a tentative address is not yet usable for regular traffic. This PR adds a check that waits for the address to reach the stable state before the CNI returns.
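For reference, a minimal sketch of the state check, assuming the plugin reads each address's `ifa_flags` over netlink: an IPv6 address is usable once the kernel clears `IFA_F_TENTATIVE`, while `IFA_F_DADFAILED` means DAD found a duplicate. The names `addrState` and `classifyAddr` are hypothetical and not taken from this PR's diff. The flag values are copied from `<linux/if_addr.h>` so the sketch has no dependencies; `golang.org/x/sys/unix` exports them as `unix.IFA_F_TENTATIVE` and `unix.IFA_F_DADFAILED`.

```go
package main

// Address flag bits from the Linux kernel's <linux/if_addr.h>.
// Defined locally so this sketch builds without golang.org/x/sys/unix.
const (
	ifaFDADFailed = 0x08 // IFA_F_DADFAILED
	ifaFTentative = 0x40 // IFA_F_TENTATIVE
)

// addrState is a hypothetical summary of where an IPv6 address is in DAD.
type addrState int

const (
	addrStable    addrState = iota // DAD finished; address is usable
	addrTentative                  // DAD still in progress
	addrDADFailed                  // DAD detected a duplicate address
)

// classifyAddr maps an address's ifa_flags to a DAD state.
// DADFAILED is checked first because a failed address may also
// still carry the TENTATIVE bit.
func classifyAddr(flags int) addrState {
	switch {
	case flags&ifaFDADFailed != 0:
		return addrDADFailed
	case flags&ifaFTentative != 0:
		return addrTentative
	default:
		return addrStable
	}
}
```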

Testing done on this change:
Verified that no packet loss is observed right after pod boot-up, which is the symptom of the issue described above.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

achevuru requested a review from anguslees September 22, 2021 00:00
achevuru requested a review from jayanthvn September 22, 2021 00:28
@@ -43,6 +44,8 @@ const (
fromContainerRulePriority = 1536
// Main routing table number
mainRouteTable = unix.RT_TABLE_MAIN

WAIT_INTERVAL = 50 * time.Millisecond
Contributor

Good find! I'm not sure we plan to support anything beyond AL2 (Amazon Linux 2) for the initial cut, but it might be worth checking whether this delay holds on other distributions as well.

Side note: going forward, with 5G we will have multiple eth attachments on a single pod to separate out different flows. Making this delay configurable would help until we characterize the actual numbers for different use cases.

achevuru (Contributor Author), Sep 22, 2021

So, the total wait time is actually 10s; WAIT_INTERVAL is just how long we wait before checking the status again. I'm assuming 10s is long enough, and the function in the ip package that most CNI plugins rely on also caps it at 10s. In my testing it usually takes 1-2s, but if we run into a specific requirement or use case, we can definitely consider making the upper bound configurable.

Contributor

Fwiw, I don't think it needs to be configurable - we should be able to just 'wait long enough' for every use case, and there's no benefit from timing out and aborting aggressively.

Re other distros: it would be odd to pick something wildly different from the current Linux kernel defaults, and I expect the delay will always be around the few-seconds mark. One alternative is to use 'optimistic DAD', which allows userspace to use the address for some purposes while it is still tentative. A better alternative is to disable DAD altogether on veth interfaces: we control both ends, so there are no surprises there. Meh, at best it gains 1-2s, and we can come back to this later. Even if we disable DAD on veth, we'll still want this function at some point for "real" network interfaces (e.g. trunk, EFA, ENI+ipvlan).

We could also remove the above timer by using netlink events rather than polling (see AddrSubscribe). Again, meh, we can come back to this if this 50ms poll ever becomes an issue.

srini-ram self-requested a review September 22, 2021 01:13

Three review threads on cmd/routed-eni-cni-plugin/driver/driver.go (outdated, resolved)
anguslees (Contributor)

(nice, code style comments only)

srini-ram requested a review from anguslees September 22, 2021 16:18

3 participants