Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle delays tied to V6 interfaces #1631

Merged
merged 5 commits into from
Sep 22, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions cmd/routed-eni-cni-plugin/driver/driver.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import (
"net"
"os"
"syscall"
"time"

"github.com/pkg/errors"
"golang.org/x/sys/unix"
Expand All @@ -43,6 +44,12 @@ const (
fromContainerRulePriority = 1536
// Main routing table number
mainRouteTable = unix.RT_TABLE_MAIN

WAIT_INTERVAL = 50 * time.Millisecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find ! Not sure if we plan to support anything beyond AL2 for initial cut. But it might be worth checking if this delay holds good in other distributions as well.

Side note : Going forward we will have multiple eth attachments on single pod with 5G to separate out different flows. Having this delay as configurable option would help until we characterize the actual number for different use cases.

Copy link
Contributor Author

@achevuru achevuru Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the total wait time is actually 10s. WAIT_INTERVAL is essentially how long we wait before checking the status again. I'm assuming 10s might be long enough and the function in ip package that most of the CNI plugins rely on is capping it @10s as well. I see that it usually takes between 1-2s in my testing but if we do run in to a specific requirement/use-case, I guess we can definitely consider making the upper bound configurable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, I don't think it needs to be configurable - we should be able to just 'wait long enough' for every use case, and there's no benefit from timing out and aborting aggressively.

Re other distros: It would be odd to pick something wildly different from the current Linux kernel default values, and I expect the delay will always be around the few-seconds mark. One alternative here is that we either use 'optimistic DAD' which allows userspace to use the address for some purposes while it is still tentative. A better alternative is to just disable DAD altogether on veth interfaces, because we control both ends anyway so there are no surprises here. Meh, at best it gains 1-2s, and we can come back to this later. Even if we disable DAD on veth, we're still going to want this function at some point for "real" network interfaces (eg: trunk, EFA, ENI+ipvlan).

We could also remove the above timer by using netlink events rather than polling (see AddrSubscribe). Again, meh, we can come back to this if this 50ms poll ever becomes an issue.


//Time duration CNI waits for an IPv6 address assigned to an interface
//to move to stable state before error'ing out.
v6DADTimeout = 10 * time.Second
)

// NetworkAPIs defines network API calls
Expand Down Expand Up @@ -208,6 +215,12 @@ func (createVethContext *createVethPairContext) run(hostNS ns.NetNS) error {
return errors.Wrap(err, "setup NS network: failed to add static ARP")
}

if createVethContext.v6Addr != nil && createVethContext.v6Addr.IP.To16() != nil {
if err := WaitForAddressesToBeStable(createVethContext.contVethName, v6DADTimeout); err != nil {
return errors.Wrap(err, "setup NS network: failed while waiting for v6 addresses to be stable")
}
}

// Now that the everything has been successfully set up in the container, move the "host" end of the
// veth into the host namespace.
if err = createVethContext.netLink.LinkSetNsFd(hostVeth, int(hostNS.Fd())); err != nil {
Expand All @@ -216,6 +229,44 @@ func (createVethContext *createVethPairContext) run(hostNS ns.NetNS) error {
return nil
}

// Implements `SettleAddresses` functionality of the `ip` package.
// WaitForAddressesToBeStable waits for all addresses on a link to leave tentative state.
// Will be particularly useful for ipv6, where all addresses need to do DAD.
// If any addresses are still tentative after timeout seconds, then error.
func WaitForAddressesToBeStable(ifName string, timeout time.Duration) error {
link, err := netlink.LinkByName(ifName)
if err != nil {
return fmt.Errorf("failed to retrieve link: %v", err)
}

deadline := time.Now().Add(timeout)
for {
addrs, err := netlink.AddrList(link, netlink.FAMILY_V6)
if err != nil {
return fmt.Errorf("could not list addresses: %v", err)
}

ok := true
for _, addr := range addrs {
if addr.Flags&(syscall.IFA_F_TENTATIVE|syscall.IFA_F_DADFAILED) > 0 {
ok = false
break
}
}

if ok {
return nil
}
if time.Now().After(deadline) {
return fmt.Errorf("link %s still has tentative addresses after %d seconds",
ifName,
timeout)
}

time.Sleep(WAIT_INTERVAL)
}
}

// SetupNS wires up linux networking for a pod's network
func (os *linuxNetwork) SetupNS(hostVethName string, contVethName string, netnsPath string, v4Addr *net.IPNet, v6Addr *net.IPNet,
deviceNumber int, vpcCIDRs []string, useExternalSNAT bool, mtu int, log logger.Logger) error {
Expand Down