Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigation to speed up pod creation #680

Closed
MikeZappa87 opened this issue Dec 24, 2021 · 13 comments
Closed

Investigation to speed up pod creation #680

MikeZappa87 opened this issue Dec 24, 2021 · 13 comments

Comments

@MikeZappa87
Copy link
Contributor

MikeZappa87 commented Dec 24, 2021

An effort is happening to speed up the pod creation and two areas of interest exist. One is the CNI setup and the other is the image pull, if both these items are improved pod setup can be done in less than a second. The area of interest in the CNI is this:

https://github.com/containernetworking/plugins/blob/master/pkg/ip/addr_linux.go

This method takes 1000ms - 1900ms to execute. This code is related to the Duplicate Address Detection feature of IPV6 and is looking at two flags Tentative and DadFailed. I still need to look over a few RFC's on this as well, including optimistic dad.

If something was missed or overlooked let's tackle it!

Initial Observations:

  • Having a boolean flag to bypass SettleAddreses reduced the execution to less than 100ms

  • Errors from SettleAddresses are not handled and failures are silent

  • Unable to find other codebases doing something similar (Calico as an example)

  • Only related to IPV6 addresses

  • Other parts of the code that assign IP addresses does not utilize SettleAddresses

  • More likely to succeed vs fail

Questions:

  • Is this method “SettleAddresses” still required?

  • Was “SettleAddresses” implemented due to a shortcoming in the Linux Kernel/Netlink/Go Netlink Library? Has it since been resolved and renders this method obsolete?

  • If Dad is disabled on the specific interfaces via “accept_dad” should SettleAddresses be used?

Proposals:

  • Remove the code entirely

  • Add a boolean flag to enable/disable SettleAddresses via runtime configuration (Eventually to be applied as an annotation to the pod spec)

  • Configuration value that allows user to optionally not set the IPV6 address (Eventually to be applied as an annotation to the pod spec). User could specify IPV4,IPV6 or BOTH to assign

WorkArounds:
Do not use IPV6

Please feel free to reach out directly to me! I will help implement this and the go-cni/containerd side as well

@uablrek
Copy link
Contributor

uablrek commented Dec 25, 2021

If Dad is disabled on the specific interfaces via “accept_dad” should SettleAddresses be used?

If accept_dad=0, does SettleAddresses really still take 1000-1900ms?

@MikeZappa87
Copy link
Contributor Author

If Dad is disabled on the specific interfaces via “accept_dad” should SettleAddresses be used?

If accept_dad=0, does SettleAddresses really still take 1000-1900ms?

I set accept_dad=0 to both sides of the veth and had no impact on execution time, might be something else? Will look at it late!

@uablrek
Copy link
Contributor

uablrek commented Dec 27, 2021

You must do it in the right order. I made a simple go program that does;

	start := time.Now()
	err := SettleAddresses(os.Args[1], 4)
	if err != nil {
		fmt.Println(err)
		os.Exit(-1)
	}
	fmt.Println(time.Since(start).Milliseconds())

If I call it like this in a netns (if name is "host");

	ip link set up dev host
	sysctl -qw net.ipv6.conf.host.accept_dad=0
	ip addr add $ip4/32 dev host
	ip -6 addr add 1000::1:$ip4/128 dev host
        settle-address host

the settle-address host takes ~1300-1800mS. But like this;

	sysctl -qw net.ipv6.conf.host.accept_dad=0
	ip link set up dev host
	ip addr add $ip4/32 dev host
	ip -6 addr add 1000::1:$ip4/128 dev host
        settle-address host

it takes 0mS.

So you must set accept_dad=0 before you to "up" on the interface.

@uablrek
Copy link
Contributor

uablrek commented Dec 27, 2021

The reason for "SettleAddresses" is most likely that applications fails if the address is not ready when they start. It can of course be solved by doing the equivalent of SettleAddresses in an init-container but if you just skip SettleAddresses you will probably get a stream of issues.

Also some users may rely on DAD, so just turning it off may also break things. But I think users that need dad are very few so IMO the right way is to disable dad before bringing the interface up by default, and dad should be optional, and keep SettleAddresses() as-is.

@uablrek
Copy link
Contributor

uablrek commented Dec 27, 2021

You may check if there really is a use-case for DAD on a veth-pair. I can't imagine any, so maybe just turn it off.

@MikeZappa87
Copy link
Contributor Author

MikeZappa87 commented Dec 27, 2021

In my test I am using the cnitool with { "cniVersion": "1.0.0", "name": "containerd-net", "plugins": [ { "type": "bridge", "bridge": "cni0-test", "isGateway": true, "ipMasq": true, "promiscMode": true, "ipam": { "type": "host-local", "ranges": [ [{ "subnet": "10.88.0.0/16" }], [{ "subnet": "2001:4860:4860::/64" }] ], "routes": [ { "dst": "0.0.0.0/0" }, { "dst": "::/0" } ] } }, { "type": "portmap", "capabilities": {"portMappings": true} } ] }

And setting the accept_dad =0 before the ConfigureIface method in bridge.go which sets the veth to up. I still have the slowness and have verified accept_dad is 0 as well. I can put a link up to my fork so that you can run it? I do notice the 'disableIPV6DAD' that exists does a check if enhanced_dad is equal to 1 and does not set accept_dad to 0.

MikeZappa87@e482355

As for is SettleAddresses still required? I dug through Calico to see if they were doing something, they aren't. They actually set accept_dad=0 for both veth pairs. For my testing, I also just assigned an IPV6 address and did not experience issues with a process binding to it (nginx). I still wonder if SettleAddresses was put in place to fill a hole in with an issue with the Linux kernel/netlink.

Right now I think we need to verify your test case since the code I am running still has the issue. However, this could be isolated to my machine however I did run this across several different servers in my environment and still have the slowness.

@uablrek
Copy link
Contributor

uablrek commented Dec 27, 2021

I still wonder if SettleAddresses was put in place to fill a hole in with an issue with the Linux kernel/netlink.

Maybe. I am using linux-5.15.2 which is quite recent. Which kernel are you using?

@MikeZappa87
Copy link
Contributor Author

MikeZappa87 commented Dec 27, 2021

I have tested on 5.10/5.12 and we have 5.15 in our env var as well. I haven't tested on 5.15 yet. Did you test using my branch?

ip netns add cni-1234
CNI_CONTAINERID=testthis CNI_PATH=/opt/cni/bin NETCONFPATH=. ./cnitool add containerd-net /var/run/netns/cni-1234

I am curious how this is being tested as well. I provided my cnitool parameters as well

@uablrek
Copy link
Contributor

uablrek commented Dec 28, 2021

Did you test using my branch?

No, not yet. It is a bit too much work at the moment. To test "stand alone" is easy in my env.

I tested on linux-5.4.35 and it works the same way as linux-5.15.2.

I still wonder if SettleAddresses was put in place to fill a hole in with an issue with the Linux kernel/netlink.

I don't think so. DAD requires that an address is not used until it's confirmed that it's not used by some other machine on the segment. That takes time, and during that time any attempt to use the address will cause an error (try "ping" for instance). Many applications handles that badly and SettleAddresses() makes sure that all addresses are usable when applications start.

If SettleAddresses() still takes ~1.5s in your env, addresses are not ready. SettleAddresses() is not the cause of the problem, it is a symptom. You must find a way to reduce the time spent in SettleAddresses() (which is caused by DAD), not remove it. If calico doesn't have anything similar they most likely would not have a 1.5s delay even if they used a SettleAddresses(). They have found a way to eliminate the DAD timeout.

In a veth+bridge setup DAD is most likely not necessary, but it may be for other cni-plugins, e.g macvlan or ipvlan. I don't know if it's used only for veth+bridge, but be careful if not.

@uablrek
Copy link
Contributor

uablrek commented Dec 29, 2021

Now I have checked your patch, and it is not sufficient. You must disable dad when the veth's are created. Here is a patch that works; cni-plugins.diff.txt

NOTE; the patch disables dad unconditionally for veth and bridge, and that is probably not what you want to do. But just for testing it's ok.

@MikeZappa87
Copy link
Contributor Author

MikeZappa87 commented Dec 29, 2021

I will check this once I get home! I appreciate you digging into this.

Edit:
I didn't traverse down into the veth creation code and missed those pieces. Initial testing this appears to be resolved! I am guessing the next steps here are to provide the user the ability to configure accept_dad via a bool in the runtime configuration? If the desire is to eventually allow the user the apply this as an annotation on the pod spec to disable dad (similar to bandwidth and others)? I was going to handle the containerd PR and the PR to update the bridge plugin? Let me know what you think @uablrek No rush since its the holidays!

@uablrek
Copy link
Contributor

uablrek commented Jan 8, 2022

First I must make clear that I am not a CNI maintainer, so you should probably ask someone else. But...

I agree that dad should be made configurable. However IMO the configuration should be on CNI-plugin level, not an annotation. I think an annotation would require an CNI API update (a new parameter). I would suggest something like;

{
  "cniVersion": "1.0.0",
  "name": "cni-x",
  "type": "bridge",
  "bridge": "cbr0",
  "isDefaultGateway": true,
  "hairpinMode": true,
  "omitDAD": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [ { "subnet": "1100::100/120" } ],
      [ { "subnet": "11.0.1.0/24" } ]
    ]
  }
}

@MikeZappa87
Copy link
Contributor Author

Hello Lars! I did realize that a few days ago ha! However you got me over the hurdle so +1 beer to you.

To clarify what I mean is using https://www.cni.dev/docs/spec/#deriving-runtimeconfig and https://www.cni.dev/docs/conventions/

I believe the desire on our side is to make this a annotation we can apply to the pod so we can optionally set it and the container runtime will pass it.

I do like 'omitDAD' vs what I currently have is 'disableipv6dad'

{ "cniVersion": "1.0.0", "name": "containerd-net", "plugins": [ { "type": "bridge", "bridge": "cni0", "isGateway": true, "ipMasq": true, "promiscMode": true, "runtimeConfig":{ "disableipv6dad": true }, "ipam": { "type": "host-local", "ranges": [ [{ "subnet": "10.88.0.0/16" }], [{ "subnet": "2001:4860:4860::/64" }] ], "routes": [ { "dst": "0.0.0.0/0" }, { "dst": "::/0" } ] } }, { "type": "portmap", "capabilities": {"portMappings": true} } ] }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants