
Add IPv4/IPv6 dual stack KEP #648

Closed
wants to merge 2 commits

Conversation


@aojea aojea commented Dec 5, 2018

Moving kubernetes/community#2254 from k/community to k/enhancements
cc: @leblancd

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 5, 2018
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/pm labels Dec 5, 2018

leblancd commented Dec 5, 2018

@aojea - Thanks for moving this. I have some edits in the works based on comments on the previous PR, mostly rearranging some sections. I should be able to get these posted late next week (currently busy with some other issues, as you know :) ).


uablrek commented Dec 5, 2018

This remark:

Since IPVS functionality does not yet include IPv6 support...

is not valid. IPVS has full support for IPv6. The subsequent reference to a kube-router issue just says that kube-router does not (yet) have IPv6 support.


leblancd commented Dec 5, 2018

@uablrek - Thanks for the clarification. That was something that was mentioned earlier, and will be reworded.

@m1093782566

Yes, the IPVS proxier has full support for IPv6.


One alternative to adding transition mechanisms would be to modify Kubernetes to provide support for IPv4 and IPv6 communications in parallel, for both pods and services, throughout the cluster (a.k.a. "full" dual stack).

A second, simpler alternative, which is a variation to the "full" dual stack model, would be to provide dual stack addresses for pods and nodes, but restrict service IPs to be single-family (i.e. allocated from a single service CIDR). In this case, service IPs in a cluster would be either all IPv4 or all IPv6, as they are now. Compared to a full dual-stack approach, this "dual-stack pods / single-family services" approach saves on implementation complexity, but would introduce some minor feature restrictions. (For more details on these tradeoffs, please refer to the "Variation: Dual-Stack Service CIDRs" section under "Alternatives" below).
Member

In re-reading, the complexity saved is pretty minimal. We have to run dual-stack kube-proxy, so I think it's safe to say that the "full" path is almost certain to come to be, though it is 100% OK to add that as a second-step (on the assumption it is easier).

Member

As an alternative, is it worthwhile to discuss an intermediate delivery where all kube-proxy is single-stack (including NodePorts) ?

In other words: step 1 is just pod IPs, hostPorts, and headless services (DNS). Step 2 adds nodeports, external IPs, ingress, and endpoints. Step 3 adds service IPs and single-family deployments.

Viable?

Member

@thockin I'm all for chunking this task as much as we can. Your list of steps seems reasonable to me.

Member

Added implementation plan as suggested in #808

- Service addresses: 1 service IP address per service
- Kube-DNS is expected to be End-of-Life soon, so dual-stack testing will be performed using coreDNS.
- External load balancers that rely on Kubernetes services for load balancing functionality will only work with the IP family that matches the IP family of the cluster's service CIDR.
- Dual-stack support for Kubernetes orchestration tools other than kubeadm (e.g. miniKube, KubeSpray, etc.) is considered outside the scope of this proposal.
Member

We'll need to communicate HOW to enable this.

Member

Added note to communicate usage through documentation - #808

- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
Member

Possible option: check on both? Does this offer any avenue for single-family apps? Maybe not (readiness is per-pod). How would we want health-checks to work in a multi-net model?

Member

Addressed in #808

- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
- The pod status API changes will include a per-IP string map for arbitrary annotations, as a placeholder for future Kubernetes enhancements. This mapping is not required for this dual-stack design, but will allow future annotations, e.g. allowing a CNI network plugin to indicate to which network a given IP address applies.
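The KEP leaves the contents of this per-IP map open-ended. Purely as a hypothetical illustration in Go (keys and values invented here, not part of the proposal), a CNI plugin could attach hints such as the originating network or interface:

```
package main

import "fmt"

// PodIPInfo mirrors the per-IP structure sketched in this KEP: an address
// plus an open-ended annotation map.
type PodIPInfo struct {
	IP         string
	Properties map[string]string
}

func main() {
	// Hypothetical values only: a CNI plugin tagging each IP with the
	// network and interface it was allocated from.
	ips := []PodIPInfo{
		{IP: "10.244.1.4", Properties: map[string]string{"network": "default", "interface": "eth0"}},
		{IP: "fd00:10:244:1::4", Properties: map[string]string{"network": "default", "interface": "eth0"}},
	}
	for _, info := range ips {
		fmt.Println(info.IP, info.Properties)
	}
}
```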
Member

Are we going to give CNI/CRI hooks to fill these in?

Member

addressed as part of #808


##### Default Pod IP Selection
Older servers and clients that were built before the introduction of full dual stack will only be aware of and make use of the original, singular PodIP field above. It is therefore considered to be the default IP address for the pod. When the PodIP and PodIPs fields are populated, the PodIPs[0] field must match the (default) PodIP entry. If a pod has both IPv4 and IPv6 addresses allocated, then the IP address chosen as the default IP address will match the IP family of the cluster's configured service CIDR. For example, if the service CIDR is IPv4, then the IPv4 address will be used as the default address.
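A minimal Go sketch of the default-IP rule described above (illustrative only, not the actual kubelet logic), assuming the pod's IPs and the cluster's service CIDR are already known:

```
package main

import (
	"fmt"
	"net"
)

// defaultPodIP picks the pod IP whose family matches the service CIDR's
// family, falling back to the first IP when no family match exists.
// Sketch of the rule quoted above, not the real kubelet code.
func defaultPodIP(podIPs []string, serviceCIDR *net.IPNet) string {
	wantV4 := serviceCIDR.IP.To4() != nil
	for _, ip := range podIPs {
		isV4 := net.ParseIP(ip).To4() != nil
		if isV4 == wantV4 {
			return ip
		}
	}
	return podIPs[0]
}

func main() {
	_, svcCIDR, _ := net.ParseCIDR("10.96.0.0/12") // IPv4 service CIDR
	podIPs := []string{"fd00:10:244:1::4", "10.244.1.4"}
	// The IPv4 address is chosen as PodIP / PodIPs[0] because the service CIDR is IPv4.
	fmt.Println(defaultPodIP(podIPs, svcCIDR))
}
```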
Member

How will we want this to work if/when we support dual-stack service IPs?

```
--cluster-cidr ipNetSlice (IP CIDRs, in a comma separated list, Default: [])
```
Only the first CIDR for each IP family will be used; all others will be ignored.
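A rough Go sketch of the "first CIDR per family" rule above (illustrative only; the helper name and the logging of ignored entries are assumptions, in line with the "logged and ignored" suggestion below):

```
package main

import (
	"fmt"
	"log"
	"net"
)

// firstCIDRPerFamily keeps the first IPv4 and the first IPv6 CIDR from a
// --cluster-cidr style list and logs everything else it ignores.
// Illustrative only; the real flag parsing lives in the components themselves.
func firstCIDRPerFamily(cidrs []string) (v4, v6 *net.IPNet, err error) {
	for _, c := range cidrs {
		_, ipnet, perr := net.ParseCIDR(c)
		if perr != nil {
			return nil, nil, perr
		}
		switch {
		case ipnet.IP.To4() != nil && v4 == nil:
			v4 = ipnet
		case ipnet.IP.To4() == nil && v6 == nil:
			v6 = ipnet
		default:
			log.Printf("ignoring extra cluster CIDR %q", c)
		}
	}
	return v4, v6, nil
}

func main() {
	v4, v6, _ := firstCIDRPerFamily([]string{"10.244.0.0/16", "fd00:10:244::/56", "192.168.0.0/16"})
	fmt.Println(v4, v6) // 192.168.0.0/16 is logged and ignored
}
```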
Member

logged and ignored

Member

updated in #808


### IPVS Support and Operation

Since IPVS functionality does not yet include IPv6 support (see [cloudnativelabs/kube-router Issue #307](https://github.com/cloudnativelabs/kube-router/issues/307)), support for IPVS functionality in a dual-stack cluster is considered a "nice-to-have" or stretch goal.
Member

Isn't this out of date? IPVS does support v6 though kube-proxy might not. I don't think IPVS is optional - it is a GA feature.

Member

Confirmed out of date and removed in #808


Currently, health, liveness, and readiness probes are defined without any concern for IP addresses or families. For the first release of dual-stack support, a cluster administrator will be able to select the preferred IP family to use for probes when a pod has both IPv4 and IPv6 addresses. For this selection, a new "--preferred-probe-ip-family" argument will be added to the [kubelet startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/):
```
--preferred-probe-ip-family string ["ipv4", "ipv6", or "none". Default: "none", meaning use the pod's default IP]
```
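A Go sketch of the selection this flag describes (illustrative only, not kubelet's implementation): prefer an address of the requested family and fall back to the pod's default IP when the preference is "none" or cannot be satisfied:

```
package main

import (
	"fmt"
	"net"
)

// probeIP prefers an address of the requested family ("ipv4" or "ipv6") from
// the pod's IPs and falls back to the pod's default IP (podIPs[0]) when the
// preference is "none" or cannot be satisfied. Illustrative only.
func probeIP(podIPs []string, preferred string) string {
	for _, ip := range podIPs {
		isV4 := net.ParseIP(ip).To4() != nil
		if (preferred == "ipv4" && isV4) || (preferred == "ipv6" && !isV4) {
			return ip
		}
	}
	return podIPs[0]
}

func main() {
	ips := []string{"10.244.1.4", "fd00:10:244:1::4"}
	fmt.Println(probeIP(ips, "ipv6")) // fd00:10:244:1::4
	fmt.Println(probeIP(ips, "none")) // 10.244.1.4 (the default IP)
}
```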
Member

@mtaufen should we be adding flags or JUST component config fields?

Member

resolved in #808. It was removed as we updated the behavior here.

- For health/liveness/readiness probe support, the default behavior will not change; an additional optional field will be added to the pod specification and respected by the kubelet. This will allow application developers to select a preferred IP family to use for implementing probes on dual-stack pods.


- Because service IPs will remain single-family, pods will continue to access the CoreDNS server via a single service IP. In other words, the nameserver entries in a pod's /etc/resolv.conf will typically be a single IPv4 or single IPv6 address, depending upon the IP family of the cluster's service CIDR.
- Non-headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record) or an IPv6 entry (AAAA record), depending upon the IP family of the cluster's service CIDR.
- Headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record), an IPv6 entry (AAAA record), or both, depending on the service's endpointFamily configuration (see [Configuration of Endpoint IP Family in Service Definitions](#configuration-of-endpoint-ip-family-in-service-definitions)).
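A rough Go sketch of the A-versus-AAAA split described above for a dual-stack headless service (illustrative only, not CoreDNS code; it assumes the endpoint IPs are already known):

```
package main

import (
	"fmt"
	"net"
)

// splitByRecordType groups endpoint IPs into A (IPv4) and AAAA (IPv6)
// answers, sketching how a DNS server could serve a dual-stack headless
// service. Illustrative only, not CoreDNS's implementation.
func splitByRecordType(endpointIPs []string) (a, aaaa []string) {
	for _, ip := range endpointIPs {
		parsed := net.ParseIP(ip)
		if parsed == nil {
			continue
		}
		if parsed.To4() != nil {
			a = append(a, ip)
		} else {
			aaaa = append(aaaa, ip)
		}
	}
	return a, aaaa
}

func main() {
	a, aaaa := splitByRecordType([]string{"10.244.1.4", "fd00:10:244:1::4"})
	fmt.Println("A:", a, "AAAA:", aaaa)
}
```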
Member

5 lines up it said "It is not expected that any changes will be needed for CoreDNS" but this is clearly a change. It has to be taught about the new plural field on endpoints

Member

Resolved as part of #808

```
fieldPath: status.podIPs
```

This definition will cause an environment variable setting in the pod similar to the following:
Member

What about extra fields of podIPs? The parameters block, for example? Do we want to do anything with that? I am not sure.

Member

Addressed in #808


thockin commented Dec 17, 2018

I want to say what a good KEP this is. Thanks.

@justaugustus
Member

Post holiday bump.

@justaugustus justaugustus left a comment

Please remove any references to NEXT_KEP_NUMBER and rename the KEP to just be the draft date and KEP title.
KEP numbers will be obsolete once #703 merges.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: thockin

If they are not already assigned, you can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

KEP numbers will be obsolete once kubernetes#703 merges.

thockin commented Jan 26, 2019

@aojea @leblancd

I am reluctant to merge this until we address possible staging of delivery (and who will be working on it).

@lachie83
Member

cc @khenidak

@khenidak
Contributor

@aojea Thank you for this. Are you planning to carry this forward with implementation as well, or just the KEP?


thockin commented Jan 28, 2019

The KEP has been sitting for over a month with no word. @khenidak I think if you want to start thinking about impl, that's fair. You could even open a PR that builds on this one (keeping the original work intact) if you want to push it ahead.

@feiskyer
Member

I'd like to move this forward if there's still no responses from @aojea or @leblancd.


uablrek commented Jan 29, 2019

Hi. Has implementation started? I would like to contribute, foremost in the "ipvs" area since I think the "nice to have" is not the right way, but also in other areas. Where do I sign up?


aojea commented Jan 29, 2019

@feiskyer @khenidak We should ask @leblancd. I was just trying to help keep this alive, but I can't commit to carrying on any activity.

- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
Contributor

This should be auto-detected. Although the cluster will be running in dual-stack mode, some pods will only be using one address family.

```
type PodIPInfo struct {
    IP string
    // Properties: Arbitrary metadata associated with the allocated IP.
    Properties map[string]string
}
```
Contributor

what would be an example of properties?

@aojea aojea Feb 21, 2019

```
--pod-cidr ipNetSlice [IP CIDRs, comma separated list of CIDRs, Default: []]
```
Only the first CIDR of each IP family will be used; all others will be ignored.
Contributor

This is going to be difficult for users to understand. I would suggest an additional flag --pod-cidr-alternative (or so), forcing the user to think about a main IP address family and an alternative IP address family.

Member Author

```
--cluster-cidr ipNetSlice [IP CIDRs, comma separated list of CIDRs, Default: []]
```
Only the first CIDR of each IP family will be used; all others will be ignored.
Contributor

same as pod-cidr

Member Author

// "PodIP" field, and this default IP address must be recorded in the
// 0th entry (PodIPs[0]) of the slice. The list is empty if no IPs have
// been allocated yet.
PodIPs []PodIPInfo `json:"podIPs,omitempty" protobuf:"bytes,6,opt,name=podIPs"`
Contributor

What would be a situation where a pod has more than two IPs? The argument I have here is to have one IP (current) and one additional.

Member

IPv4+IPv6 now. I think it's short sighted to think that the first addition we want to make is also going to be the last.

I can think of things we might want to do in the future like: assigning external addresses directly to pods, IPv6 privacy addresses (which also rotate periodically), addresses from multiple IPv4/IPv6 prefixes during renumbering transitions, or as some future pod-to-pod federation optimisation.
We certainly don't have to design for any of these possibilities now, but once we have an array of addresses, I bet $5 we'll find a reason to use a third address.

Member Author

IPv6 can have several addresses on the same interface depending on the scope; see the IPv6 Scoped Address Architecture (RFC 4007).

#### Type Load Balancer

The cloud provider will provision an external load balancer. If the cloud provider load balancer maps directly to the pod IPs, then a dual-stack load balancer could be used. Additional information may need to be provided to the cloud provider to configure dual stack.

Contributor

We will have to gate this KEP on input from the cloud providers. Currently AWS, Azure, and GCP use node ports for routing from the LB to the node (even when a CNI that hands out IPs from the cluster network is used).

Member

Addressed as part of #808

@Arvinderpal

I would also love to contribute to this effort. I've been working on adding IPv6 functionality and e2e testing to the kube-router project and have developed a Vagrant-based IPv6-only test environment.


- Pod Connectivity: IPv4-to-IPv4 and IPv6-to-IPv6 access between pods
- Access to External Servers: IPv4-to-IPv4 and IPv6-to-IPv6 access from pods to external servers
- NGINX Ingress Controller Access: Access from IPv4 and/or IPv6 external clients to Kubernetes services via the Kubernetes NGINX Ingress Controller.
Member

Could you shed some light on why the nginx ingress controller is part of the goals of this KEP? Does that mean there is a commitment to implement the necessary changes (if any) to the nginx ingress controller? Why would we not leave that up to individual ingress controllers to implement?

Member Author

Explained in L456

@khenidak
Contributor

I think we are also missing the node CIDR modification. I think we should also follow the model where the node has one CIDR as-is, then a list of CIDRs as a secondary property. That way CNIs that depend on node.cidr can allocate multi-family IPs as needed.

CC @lachie83

@lachie83
Member

I think we are also missing the node CIDR modification. I think we should also follow the model where the node has one CIDR as-is, then a list of CIDRs as a secondary property. That way CNIs that depend on node.cidr can allocate multi-family IPs as needed.

CC @lachie83

Added as part of #808

Awareness of Multiple NodeCIDRs per Node

As with PodIP, corresponding changes will need to be made to NodeCIDR. These changes are essentially the same as the aforementioned PodIP changes: pluralizing NodeCIDRs into a slice rather than a singular value, and making those changes across the internal representation and v1, with the associated conversions.
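A minimal Go sketch of what the pluralized field could look like, mirroring the PodIP/PodIPs pattern (field names and constraints here are illustrative, not the final API):

```
package main

import "fmt"

// NodeSpecExcerpt sketches the pluralized node CIDR field, mirroring the
// PodIP -> PodIPs pattern described above. Field names and constraints are
// illustrative, not the final API.
type NodeSpecExcerpt struct {
	PodCIDR  string   // existing singular field, kept as the default for older clients
	PodCIDRs []string // at most one CIDR per IP family; PodCIDRs[0] must match PodCIDR
}

func main() {
	n := NodeSpecExcerpt{
		PodCIDR:  "10.244.1.0/24",
		PodCIDRs: []string{"10.244.1.0/24", "fd00:10:244:1::/64"},
	}
	fmt.Printf("%+v\n", n)
}
```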

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 21, 2019

aojea commented Feb 21, 2019

obsoleted by #808

@aojea aojea closed this Feb 21, 2019