Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add revised IPv4/IPv6 dual stack KEP #808

Merged
merged 9 commits into from
May 3, 2019

Conversation

lachie83
Copy link
Member

@lachie83 lachie83 commented Feb 4, 2019

Fixes #563

This PR adds to the great work in #648. Specifically we have expanded the scope of authors in include community members that have expressed interest in helping with the KEP and implementation.

At PR submission the following changes have been made to the KEP in #648:

  • Add new authors
  • Update last-updated
  • Add Implementation Plan section

/cc @leblancd @rpothier @khenidak @feiskyer @thockin

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
@k8s-ci-robot
Copy link
Contributor

@lachie83: GitHub didn't allow me to request PR reviews from the following users: leblancd, rpothier.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Fixes #563

This PR adds to the great work in #648. Specifically we have expanded the scope of authors in include community members that have expressed interest in helping with the KEP and implementation.

At PR submission the following changes have been made to the KEP in #648:

  • Add new authors
  • Update last-updated
  • Add Implementation Plan section

/cc @leblancd @rpothier @khenidak @feiskyer @thockin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from khenidak February 4, 2019 17:52
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/pm size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 4, 2019
@lachie83
Copy link
Member Author

lachie83 commented Feb 4, 2019

If we are in agreement that the work continue on this PR then we should address all outstanding comments in #648 then update issue #563 to reference this PR

@thockin thockin self-assigned this Feb 4, 2019
@thockin
Copy link
Member

thockin commented Feb 4, 2019

Yes, please..

@leblancd @rpothier FYI

@khenidak
Copy link
Contributor

khenidak commented Feb 6, 2019

I think i had this as an offline discussion with @lachie83, so re-sharing here. There is an impact on node struct and kubenet ipam et al , right now node struct has one field for cidr, with this design it should have a similar approach to pod.ip as described here https://github.com/kubernetes/enhancements/pull/808/files#diff-df06b6d8f252f5f3711baaaf155ed3eaR193

@lachie83
Copy link
Member Author

Associated PR kubernetes/kubernetes#73977

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
…tion

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
@lachie83
Copy link
Member Author

lachie83 commented Feb 21, 2019

Current unresolved comments from previous KEP PR:

We should probably change this to CIDRs and assume that "no size portion" == "/32"

This is my least favorite part of this KEP. I'd rather we not do this for now and instead add full support via something like a new ipFamily field either for the whole service (including cluster IP) or even per-port. Semantics of it TBD, since I am advocating punting.

How will we want this to work if/when we support dual-stack service IPs?

@lachie83
Copy link
Member Author

@thockin - I believe this is ready for another review. I'm punting on the implementation details TBD for both Single-family services and Muti-family services in order to proceed with the first phase of "pod-to-pod dual-stack routing and non-vip services" as detailed in the implementation plan.

Following a clean review I would like to merge then proceed with the creation of a PR to set the status of this KEP to implementable.

@lachie83
Copy link
Member Author

@aojea - Should I add you as an author of this KEP as well?

@aojea
Copy link
Member

aojea commented Feb 26, 2019

@aojea - Should I add you as an author of this KEP as well?

Thanks @lachie83 , but I don't think I've done enough merits 😅

This dual-stack proposal will introduce a new IPNetSlice object to spf13.pflag to allow parsing of comma separated CIDRs. Refer to [https://github.com/spf13/pflag/pull/170](https://github.com/spf13/pflag/pull/170)

### End-to-End Test Support
End-to-End tests will be updated for Dual Stack. The Dual Stack E2E tests will use deployment scripts from the kubernetes-sigs/kubeadm-dind-cluster github repo to set up a containerized, multi-node Kubernetes cluster that is running in a Prow container (Docker-in-Docker-in-Docker, or DinDinD configuration), similar to the IPv6-only E2E tests (see [test-infra PR # 7529](https://github.com/kubernetes/test-infra/pull/7529)). The DinDinD cluster will be updated to support dual-stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working in adding IPv6 support to kind and I plan to add jobs to test IPv6 and DualStack scenarios later with kind

kubernetes-sigs/kind#348

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add this to the KEP

The kubeadm configuration options for advertiseAddress and podSubnet will need to be changed to handle a comma-separated list of CIDRs:
```
api:
advertiseAddress: "fd00:90::2,10.90.0.2" [Multiple IP CIDRs, comma separated list of CIDRs]
Copy link
Member

@aojea aojea Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently in kubeadm if the advertiseAddress is empty it takes the first IPv4 address
Looking at the code, we are giving priority to IPv4 over IPv6 , do we need to change this behavior?
If affirmative, this makes me think about all the defaults option of the different components (kubelet --address and --healthz-bind-address , ...) that IMHO it is too big to define in this KEP.

Should we add a section to define what will be the configuration options default behavior for ip addresses?, i.e..

### Configuration options defaults

For all the configuration options that require an IP address or Subnet, in case this option is not provided or set, it should look equally for both IPv4 and IPv6 addresses or networks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code, we are giving priority to IPv4 over IPv6 , do we need to change this behavior?

no, we cannot. ipv4 should still take precedence over ipv6.
ipv6 needs to be explicit.

from the KEP:

The kubeadm configuration options for advertiseAddress and podSubnet will need to be changed to handle a comma-separated list of CIDRs:

ideally, this has to be discussed during a kubeadm office hours meeting before jumping into PRs.
https://docs.google.com/document/d/130_kiXjG7graFNSnIAgtMS1G8zPDwpkshgfRYS0nggo/edit

overall dual stack is out-of-scope for the kubeadm team for this cycle, because we are overbooked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, it is too big to define in one KEP.
The KEP does not prioritize one family over another. as a matter of fact users will be able to define ip cidrs in any order they like. This means i can doipv6,ipv4 or ipv4,ipv6. for kubeadm it can still work provisioning a single family clusters even when this change is PRed. But for dual stack some options and behaviors will have to change. These changes are unlikely to be needed in 1.14 or 1.15.

I am happy to jump in and discuss in kubeadm office hours.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neolit123. The current plan is that once we have support for dual-stack in core Kubernetes with the appropriate documentation we will focus on helping the tooling ecosystem get onboard. In the case of kubeadm this might be a little different if it's absolutely necessary for the purposes of testing. We will be sure to show up at the office hours and determine the best course of action in close consult with the team that works on kubeadm.

- CNI network plugins: Some plugins other than the Bridge, PTP, and Host-Local IPAM plugins may support Kubernetes dual stack, but the development and testing of dual stack support for these other plugins is considered outside of the scope of this proposal.
- Multiple IPs vs. Dual-Stack: Code changes will be done in a way to facilitate future expansion to more general multiple-IPs-per-pod and multiple-IPs-per-node support. However, this initial release will impose "dual-stack-centric" IP address limits as follows:
- Pod addresses: 1 IPv4 address and 1 IPv6 addresses per pod maximum
- Node addresses: 1 IPv4 address and 1 IPv6 addresses per pod maximum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"per node maximum"?

- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, the default behavior will not change and an additional optional field would be added to the pod specification and is respected by kubelet. This will allow application developers to select a preferred IP family to use for implementing probes on dual-stack pods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's gotta a better way to do this than asking app devs to specify an IP family (even if there is a default). What will that default be anyway, v4?

Can we just health-check both v4 and v6 and if at least one responds call the pod healthy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems that IPv4 is the default if you look at the code of the different components #808 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that forcing end-users to do something here should be the exception.The default should probably be to probe both, but that would not be backwards compatible. Probing and accepting either is probably max compatible. For users who need both, a field might be OK

### Awareness of Multiple IPs per Pod

Since Kubernetes Version 1.9, Kubernetes users have had the capability to use dual-stack-capable CNI network plugins (e.g. Bridge + Host Local, Calico, etc.), using the
[0.3.1 version of the CNI Networking Plugin API](https://github.com/containernetworking/cni/blob/spec-v0.3.1/SPEC.md), to configure multiple IPv4/IPv6 addresses on pods. However, Kubernetes currently captures and uses only IP address from the pod's main interface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair it's really the CRI that does this, not Kubernetes itself. So dockershim/CRI-O/etc are the things that need to transmit the multiple IPs back to kubelet via the CRI API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since dockershim is in-tree, the feature is still depending on CRI/CNI to support multiple IPs.

Copy link
Member

@aojea aojea Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the goal is to move dockershim out-of-tree in 1.18 #866, @dcbw do you think that this KEP can be affected by the deprecation plan https://github.com/kubernetes/enhancements/pull/866/files#diff-8e35fdc72dca61e8e5f9c0ebe13ffd33R95 ?

IP string `json:"ip" protobuf:"bytes,1,opt,name=ip"`
// The IPs for this endpoint. The zeroth element (IPs[0] must match
// the default value set in the IP field)
IPs []string `json:"ips" protobuf:"bytes,5,opt,name=ips"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have the same kind of "IP + metadata" that we have for PodIPs? Seems like that would be useful here too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the metadata option for PodIPs has a use case

refer to the "Properties" map below) will not be used by the dual-stack feature, but is added as a placeholder for future enhancements, e.g. to allow CNI network plugins to indicate to which physical network that an IP is associated

I can´t envision any use case for a metadata field in this case with IP endpoints but I think that we should add an example if we want to add it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcbw - do you have concrete use cases? Remember that most endpoints objects are auto-created by the controller


### kube-proxy Operation

Kube-proxy will be modified to drive iptables and ip6tables in parallel. This will require the implementation of a second "proxier" interface in the Kube-Proxy server in order to modify and track changes to both tables. This is required in order to allow exposing services via both IPv4 and IPv6, e.g. using Kubernetes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify the "second proxier interface"? Would it be cleaner to just build both v4 and v6 iptables rules while processing each ServicePort and then do two restores at the same time (one for v4 and one for v6) when the service map is done being processed?

Copy link

@uablrek uablrek Apr 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "Two proxier instances" would be a better description. The proxier is already instantiated either for ipv4 or ipv6. In dual-stack the cleanest way is to instantiate both. Perhaps with a "meta proxier" in between hiding the instances and calls one or the other or both depending on setup. As a side remark, the ipv4 and ipv6 operations are disjunkt and can be executed simultaneously.

If not having two proxier instances you would get something like;

  if ipv4Enabled {
    // Setup ipv4 stuff
  }
  if ipv6Enabled {
    // Setup ipv6 stuff
  }

all over the place, a construct that should be avoided. @dcbw I don't think this is what you suggest but just to clearify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think dual-stack kube-proxy can push to phse 2 (or 1.5 -- after all the work and testing for phase 1)

@khenidak
Copy link
Contributor

khenidak commented Apr 17, 2019

couple of things to add here

  • Internal to kubernetes types: Section "Awareness of Multiple IPs per Pod" shades a little light on an interesting problem. So let me expand: CNI provides list of ips and their versions. Kubernetes just choose to ignore this and use a single IP. This means this struct Will need to follow the pod.Pod ip style of primary IP as is, and another slice of IPs, having Pod.IPs[0] == Pod.IP to look like the following
type PodNetworkStatus struct {
	metav1.TypeMeta `json:",inline"`

	// IP is the primary ipv4/ipv6 address of the pod. Among other things it is the address that -
	//   - kube expects to be reachable across the cluster
	//   - service endpoints are constructed with
	//   - will be reported in the PodStatus.PodIP field (will override the IP reported by docker)
	IP net.IP `json:"ip" description:"Primary IP address of the pod"`
        IPs  []net.IP `json:"ips" description:"list of ips"`
}

CNI as of today does not provide additional metadata to the IP. So this Properties field - speced in this KEP - will be empty until CNI spec includes properties. Although in theory we can copy the interface name, gateway etc. into this Properties map.

  • Required changes to CRI: The PLEG loop + status manager of kubelet makes an extensive use of PodSandboxStatus call to wire up PodIP to api server, as a patch call to Pod Resources. The problem with this is the response message wraps PodSandboxNetworkStatus struct . And this struct only carries one IP. And will have to change similar to the change described above. I am in the process of reaching out CRI folks to start the discussion to become
type PodSandboxNetworkStatus struct {
	// IP address of the PodSandbox.
	Ip string `protobuf:"bytes,1,opt,name=ip,proto3" json:"ip,omitempty"`
        Ips []string `protobuf:"bytes,1,opt,name=ip,proto3" json:"ip,omitempty"`
}

@thockin @lachie83

Also @justaugustus who wanted to know about this KEP

@feiskyer
Copy link
Member

Required changes to CRI: The PLEG loop + status manager of kubelet makes an extensive use of PodSandboxStatus call to wire up PodIP to api server, as a patch call to Pod Resources. The problem with this is the response message wraps PodSandboxNetworkStatus struct . And this struct only carries one IP. And will have to change similar to the change described above. I am in the process of reaching out CRI folks to start the discussion to become

Looks good. CRI should be extended to support multiple IPs. Meanwhile, for in-tree dockershim, CNI should be upgraded as well.

Add CRI proposed changes
@lachie83
Copy link
Member Author

Added changes as requested @khenidak

- CNI network plugins: Some plugins other than the Bridge, PTP, and Host-Local IPAM plugins may support Kubernetes dual stack, but the development and testing of dual stack support for these other plugins is considered outside of the scope of this proposal.
- Multiple IPs vs. Dual-Stack: Code changes will be done in a way to facilitate future expansion to more general multiple-IPs-per-pod and multiple-IPs-per-node support. However, this initial release will impose "dual-stack-centric" IP address limits as follows:
- Pod addresses: 1 IPv4 address and 1 IPv6 addresses per pod maximum
- Node addresses: 1 IPv4 address and 1 IPv6 addresses per pod maximum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per node not pod ..

The kubeadm configuration options for advertiseAddress and podSubnet will need to be changed to handle a comma-separated list of CIDRs:
```
api:
advertiseAddress: "fd00:90::2,10.90.0.2" [Multiple IP CIDRs, comma separated list of CIDRs]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, it is too big to define in one KEP.
The KEP does not prioritize one family over another. as a matter of fact users will be able to define ip cidrs in any order they like. This means i can doipv6,ipv4 or ipv4,ipv6. for kubeadm it can still work provisioning a single family clusters even when this change is PRed. But for dual stack some options and behaviors will have to change. These changes are unlikely to be needed in 1.14 or 1.15.

I am happy to jump in and discuss in kubeadm office hours.


CNI as of today does not provide additional metadata to the IP. So this Properties field - speced in this KEP - will be empty until CNI spec includes properties. Although in theory we can copy the interface name, gateway etc. into this Properties map.

### Required changes to Container Runtime Interface (CRI)
Copy link
Member

@Random-Liu Random-Liu Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also need to update UpdateRuntimeConfig to support dual stack. For some CNI plugins, e.g. kubenet, we use that function to pass in the node CIDR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Thanks

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all my comments are about sequencing -- leaving anything kube-proxy related for phase 2 (or 1.5, stritcly behind phase 1) will hopefully allow us to focus on the API semantics and make sure we get good testing of those things without muddying the waters.

Thoughts? I'd love to approve this, but need consensus on that part of the plane.

- For simplicity, only a single family of service IPs per cluster will be supported (i.e. service IPs are either all IPv4 or all IPv6).
- Backend pods for a service can be dual stack.
- Endpoints for a dual-stack backend pod will be represented as a dual-stack address pair (i.e. 1 IPv4/IPv6 endpoint per backend pod, rather than 2 single-family endpoints per backend pod)
- Kube-proxy iptables mode needs to drive iptables and ip6tables in parallel. This is required, even though service IP support is single-family, so that Kubernetes services can be exposed to clients external to the cluster via both IPv4 and IPv6. Support includes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #648 (comment) We talked about excluding this from phase 1.

Is there a reason it's back in? I ACK that a lot of the functionality of dual-stack is deferred by not including NodePorts and External IPs, and I am not AGAINST adding them in phase 1 if it turns out to be feasible. I just don't think we should block pod IPs, hostPorts, node IPs, etc from supporting v6 for them. And I don't think we should prioritize them or muddle the careful review and testing that will be required for phase 1.

It seems OK to add endpoints support to what I described as step1, so headless services work.

Can we consider NodePorts, external IPs, LB IPs and Ingress IPs (basically all the things that require dual-stack kube-proxy) as phase 2, with an option to pull them in early if feasible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've updated the implementation plan to explicitly call out this as part of phase 2.

- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, the default behavior will not change and an additional optional field would be added to the pod specification and is respected by kubelet. This will allow application developers to select a preferred IP family to use for implementing probes on dual-stack pods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that forcing end-users to do something here should be the exception.The default should probably be to probe both, but that would not be backwards compatible. Probing and accepting either is probably max compatible. For users who need both, a field might be OK

- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, the default behavior will not change and an additional optional field would be added to the pod specification and is respected by kubelet. This will allow application developers to select a preferred IP family to use for implementing probes on dual-stack pods.
- The pod status API changes will include a per-IP string map for arbitrary annotations, as a placeholder for future Kubernetes enhancements. This mapping is not required for this dual-stack design, but will allow future annotations, e.g. allowing a CNI network plugin to indicate to which network a given IP address applies. The approriate hooks will be provided to enable CRI/CNI to provide these details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to add the annotations field if we don't have a concrete case. Just knowing it CAn be added is what maters

##### V1 to Core (Internal) Conversion
- If only V1 PodIP is provided:
- Copy V1 PodIP to core PodIPs[0]
- Else if only V1 PodIPs[] is provided: # Undetermined as to whether this can actually happen (@thockin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the versioned defaulting works, it should not be possible, but good to be safe about it and spec it

Only the first address of each IP family will be used; all others will be logged and ignored.

#### kube-proxy Startup Configuration for Dual-Stack Pod CIDRs
The existing "cluster-cidr" option for the [kube-proxy startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/) will be modified to support multiple cluster CIDRs in a comma-separated list (rather than a single IP string), i.e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think we should move this to phase 2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-proxy and kubelet prober will work with the default IP as-is at the end of phase 1. Updated implementation plan to reflect that changes to both kube-proxy and kubelet prober will be actioned in phase 2.


### kube-proxy Operation

Kube-proxy will be modified to drive iptables and ip6tables in parallel. This will require the implementation of a second "proxier" interface in the Kube-Proxy server in order to modify and track changes to both tables. This is required in order to allow exposing services via both IPv4 and IPv6, e.g. using Kubernetes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think dual-stack kube-proxy can push to phse 2 (or 1.5 -- after all the work and testing for phase 1)

#### Kube-Proxy Startup Configuration Changes

##### Multiple bind addresses configuration
The existing "--bind-address" option for the [kube-proxy startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/) will be modified to support multiple IP addresses in a comma-separated list (rather than a single IP string).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

several components have flags that are similar and probably all need to be updated


- Because service IPs will remain single-family, pods will continue to access the CoreDNS server via a single service IP. In other words, the nameserver entries in a pod's /etc/resolv.conf will typically be a single IPv4 or single IPv6 address, depending upon the IP family of the cluster's service CIDR.
- Non-headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record) or an IPv6 entry (AAAA record), depending upon the IP family of the cluster's service CIDR.
- Headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record), an IPv6 entry (AAAA record), or both, depending on the service's endpointFamily configuration (see [Configuration of Endpoint IP Family in Service Definitions](#configuration-of-endpoint-ip-family-in-service-definitions)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

phase 2

The [NGINX ingress controller](https://github.com/kubernetes/ingress-nginx/blob/master/README.md#nginx-ingress-controller) should provide dual-stack external access to Kubernetes services that are hosted on baremetal clusters, with little or no changes.

- Dual-stack external access to NGINX ingress controllers is not supported with GCE/GKE or AWS cloud platforms.
- NGINX ingress controller needs to be run on a pod with dual-stack external access.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ACK the value of this, but I think it is phase 1.5 or 2


Pod information is exposed through environmental variables on the pod. There are a few environmental variables that are automatically created, and some need to be specified in the pod definition, through the downward api.

The Downward API [status.podIP](https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#capabilities-of-the-downward-api) will preserve the existing single IP address, and will be set to the default IP for each pod. A new environmental variable named status.podIPs will contain a comma-separated list of IP addresses. The new pod API will have a slice of structures for the additional IP addresses. Kubelet will translate the pod structures and return podIPs as a comma-delimited string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why comma-seperated rather than space seperated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need access to other fields in the PodIPInfo struct? Maybe the field selector here should be "status.podIPs[].ip" or even "[*]" ? Not sure we have any precedent here

Copy link
Member Author

@lachie83 lachie83 May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect that users that would need this data would be advanced users and would advise going to the API server to get this information. Environment variables doesn't seem to the correct place for this dictionary based data for each IP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to lock down the API for this KEP, but when we get to this part of details, I'd like to discuss. I don't want to make incomplete or partial designs when we don't have to, and I don't want to "make up" new syntax where we have existing mechanisms to describe things :)


* Build and add e2e tests to test-grid.
* e2e tests should cover the following:
* multi-network, same family
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is considered "multi-network"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating to multi-IPs

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

- Multi-family services including kube-proxy
- Working with a CNI provider to enable dual-stack support
- Change kubelet prober to support multi-address
- Update component flags to support multiple `--bind-address`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be either phase, I think - seems easy enough but not critical.

\<TBD\>

## Graduation Criteria
\<TBD\>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve this but we should start thinking about this, too. Specifically:

  • do we want to GA the multi-IP suppot (phase 1) independent of the dual-stack services work?
  • what criteria will we be monitoring?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having dual stack on pod ip is useful for those offering ingress as ipv4 and doing downstream server calls to ipv6 (or the otherway around). I can not speak to the volume of users. I think as we release phase 1 to alpha we will hear from users and we can adjust accordingly.

@lachie83

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 3, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lachie83, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2019
@k8s-ci-robot k8s-ci-robot merged commit e54b1b7 into kubernetes:master May 3, 2019
### Non-Goals

- Service CIDRs: Dual-stack service CIDRs will not be supported for this proposal. Service access within a cluster will be done via all IPv4 service IPs or all IPv6 service IPs.
- Single-Family Applications: There may be some some clients or applications that only work with (bind to) IPv4 or or only work with (bind to) IPv6. A cluster can support either IPv4-only applications or IPv6-only applications (not both), depending upon the cluster CIDR's IP family. For example, if a cluster uses an IPv6 service CIDR, then IPv6-only applications will work fine, but IPv4-only applications in that cluster will not have IPv4 service IPs (and corresponding DNS A records) with which to access Kubernetes services. If a cluster needs to support legacy IPv4-only applications, but not IPv6-only applications, then the cluster should be configured with an IPv4 service CIDR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"There may be some some clients or applications that only work with (bind to) IPv4 or or only work with (bind to) IPv6. "----small syntaxt error in this sentence


#### Kubeadm-Generated Manifests

Kubeadm will need to generate dual-stack CIDRs for the --service-cluster-ip-range command line argument in kube-apiserver.yaml:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the current goal is have single-family services, I don't think we need to support a list of CIDRs in --service-cluster-ip-range
Further down in this KEP, the variation of Dual-Stack Service CIDRs is discussed but that's a non-goal. There this change would be applicable and has already been noted.

@s1061123
Copy link

s1061123 commented Jun 6, 2019

Q about IPv6 K8s Service/Endpoints. Does Service/Endpoints allows users to add Anycast IPv6 address?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add IPv4/IPv6 dual-stack support