New KEP for SCTP support feature #2276
Conversation
/assign @thockin
- "@janosi" | ||
owning-sig: sig-network | ||
participating-sigs: | ||
- sig-cloud-provider |
at least ALSO sig-network, not sure cloud-provider will matter here?
OK, so it is not enough then that sig-network is the owning-sig.
reviewers:
  - "@thockin"
approvers:
  - TBD
me also
  - TBD
editor: TBD
creation-date: 2018-06-14
last-updated: yyyy-mm-dd
start with creation date
Do you mean "creation-date" should be the date when I pushed the doc to github first time?
# SCTP support
<!---
This is the title of the KEP.
please delete the template
### Non-Goals

It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. I.e. the Kubernetes user can define Services with type=LoadBalancer and Protocol=SCTP, but if the actual load balancer implementation does not support SCTP then the creation of the Service/load balancer fails.
Do we know of any that work? We should probably default to a validation error if this combination is specified.
How about NodePort? If LB is disabled, maybe NodePort should be too.
As far as I know, OpenStack supports SCTP as an LB protocol. Though, of course, we can start with the approach that Protocol=SCTP is not allowed for type=LoadBalancer, and whenever there is a need a new PR can enable it for the relevant cloud provider.
UPDATE: I checked the OpenStack LBaaS API (again), and SCTP is not supported on the API. So, I will remove my OpenStack related modification and make sure that SCTP becomes a restricted protocol for the OpenStack case, too. It also means that for the time being we can restrict the usage of SCTP with type=LoadBalancer, as you proposed.
I will check the NodePort use case.
The Kubernetes API modification for Services is obvious.
The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package.
For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully.
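For orientation, the API change amounts to accepting SCTP in the ServicePort protocol field. A minimal sketch of such a Service (names and ports are illustrative, not taken from the KEP):

```
apiVersion: v1
kind: Service
metadata:
  name: sctp-clusterip-example   # illustrative name
spec:
  selector:
    app: sctp-server
  ports:
  - protocol: SCTP               # today only TCP and UDP are accepted here
    port: 7777
    targetPort: 7777
```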
You also need to talk about HostPorts and demonstrate that we can make it work
#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is obvious.
In order to utilize the new protocol value the network controller must support it.
You need to talk about DNS and list the changes needed there (at least SRV logic)
Yes, we would at least need to use _sctp for SRV. You should probably explicitly state the expectations around the multi-homing functionality. I assume that will not be supported? If it were supported, I think we would want something special around the service discovery to differentiate between IPs that are part of the same multi-homed endpoint vs. part of a separate sctp peer. My knowledge of sctp is pretty limited, but reading RFC 4960 section 5.1.2 it looks like an individual name should resolve to a set of addresses that can be used for multi-homing. This is really different from what we do with SRV (and multiple A records) today, which resolve to separate endpoints.
To implement SCTP multi-homing in K8s, first of all NAT should support SCTP. There is an IETF draft about it: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-natsupp/
#### Interworking with applications that use a user space SCTP stack
A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, loading of the SCTP kernel module must be avoided on nodes where applications that use a userspace SCTP stack are planned to run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a Service is created, the relevant iptables/ipvs management logic kicks in on every node and, as a consequence, loads the SCTP kernel module. There are some ideas for how to solve this interworking problem:

1. "-p sctp" is not used in the iptables rules; the processing of requests to the Virtual IP is executed purely based on the destination IP address. In the case of ipvs the protocol is a mandatory parameter, so ipvs with SCTP rules cannot be used on a node where userspace SCTP applications should run.
Not working with IPVS seems like a pretty dire problem. With the "port ranges" proposal, one idea was to forward whole IPs, identity mapped. Does that help here?
There are the following alternative ideas that may work with both iptables and ipvs:
- Configure simple IP forwarding for the ClusterIP at host level: ClusterIP -> localhost. The userspace proxy would bind on localhost + the service-specific port.
- Every ClusterIP is configured on the loopback interface of the host. The userspace proxy binds on the ClusterIP + the service-specific port.
2. Fall back to the user space proxy on those specific nodes. The user space proxy shall also use a user space SCTP stack, of course. Also the iptables rules that direct the client traffic to the userspace proxy must be created without the "-p sctp" option.
userspace is a dead-end. It's not likely to get extended.
"shall also use a user space SCTP stack, of course"? Ah... is it not possible to implement a generic SCTP proxy via the sockets API?
If the userspace proxy needs to use the userspace SCTP stack then that means the "tainting" runs in both directions: if you want to run the userspace proxy on a node for whatever reason, then you can't run containers that expect to use kernel SCTP on that node.
And of course if you don't use kernel SCTP support then that kind of kills my earlier claim that NetworkPolicy would be easy to implement because the kernel does all the hard stuff. Plugins with iptables-, OVS-, or eBPF-based NetworkPolicy presumably would not be able to do SCTP at all on nodes without kernel SCTP.
@thockin Indeed, a user space proxy does not sound good, but on a node where a userspace SCTP app runs we cannot use the SCTP kernel module anyway. So, if we want to have a proxy on those nodes, then we have to implement the SCTP stack for that proxy in userspace.
@danwinship Exactly, you see it right. On a node where a userspace SCTP application runs we cannot run applications that would require the SCTP kernel module. And it is true the other way around, too.
Thinking about this part again: it will not work. In order to route a request to e.g. a userspace proxy, the port on which the userspace proxy listens must be configured in iptables/ipvs. And that means the protocol must be configured, which loads the kernel module.
~~Currently I think there is no nice way to solve the interworking of userspace SCTP stacks and kernel module based services in the same k8s cluster.~~
UPDATE: we have an idea, let's see what you think. I will update the document accordingly.
In any case we shall be able to dedicate these nodes to those userspace SCTP applications, or at least we must ensure that "regular" SCTP user applications are not deployed on these nodes. The solution proposal for this node separation:

- there shall be a new kube-proxy parameter. If the parameter is set, the kube-proxy switches to this new mode of operation (described above) for SCTP services
- if the new kube-proxy parameter is set, the node must be tainted with a new taint, so the scheduler places only SCTP applications that use a userspace SCTP stack on this node. We must avoid the deployment of "regular" SCTP users on this node.
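For illustration, the node separation could be arranged with a taint along these lines (the taint key is hypothetical, not something proposed in the KEP):

```
# Dedicate a node to userspace-SCTP workloads; pods that bring their own
# userspace stack would carry a matching toleration.
kubectl taint nodes node-1 example.com/userspace-sctp=true:NoSchedule
```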
This is a really niche requirement (1 request in 4 years). I am not in favor of adding complexity like this. Can we make this the cluster-admin's problem? If you need SCTP, arrange taints or node labels.
Yes, but what we have to achieve is that the proxy obeys those taints/labels, so that it does not execute any workflow which would load the SCTP kernel module on the node.
Another option is that the whole SCTP support (as it is now in the code) is put behind a cluster-level parameter: if SCTP support is allowed, the cluster uses the kernel modules everywhere; if not, the cluster works as it did without this enhancement, and userspace SCTP stacks can work as they did before.
Or, back to the drawing board, and we really allow SCTP only for headless Services ;) -> however, NetworkPolicy engines can still make things problematic for the userspace stacks.
<!---
This is where we get down to the nitty gritty of what the proposal actually is.
--->
### User Stories [optional]
I am curious to learn: what are the user stories for having SCTP support?
Do you mean why SCTP support is requested at all?
@janosi So do you (or anyone else here) actually want to do SCTP over the "normal" k8s network? (What the multi-network group is calling the "Kubernetes Cluster-Wide Default Network"). Or are you only interested in SCTP over physical NICs / SR-IOV / DPDK / etc? From what I've been able to gather, the telecom use cases for SCTP also have bandwidth/latency constraints such that you wouldn't want the traffic going over any sort of SDN. Is that accurate? (Or if that's not accurate in general, is it at least accurate for the use cases that require userspace SCTP?)
@danwinship Our original purpose was to have SCTP based endpoints/pods in the kube-dns. That was the reason why we initiated the relevant k8s issue.
@janosi I appreciate that you consider my comment about user-land SCTP as a requirement!
Yeah, I know. My questions were trying to figure out if the userspace SCTP use case needs to involve "kubernetes networking", or if it just involves "kubernetes" and "networking" but not "kubernetes networking" (and DPDK was an example of something that isn't "kubernetes networking"). I think the userspace SCTP case creates a lot of problems, and if you don't personally care about it, you might want to just drop that stuff from this proposal, and only worry about (a) not implementing things in a way that would make it harder to add userspace SCTP support later, and (b) not implementing things in a way that would break people who were previously successfully using userspace SCTP behind kubernetes's back. Eg, for the latter, we could make sure that kubernetes itself does not create any …
#### Headless Service with SCTP
As a user of Kubernetes I want to define headless Services for my applications that use SCTP as L4 protocol on their interfaces, so client applications can discover my applications in kube-dns, or via any other service discovery method that gets information about endpoints via the Kubernetes API.
Example:
can you give the example records that would be added?
I added an example now. Please check and comment whether that fits for you.
    port: 7777
```
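The PR's full example is collapsed in this view; a hedged reconstruction of what such a headless SCTP Service could look like (names are illustrative, the port value is taken from the visible fragment):

```
apiVersion: v1
kind: Service
metadata:
  name: my-sctp-service          # illustrative name
spec:
  clusterIP: None                # headless: endpoints are published directly
  selector:
    app: my-sctp-app
  ports:
  - name: my-port
    protocol: SCTP
    port: 7777
```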
#### User space SCTP stack | ||
As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack. |
It feels like this could be separated out from the KEP which is mostly k8s API focused, unless there is some sort of interaction...
#### Interworking with applications that use a user space SCTP stack |
This feels like a separate KEP? To enable use of user-space protocol stacks + kube-proxy.
@bowei @MaximProshin @danwinship I added alternative proposals to solve the userspace SCTP stack interworking. Also I added the explanation of why the userspace and kernel space SCTP stacks cannot work together currently. Please check those.
@janosi Thanks for analysing the interworking issue and pointing out the above alternatives.
Another way, as I mentioned in kubernetes/kubernetes#64973 (comment), is to introduce an option on the Service itself allowing the user to decide whether creation of an SCTP socket is needed.
@KomorkinMikhail Thank you for checking the document!
/ok-to-test
@janosi Actually no, at least for iptables I can guarantee that. We have included SCTP support locally; we made the same code changes as you, but when it comes to the openLocalPort() invocation we don't do this for SCTP type Services. Then the proxy goes straight to adding the rules, and as a result LKSCTP is not loaded (it loads up exactly at the moment of the socket creation/bind/listen procedure). Simply adding lines into iptables chains has nothing to do with loading kernel modules. Could you please recheck it on your side? Perhaps LKSCTP had already been loaded when you tried this scenario?
@KomorkinMikhail Hmmmm, yes, you are right. My shame :( I do not remember why I became so sure about the loading of the lksctp module when iptables rules are defined. :( I was wrong.
@janosi Great!
@KomorkinMikhail Yes, in some cases an ABORT is necessary, and on iptables level we cannot really tell why the ABORT was initiated.
@janosi and @KomorkinMikhail This is good progress! Thanks!
I dislike it but if that is the only thing preventing it from working, I could probably forego it. But we should only be opening a port if the Service is type=NodePort or LoadBalancer, which we already said we won't do, right?
@KomorkinMikhail Indeed, and you can find my analysis in the PR. LB providers more or less covered the requirement that they should filter protocols before the request goes to the actual cloud's LB controller. Though, as I understood, @thockin asked for blocking the Service creation with type=LB at validation.
@janosi Yes, I get it, but why do we need to block it at the k8s level if cloud providers can handle the check on their own? If a cloud provider can handle the SCTP protocol, k8s should accept the Service.
@KomorkinMikhail I think we have to turn to @thockin for the answer. This was his suggestion: "Do we know of any that work? We should probably default to a validation error if this combination is specified."
@janosi @thockin K8s can be used with many other cloud providers in addition to the supported ones. And we are also looking into the possibility of some load balancer options in the future.
@KomorkinMikhail @thockin I am fine with any of those options.
Can anyone please suggest how to move this KEP forward? Will there be a sig-network meeting next week?
@janosi yes, there is a sig-network meeting next week, Thursday the 23rd.
It is not a goal to support SCTP as a protocol value for the container's HostPort. The reason: [the usage of HostPort is not recommended by Kubernetes][], and ensuring proper interworking of HostPort with userspace SCTP stacks (see below) would require an additional kubelet/kubenet configuration option. In order to keep the complexity and impact of the introduction of SCTP at a lower level, we do not plan to support SCTP as a new protocol value for HostPort.

[the usage of HostPort is not recommended by Kubernetes]: https://kubernetes.io/docs/concepts/configuration/overview/#services
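For reference in the discussion below, an SCTP HostPort would look roughly like this (a sketch; image, names, and port are illustrative):

```
apiVersion: v1
kind: Pod
metadata:
  name: sctp-hostport-example
spec:
  containers:
  - name: server
    image: example.com/sctp-server:latest   # illustrative image
    ports:
    - containerPort: 7777
      hostPort: 7777            # would need SCTP support in kubelet/kubenet
      protocol: SCTP
```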
That doesn't say that use of HostPorts is not recommended. It just points out that you shouldn't gratuitously add HostPorts to pods that don't actually need them, because host ports are a limited resource.
(So I think we should support SCTP HostPorts, unless there was some other argument against it.)
I don't see why not, I guess.
OK. The reason I proposed it this way was that listening on HostPorts (in order to reserve the port) is not handled by kube-proxy but by the kubelet (kubelet -> Docker Service -> Kubenet plugin -> hostportmanager). That is, in order to have the same "userspace sctp config" based listening on HostPorts, it would require another config parameter for another component, i.e. the user would have to configure userspace SCTP support in 2 different places. And because I understood the cited k8s documentation to say that the usage of HostPorts is not recommended, I thought I could skip the support of SCTP for HostPorts. But of course, if the decision is to support SCTP for HostPorts too, then I will do it.
In truth, I am not sure how valuable kubelet's logic is here. ISTR there was/is a bug in it that it doesn't re-open ports if Kubelet restarts, and nobody has complained about it.
Maybe we should just remove that logic altogether? We could start by skipping it for SCTP and have a separate discussion about removal... Just make sure it is well commented why we do it for everything except SCTP.
Fine for me.
      role: myservice
  policyTypes:
  - Ingress
  - Egress
(FWIW, this NetworkPolicy object blocks all egress. You probably want to remove the line "- Egress" here.)
OK.
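Assembled from the visible fragments, the ingress-only variant being suggested would look roughly like this (a sketch; the name is illustrative, the selector and port mirror the quoted snippets):

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sctp-ingress       # illustrative name
spec:
  podSelector:
    matchLabels:
      role: myservice
  policyTypes:
  - Ingress                      # Egress omitted, so egress stays unrestricted
  ingress:
  - ports:
    - protocol: SCTP
      port: 7777
```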
    - protocol: SCTP
      port: 7777
```
#### Userspace SCTP stack |
It's probably worthwhile to explain why this really is a valid use case. (Because it's not possible to write a server that proxies/filters arbitrary SCTP streams using the sockets APIs and kernel SCTP.)
All right. For me the justification came from the fact that there are apps that use userspace SCTP for whatever reason right now, and most probably they do not want to rewrite their network handling logic at the same time as they move to containers+k8s - so they must have userspace SCTP support.
SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes.

### Goals |
Accepting SCTP connections from clients outside the cluster should be listed as either a Goal or a Non-Goal. (And if the former, should be demonstrated by a User Story.)
Also, allowing pods to make outgoing SCTP connections to servers outside the cluster should be either a Goal or a Non-Goal. (In particular, if it's a Goal, then plugins need to set up SCTP NAT or whatever, which would conflict with the "Documentation only" approach to allowing userspace SCTP.)
### Non-Goals

It is not a goal here to add SCTP support to load balancers that are provided by cloud providers.
I think it's fine to not require cloud providers to add SCTP support, but I don't see any reason to forbid SCTP load-balancing, especially given that some providers apparently do already support it.
I guess I can hold my nose on this. I really dislike things that are quietly non-portable, like this, but it's hardly the first or worst.
Arguably my position on SCTP LoadBalancers here contradicts my position on NetworkPolicy ipBlock ingress on the list... (Although I feel like ipBlock ingress is more broken than SCTP LoadBalancing is so it's not too inconsistent.) Having stronger guidelines for situations like this would be good... I think sig-network outsources more behavior to plugins and third parties than most other SIGs do so this is probably something that affects us more than everyone else.
In the case of Services with ClusterIP, NodePort, or externalIP, the selected port shall be reserved on the respective nodes, just like for TCP and UDP currently. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to reserve those ports via the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.
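A minimal sketch of the port-reservation idea with the proposed github.com/ishidawataru/sctp package (assuming its ListenSCTP/SCTPAddr API; the port is illustrative, and this is not the actual kube-proxy code). Note that creating the socket is exactly what loads the SCTP kernel module, which matters for the userspace-stack discussion below:

```
package main

import (
	"log"
	"net"

	"github.com/ishidawataru/sctp" // 3rd-party SCTP library proposed for vendoring
)

func main() {
	// Reserve an SCTP port by opening a listening socket, analogous to
	// what kube-proxy does for TCP/UDP node ports today.
	addr := &sctp.SCTPAddr{
		IPAddrs: []net.IPAddr{{IP: net.IPv4zero}},
		Port:    30777, // illustrative node port
	}
	ln, err := sctp.ListenSCTP("sctp", addr)
	if err != nil {
		log.Fatalf("failed to reserve SCTP port: %v", err)
	}
	log.Printf("SCTP port %d reserved; closing releases it", addr.Port)
	// kube-proxy would keep this listener open for the Service's lifetime.
	if err := ln.Close(); err != nil {
		log.Printf("close: %v", err)
	}
}
```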
For Services with type=LoadBalancer we reject the Service creation request for SCTP services at API validation time.
As before, this seems wrong.
#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is obvious.

In order to utilize the new protocol value the network controller must support it.
network plugin
As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints, are there for this very purpose.

The real challenge here is to ensure that when an SCTP Service is created in a Kubernetes cluster, the Kubernetes logic does not create listening SCTP sockets on those nodes that are dedicated to the applications that use a userspace SCTP stack - because such an action would trigger the loading of the kernel module.
It's not clear to me.
Is this proposal saying that Kubelet and kube-proxy will not do the usual "open and hold" operation for SCTP ports?
Is that sufficient? If I have a cluster with an SCTP service, that triggers loading iptables modules for sctp, but does not listen() on the port - will userspace SCTP be OK?
> Is this proposal saying that Kubelet and kube-proxy will not do the usual "open and hold" operation for SCTP ports?

Yeah, on nodes that were marked userspace-sctp. Which would mean that HostPort/ExternalIP SCTP services wouldn't work on userspace-sctp nodes, and NodePort SCTP services would not be reachable on userspace-sctp nodes.
But that weirdness would be limited to clusters that used userspace sctp; in clusters with no userspace sctp nodes, SCTP would behave just like everything else.

> If I have a cluster with an SCTP service, that triggers loading iptables modules for sctp, but does not listen() on the port - will userspace SCTP be OK?

It looks like the kernel has some code to parse SCTP packets even without the `sctp` module being loaded, and you can add rules with matches like `-p sctp --dport 80` without needing the kernel sctp module. (It will cause `xt_sctp` to be loaded, but that's just part of iptables, and wouldn't interfere with userspace SCTP processing.) It is possible that some iptables extension rules (NAT?) might cause `sctp` to be loaded.
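Spelled out, the kind of rule being described (chain, port, and target are illustrative):

```
# Naming the protocol and port loads the xt_sctp match extension but,
# per the observation above, not the sctp protocol module itself:
iptables -A INPUT -p sctp --dport 80 -j ACCEPT
```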
I'm not very excited by formalizing userspace SCTP as a concept in Kubernetes based on labels. In general, we don't want to use annotations or labels for config, when we have real config (even if kube-proxy is behind the curve wrt component config).
We could add a flag/component-config param here, and leave labels to users. I would be OK with that.
Wrt SCTP NAT: according to our observations, iptables based NAT does not cause any harm. As @KomorkinMikhail wrote above in this PR, when they implemented the SCTP related patches into k8s they simply skipped those "openlocalport" parts for SCTP, and they could use it with NodePort and ExternalIP as well.
Wrt the separation of nodes: my intention with that description was not to say that we can/should solve it with labels. I just wanted to say that at the app level it is easy to direct the app pods to the right nodes with the usual label based mechanism. The current implementation in the code PR introduces a new parameter in the kube-proxy config ("SCTPUserSpaceNode") for the purpose of skipping the listening on the SCTP ports.
OK. This wasn't clear. The main points, if I understand:
- skip the opening/holding of the port in both kubelet and kube-proxy IFF SCTP
- suggest that user segregate their user-space SCTP work and their kernel-space SCTP work by node labels, but the details of that are left to the user
Yepp, my English.... :/
Exactly, that was the plan.
…, we cannot use the port value in iptables rules if the protocol is not defined
…worked based on further investigations.
…the SCTP kernel module. The proposed solution is updated accordingly, and it became a lot simpler. Also the usage of SCTP as protocol value in the Pod/container descriptor is described in the document now.
…rt with SCTP is clarified (not supported).
… HostPort with SCTP shall be supported; type=LoadBalancer with SCTP shall be supported
@thockin @danwinship I updated the document according to the comments above and the discussion in the SIG Network meeting yesterday. Please check the document, and if it is OK, please move it forward into a next state.
Thanks! A few notes for progressing into implementation.
### Non-Goals

It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. The Kubernetes side implementation will not restrict the usage of SCTP as the protocol for Services with type=LoadBalancer, but we do not implement the support of SCTP in the cloud specific load balancer implementations.
@bowei We should queue this up for discussion in GCP
@andrewsykim @jagosan Can we put this on an agenda for sig-cloudprovider to make sure people know it is coming?
Will do! Thanks!
  - 80.11.12.10
```

#### NetworkPolicy with SCTP
As part of the implementation of this, I'll ask you to reach out to NetworkPolicy implementors and at least let them know it is coming. I might not hold alpha for it, but I will hold beta.
I will do so.
#### SCTP in Services
##### Kubernetes API modification
The Kubernetes API modification for Services to support SCTP is obvious.
NB we will have to filter/reject it during alpha
For Services with type=LoadBalancer we expect that the cloud provider's load balancer API client in Kubernetes rejects requests with an unsupported protocol.

#### SCTP support in Kube DNS
Kube DNS shall support SRV records with "_sctp" as the "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. I.e. there is no need for additional implementation to achieve this goal.
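For concreteness, such an SRV record could look like this (service/namespace names, TTL, and port are illustrative):

```
; SRV for a Service port named "my-port" using SCTP, following the
; usual _port-name._protocol naming scheme:
_my-port._sctp.my-service.my-ns.svc.cluster.local. 30 IN SRV 0 100 7777 my-service.my-ns.svc.cluster.local.
```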
Please cross-check CoreDNS support and track it as part of alpha.
/lgtm OK to move to implementable.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: thockin
I am interested in joining this effort and can take the service/kube-proxy implementation for SCTP if no one is working on this ...
KEP document for SCTP support.
Relevant PR: kubernetes/kubernetes#64973