[EKS]: Next Generation AWS VPC CNI Plugin #398
Comments
Does IPv6 support feature in the new design? I'd like to be able to run a dual stack network, assigning both an IPv4 and an IPv6 address to each pod. In this configuration, the behaviour of Kubernetes is that it will use the IPv4 address for things like service endpoints but it would allow pods to connect to external IPv6 sites.
|
Definitely looking forward to this feature; there's plenty of uses for it. |
@gregoryfranklin yes. While we are not currently planning to support IPv6 in the initial release, we believe this design is extensible and will allow us to support IPv6 in the future. We're interested in learning more about the need for dual stack; I think this is a separate networking mode that we will need to consider. |
Dual stack is a migration path to IPv6-only. We have several EKS clusters connected to a larger internal network via direct connects (hybrid cloud). IP address space is something we are having to start thinking about. It's not an immediate problem, but will be in the next few years, which means we are having to think about migration paths now. For ingress, traffic comes through an ELB which can take inbound IPv6 traffic and connect to an IPv4 backend pod. However, for egress the pods need to have an IPv6 address to connect to IPv6 services (in addition to an IPv4 address to connect to IPv4 services). Dual stack pods would allow us to run parts of the internal network IPv6-only. For example, a webapp running in EKS could use an IPv6 database in our own datacentres. Being able to expose our apps to IPv6 traffic is an important step in identifying and fixing IPv6 bugs in our code and infrastructure (of which we have many). It also stops developers from introducing new IPv6 bugs. |
+1 for IPv6 support due to IPv4 exhaustion. Especially when scaling EKS to a higher number of nodes, additional CIDRs have to be added to the VPC. IPv6 would be a perfect fix for this and would make higher-density EKS clusters easier to run. |
Main reason I would want IPv6 is to run a cluster that is IPv6 only. Right now that's not something that Kubernetes itself supports very well; however, things seem to be catching up fast. To handle connections from the public dual-stack internet, you could use Ingress, Proxy Protocol, etc (similar to how a typical cluster today maps from public IPv4 to private IPv4). Possibly a SOCKS or HTTP proxy for outbound traffic too, which would allow access to IPv4-only APIs. |
We are very much enthusiastic about this next-gen plugin that would benefit us greatly:
So to summarize, this proposal is something we are greatly anticipating, and IMHO this sounds much more like a production-ready 1.0 CNI plugin from AWS compared to the previous one (that sadly doesn't really work for us microservice guys). Keep up the good work! |
security goal that'd be useful: no matter the mode (i.e. ENI trunking or secondary IP approach) or user configuration (e.g. lack of Network Policies through, let's say, Calico), the CNI should prevent Pods from accessing the host's metadata endpoint. this is a common issue seen in practice, which results in unintended credential exposure. seems straightforward to solve with an iptables rule at the node when setting up a container's veth pair in https://github.com/aws/amazon-vpc-cni-k8s/blob/6886c6b362e89f17b0ce100f51adec4d05cdcd30/plugins/routed-eni/driver/driver.go (i.e. block traffic to 169.254.169.254 from that veth interface), for the general case. I am not familiar with ECS trunking, so I cannot suggest an approach there. |
note that this rule construction is kube2iam's general approach https://github.com/jtblin/kube2iam#iptables, though it doesn't drop the traffic from the Pod outright, due to its feature set. they use a neat 'glob' I wasn't aware of, so you wouldn't need to create a rule per-veth at creation time (i.e. eni+ to match all after that prefix). |
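A minimal sketch of the rule described in the two comments above, assuming pod-side veth interfaces on the host are named with the `eni` prefix and that pod traffic to the metadata endpoint traverses the `FORWARD` chain. The function name `imds_block_rule` is hypothetical, and the script only prints the command so it can be reviewed before anyone runs it as root on a node:

```shell
#!/bin/bash
# Sketch only: build an iptables rule that drops pod traffic to the
# instance metadata endpoint. Assumptions: veth names start with "eni"
# and pod traffic is forwarded by the host (FORWARD chain, filter table).
set -eu

IMDS_IP="169.254.169.254"
VETH_GLOB="eni+"   # iptables wildcard: any interface whose name starts with "eni"

# Print the command instead of executing it, so it can be inspected first.
imds_block_rule() {
  echo "iptables --insert FORWARD 1 --in-interface ${VETH_GLOB} --destination ${IMDS_IP}/32 --jump DROP"
}

imds_block_rule
```

Because the interface match uses the `eni+` glob, a single rule covers veth pairs created later, which is the property the kube2iam comment points out. Pods that legitimately need instance credentials would be blocked too, so this is a trade-off rather than a drop-in default.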
any update on this @tabern ? |
Is there a repository where the progress of this next-gen cni code can be viewed and tracked? |
Any thoughts around adding support for enforcing network policies to the cni plugin? It would be great if security groups could be used in the ingress/egress rules for network policies. |
We have certain use-cases where we need to expose the pods directly to the public internet, so they need a public IP (WebRTC, STUN/TURN). It would be an awesome feature if the new CNI plugin would be able to assign a public IP or EIP to pods (e.g. when a certain annotation on the pod is given) and also put the assigned IP into some status field or annotation of the pod. Currently we are working around this by using autoscaling groups with node taints ( |
Is it ever going to be possible to use one of these partner CNIs with AdmissionWebhooks? E.g. routable from the API server to the overlay network? |
I appreciate the recent update from @mikestef9, but I still have no sense of what this means in terms of timing. Our org has desperately wanted to switch to EKS for various reasons, but node density and CNI custom networking improvements are must-haves for us. I'm not expecting exact dates, but it feels like these improvements have been in the "coming months" stage for over a year. If these improvements aren't rolled out by say, EOY - it's quite probable we just have to skip our EKS plans altogether. |
@MarcusNoble |
@eightnoteight That only really helps where you're managing the webhooks yourself. We've got a few third-party applications used in our clusters that set up webhooks for themselves, so we'd need to manually modify the manifests of those applications or, in the case where they're created in code at runtime, we'd need to fork and update the application. 😞 |
@mikestef9 Will an overlay network be an option for pod networking, or will it always be ENI-based? |
No, we will not be building any overlay option into the VPC CNI plugin, as that strays quite a bit from the original design goals of the plugin and will add too much complexity. Custom networking is our "overlay-like" option in the VPC CNI, but as I mentioned above, "we also realize that a single CNI plugin is unlikely to meet every possible use case", and added links to our docs that do list alternate CNI plugins with overlay options. We feel the best solution to IPv4 exhaustion is IPv6, and that's where we are investing with the VPC CNI plugin. |
How do Increased Pod Density and Security Groups Per Pod interoperate? Will they be compatible with each other? I saw a comment about a limit of 50 ENIs per node mentioned when it comes to VLAN tagging. |
@mikestef9 I'm glad you are acknowledging that the VPC CNI cannot meet every possible use case, and I'm grateful for the documentation that has been added on how to install alternate CNIs on EKS. However, all of these alternate CNIs have the limitation that they cannot be installed on to the control plane master nodes, which I am sure you are aware of. This means that things like admission controller webhooks will fail, as well as other things that require a control plane node to communicate with a pod on a worker node. Are there any plans in place to fix this problem to allow 3rd party CNIs to be fully functional? |
Hi @mikestef9 , is there any documentation available to configure POD's using Security Group related Custom resource definition? |
Documentation is published https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html Stay tuned for further updates on #177 |
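For readers who want a quick picture of what the linked documentation configures, here is a rough sketch that prints a `SecurityGroupPolicy` manifest. The helper name `render_security_group_policy` and every argument value (policy name, namespace, label, security group ID) are hypothetical placeholders; the linked user guide remains the authoritative reference for the resource's fields:

```shell
#!/bin/bash
set -eu

# Print a SecurityGroupPolicy manifest selecting pods by a single label.
# All values passed in are illustrative placeholders, not real resources.
render_security_group_policy() {
  local name="$1" namespace="$2" label_key="$3" label_value="$4" sg_id="$5"
  cat <<EOF
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      ${label_key}: ${label_value}
  securityGroups:
    groupIds:
      - ${sg_id}
EOF
}

render_security_group_policy db-access default role backend sg-0123456789abcdef0
```

The output could be piped to `kubectl apply -f -` on a cluster where the feature is enabled.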
Thanks, @mikestef9 for sharing |
Nice! Does @mikestef9 have any timeline for security-groups-for-pods on Fargate? It'll be useful to do the migration plan |
You can follow this issue #625 for updates on that feature request. No timeline to share right now. Note that the UX we have in mind there will be the same as the SecurityGroupPolicy CRD for worker nodes, and not something that is added to the Fargate Profile |
@bencompton Sorry for resurrecting an old comment, but I wanted to post a solution to this problem here since no one else has yet. Setting the max number of pods per node is a native kubelet functionality, see --max-pods. The AWS documentation suggests setting this value by passing something like It is not possible to pass the documentation suggested arguments to the bootstrap.sh script with managed worker nodes, however, it is possible to add custom userdata to a launch template that is utilized by your managed worker nodes. There are some requirements for the formatting of the userdata that are not typical, so make sure to familiarize yourself with the specifics here. So to set a custom maxpods value you need to do 2 things:
Here is my userdata which implements these 2 tasks:
#!/bin/bash
set -ex
BOOTSTRAP_SH=/etc/eks/bootstrap.sh
BOOTSTRAP_USE_MAX_PODS_SEARCH="USE_MAX_PODS:-true"
KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
MAX_PODS=20 # put whatever quantity you want here
# set a maxPods value in the KUBELET_CONFIG file
echo "$(jq ".maxPods=$MAX_PODS" $KUBELET_CONFIG)" > $KUBELET_CONFIG
# search for the string to be replaced by sed and return a non-zero exit code if not found. This is used for safety in case the bootstrap.sh
# script gets changed in a way that is no longer compatible with our USE_MAX_PODS replacement command.
grep -q $BOOTSTRAP_USE_MAX_PODS_SEARCH $BOOTSTRAP_SH
# set the default for USE_MAX_PODS to false so that the maxPods value set in KUBELET_CONFIG will be honored
sed -i"" "s/$BOOTSTRAP_USE_MAX_PODS_SEARCH/USE_MAX_PODS:-false/" $BOOTSTRAP_SH
This is a workaround that works for now. This is certainly not recommended by AWS, and could break at some point in time depending on updates made to the bootstrap.sh script. So use this method with caution. Eventually this should no longer be needed based upon this comment from @mikestef9 above and #867:
|
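The "check that the string exists, then replace it" safety pattern from the userdata above can be sketched as a small reusable function. This is an illustration under assumptions rather than the commenter's script: `guarded_replace` is a hypothetical helper, it writes through a temporary file instead of `sed -i`, and the demo runs against a throwaway file rather than `/etc/eks/bootstrap.sh`:

```shell
#!/bin/bash
set -eu

# Replace SEARCH with REPLACEMENT in FILE, failing if SEARCH is absent.
# SEARCH is matched literally by grep (-F); it must not contain characters
# that are special to sed (such as "/" or "."), since sed treats it as a regex.
guarded_replace() {
  local file="$1" search="$2" replacement="$3"
  if ! grep -qF -- "$search" "$file"; then
    echo "pattern not found in $file: $search" >&2
    return 1
  fi
  sed "s/${search}/${replacement}/" "$file" > "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Demo on a temporary file that mimics the relevant bootstrap.sh line.
demo_file="$(mktemp)"
echo 'USE_MAX_PODS="${USE_MAX_PODS:-true}"' > "$demo_file"
guarded_replace "$demo_file" "USE_MAX_PODS:-true" "USE_MAX_PODS:-false"
cat "$demo_file"
rm -f "$demo_file"
```

Returning a non-zero status when the search string is missing means that, under `set -e`, node bootstrap stops visibly instead of silently keeping the default behaviour after an upstream script change.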
This is my current workaround for CNI custom networking with MNG (managed node group), which is dynamic but requires access to IMDS. Custom launch template user data:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
|
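The user data in the comment above is cut off after the MIME header, so its actual contents are not reproduced here. As a general illustration of the multipart envelope that launch template user data for managed node groups is expected to use, the following sketch wraps an arbitrary script read from stdin. The function name `wrap_user_data` and the embedded script are hypothetical:

```shell
#!/bin/bash
set -eu

BOUNDARY="==MYBOUNDARY=="

# Wrap a shell script (read from stdin) in a MIME multipart envelope.
wrap_user_data() {
  printf 'MIME-Version: 1.0\n'
  printf 'Content-Type: multipart/mixed; boundary="%s"\n\n' "$BOUNDARY"
  printf -- '--%s\n' "$BOUNDARY"
  printf 'Content-Type: text/x-shellscript; charset="us-ascii"\n\n'
  cat
  printf '\n--%s--\n' "$BOUNDARY"
}

# Placeholder script body; the real custom networking logic is not shown in the thread.
wrap_user_data <<'EOF'
#!/bin/bash
echo "custom node setup"
EOF
```

The closing boundary carries a trailing `--`, which is what marks the end of the multipart document.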
Hi, is there any update on when we can have IPv6 support for EKS? Also, is there any workaround to get dual-stack support for pods in EKS? Regards, |
@mikestef9 Unfortunately this is no longer on the roadmap? |
@davidroth I think it was kinda replaced/broken up into several smaller features, like IPv6 support, higher IP density for pods on nodes, etc. So there's no longer going to be an explicit switch to a brand new plugin, but rather continuous improvements to the existing one :) Edit: it was touched upon here: #398 (comment) |
As the feature is now GA -- see https://aws.amazon.com/jp/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/ for details -- suggest closing this issue. |
@FlorianOtel the last outstanding pain point discussed originally in this issue is IPv4 exhaustion. I plan on closing once we launch IPv6 support #835 |
It would be very helpful if the documentation and troubleshooting guides for the VPC CNI plugin, especially around Windows support, were updated to cover how you're supposed to debug the new wiring, rather than the older webhooks/controller version as at present. (If there's a separate repo for documentation, please let me know.) Part of this would appear to involve completing several aged PRs in the CNI repo that help to address the way the CNI setup fails silently, without feedback. |
Closing as we have now released native VPC CNI features to address all of the initial pain points discussed in this issue.
|
Edit 8/3/2020, see below comment for update on the status of this feature. There will not be a single new plugin release, but rather a series of new features on the existing plugin.
We are working on the next version of the Kubernetes networking plugin for AWS. We've gotten a lot of feedback around the need for adding Kubenet and support for other CNI plugins in EKS. This update to the VPC CNI plugin is specifically built to solve the common limitations customers experience today with the AWS VPC CNI version 1 and other Kubernetes CNI plugins.
Notably:
Architecturally, the next generation VPC CNI plugin differs from the existing CNI plugin. The new plugin cleanly separates functionality that was tightly coupled in the existing CNI:
Pod networking (data plane) will continue to be part of the worker nodes, but the management of the networking infrastructure will be decoupled into a separate entity that will most likely run on the Kubernetes control plane. This will allow dynamic configuration of the CNI across a cluster, making it easy to support advanced networking configurations and change the networking configuration of the cluster on a per-service basis without restarting nodes.
These new functional behaviors are all supported while maintaining conformance to the standard Kubernetes network model requirements.
We think this CNI design will give customers the power and flexibility to run any workload size or density on a Kubernetes cluster using a single CNI plugin. We plan to implement this as the standard CNI for Amazon EKS, and release it as an open source project so that anyone running Kubernetes on AWS can utilize it.
The advantage of this approach is that it supports multiple networking modes and allows you to use them on the same cluster at the same time. We think these will be:
You will be able to change which networking mode is used for pods on any given node and adjust the CIDR blocks used to assign IPs at any time. Additionally, the same VPC CNI plugin will work on both Linux and Windows nodes.
We're currently in the design and development stage of this plugin. We plan to release a beta of the CNI in the coming months. After this new CNI is generally available, we'll make it available in EKS. We do not plan to deprecate the current CNI plugin within EKS until we achieve parity between both generations of CNI plugins.
Let us know what you think below. We'll update this issue as we progress.