Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically include ServiceCIDR as an argument to Windows nodes bootstrap #4088

Open
tzifudzi opened this issue Jun 20, 2023 · 11 comments
Open
Assignees
Labels
feature New feature or request

Comments

@tzifudzi
Copy link
Contributor

tzifudzi commented Jun 20, 2023

Tell us about your request

After Windows support is implemented, it will be ideal and preferable to automatically include ServiceCIDR as an argument to bootstrap file for Windows nodes. It is possible we can retrieve this value along with the DNSClusterIP as is part of the DescribeCluster request operation against the EKS API. The value will be set in serviceIpv4Cidr in the kubernetesNetworkConfig.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

If the ServiceCIDR is not automatically retrieved or provided, the default value will be used as described in EKS docs Bootstrap script configuration parameters and will default to 172.20.0.0/16 or 10.100.0.0/16 based on the IP address of the primary interface.

In a case where a custom service CIDR was used, pods behind a Kubernetes service not have network connectivity as it would have been misconfigured with the default values. If this issue is not fixed, it would become a blocker for anyone wishing to adopt Karpenter as an autoscaler for Windows nodes in a case where they need a custom service CIDR.

Are you currently working around this issue?

See workaround listed here https://karpenter.sh/v0.30/concepts/node-templates/#windows

Additional Context

At the current time of opening the issue, support for Windows is still yet to be released but has been merged in PR #3698.

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@tzifudzi tzifudzi added the feature New feature or request label Jun 20, 2023
@tzifudzi
Copy link
Contributor Author

I am interested in working on this issue and will take it up.

@ranaciit
Copy link

This is kind of a show-stopper if the VPC CIDR range is not 172.20.0.0/16 or 10.100.0.0/16

@jzhn
Copy link
Contributor

jzhn commented Sep 1, 2023

We had the same issue, with Karpenter v0.30.0 which added windows userdata support this issue can be worked around.

Just add this one line of user data to AWSNodeTemplate, replacing <EKSClusterName> to your EKS cluster name.

  userData: |
    $global:EKSCluster = Get-EKSCluster -Name <EKSClusterName>

The user data will be prepended before the EKS bootstrap script which could make use of the $global:EKSCluster.

Of course a proper Karpenter level support for ServiceCIDR is still very nice to have.

@jonathan-innis
Copy link
Contributor

@jzhn This is awesome! Would you be open to adding this into the upstream documentation for users to be aware of until we get the proper ServiceCIDR feature merged into the project?

@jzhn
Copy link
Contributor

jzhn commented Sep 4, 2023

@jonathan-innis : I'm happy to PR for this workaround, but I hesitate to do that because the usage of $global:EKSCluster is internal to EKS Windows AMI's bootstrap script. I doubt it's opensourced (please point me to the repo if I'm wrong), so it's hard for me to verify the compatibility of this workaround.

If you could provide some insights that this workaround is safe and won't cause confusion I'll make a PR ASAP. Thanks.

@tzifudzi
Copy link
Contributor Author

tzifudzi commented Sep 6, 2023

Going to aim to prioritize this and hopefully have a PR up soon given how this is a show-stopper for some cases.

I am afraid that @jzhn is right that the Windows bootstrap is not open sourced but the contents are publicly readable on the node if you run an instance with the EKS Windows Optimized AMI.

@jzhn is also right regarding the fact that the global variables are internal to the Powershell bootstrap so they will only be accessible within the script itself. If you set the global variable outside the script it is overwritten internally in the script so this workaround will not work. Support for ServiceCIDR will have to be at Karpenter level for ease of use.

@jzhn
Copy link
Contributor

jzhn commented Sep 6, 2023

Thanks @tzifudzi for your information.

I want to add that the workaround of setting $global:EKSCluster in Karpenter userData should work, at least in latest 1.27 AMI Windows_Server-2019-English-Core-EKS_Optimized-1.27-2023.08.17 being tested by my org.

The Update-KubeConfig only calls Describe-EKSCluster and sets the $global:EKSCluster when $APIServerEndpoint or $Base64ClusterCA are missing. Since Karpenter always sets these two parameters, the $global:EKSCluster set at userData won't be overwritten.

function Describe-EKSCluster {
<#
.SYNOPSIS
Calls Get-EKSCluster to get cluster information like ServiceIpv4Cidr, APIServerEndpoint, Base64ClusterCA, etc.
#>
  # Get-EKSCluster call.
  if (-not [string]::IsNullOrEmpty($Endpoint)) {
    $global:EKSCluster = Get-EKSCluster -Name $EKSClusterName -EndpointUrl $Endpoint
  } else {
    $global:EKSCluster = Get-EKSCluster -Name $EKSClusterName
  }
}

function Update-KubeConfig {
<#
.SYNOPSIS
Creates/Updates kubeconfig file
#>
  # Update only if the APIServerEndPoint and Base64ClusterCA are empty.
  if ($APIServerEndpoint -and $Base64ClusterCA) {
    Write-Log "APIServer Endpoint and Cluster CA are being passed, bypassing cluster defaults."
  } else {
    Write-Log "Getting cluster information..."
    Describe-EKSCluster
    Write-Log "Using cluster information to get APIServer Endpoint and Cluster CA."
    # ...
  }
  # ...
}

Some personal thoughts:

  • Support for ServiceCIDR at Karpenter level is still a must given that the workaround breaks into internal of EKS bootstrap script.
  • It would be great for EKS to update Windows bootstrap script too. To me the script assumes that $global:EKSCluster is only used for extracting APIServerEndPoint and Base64ClusterCA, but it's also used for extracting ServiceCIDR. If the script can be updated to call Describe-EKSCluster when $ServiceCIDR is empty then it could benefit not only Karpenter but other EKS use cases too.

@tzifudzi
Copy link
Contributor Author

tzifudzi commented Sep 6, 2023

@jzhn Ah I see. When I referred to the value being overwritten I was referring to $global:ServiceCIDR which is overwritten when the bootstrap starts. I have looked at the code and what you have mentioned should work!

Please run a test locally and see if setting the cluster name results in the the ServiceCIDR value being correctly retrieved as we expect. I expect that it should work and if so, we can specify that as a workaround like @jonathan-innis suggested in the docs. I would give my thumbs up to for that workaround, and will also test a bit later myself to make sure it doesn't break anything else in the bootstrap flow.

To verify that the ServiceCIDR was set correctly, please check the file $env:ProgramData\Amazon\EKS\cni\config\vpc-bridge.conf on the node.

@jzhn
Copy link
Contributor

jzhn commented Sep 6, 2023

Yes, I've been running this workaround for my cluster for ~1 week with Karpenter v0.30.0 + Windows_Server-2019-English-Core-EKS_Optimized-1.27-2023.08.17. It's been working consistently that the right serviceCIDR is set on C:\ProgramData\Amazon\EKS\cni\config\vpc-bridge.conf and it resolved our weird windows pod networking issue.

Just for the ease of reproduce and test on your side, we used 10.x.0.0/16 for VPC range and 10.100.0.0/16 for serviceCidr range (without overlaps). When cluster VPC is in 10.* range, EKS bootstrap script defaults to 172.20.0.0/16 for serviceCidr range which caused networking pain.

Let me prepare a quick doc update PR then.

@tzifudzi
Copy link
Contributor Author

tzifudzi commented Oct 4, 2023

This change will be not be implemented within Karpenter, as discussed in closed PR #4625. The fix will be applied in the Windows bootstrap to extend it to accept an environment variable that can be set with custom userdata. Will share more information regarding the environment variable name and examples of how to use it at a later time within the next few weeks.

@tzifudzi
Copy link
Contributor Author

tzifudzi commented Oct 25, 2023

This issue can now be closed as the EKS Windows powershell bootstrap script now accepts an environment variable for Service CIDR which can be set with custom user data. This is available in the latest EKS Windows Optimizd AMIs, that is, version 1.28-2023.10.19 and later.

Example of setting the service CIDR:

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
...
spec:
  userData: |
    [System.Environment]::SetEnvironmentVariable('SERVICE_IPV4_CIDR', '10.100.0.0/16', 'Machine')
...

This has already been updated in AWS EKS Windows Documentation, and I am happy to add documentation to Karpenter too if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants