Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📖 Windows Support CAEP #3616

Merged

Conversation

jsturtevant
Copy link
Contributor

@jsturtevant jsturtevant commented Sep 9, 2020

What this PR does / why we need it:
Add proposal for Windows support in CAPI and providers. Related to #2218

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 9, 2020
@k8s-ci-robot k8s-ci-robot requested review from detiber and ncdc September 9, 2020 20:41
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 9, 2020
@jsturtevant
Copy link
Contributor Author

/cc @marosset @michmike @ksubrmnn

@benmoss
Copy link

benmoss commented Sep 9, 2020

I would remove the

Fixes #2218

since this is just the proposal

@vincepri vincepri changed the title Windows Support CAEP 📖 Windows Support CAEP Sep 15, 2020
@vincepri
Copy link
Member

/assign @CecileRobertMichon @benmoss @devigned

docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach looks good to me, most comments are wording suggestions/asks for clarification

docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
@jsturtevant jsturtevant force-pushed the capi-windows-proposal branch 2 times, most recently from bae43ca to 7147dcb Compare September 23, 2020 19:35
@randomvariable
Copy link
Member

Didn't comment much on the PR, since did most of it in the google docs, so broadly supportive of this CAEP.

@jsturtevant
Copy link
Contributor Author

I took a look at pull-cluster-api-verify-external-links — Job failed. . I didn't see any failures related to this document.

/test pull-cluster-api-verify-external-links

@randomvariable
Copy link
Member

I took a look at pull-cluster-api-verify-external-links — Job failed. . I didn't see any failures related to this document.

Ignore that error, we're being rate limited by Github for lack of a token.

Copy link
Contributor

@devigned devigned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm. Just left some grammatical suggestions.

docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
docs/proposals/20200804-windows-support.md Outdated Show resolved Hide resolved
@jsturtevant jsturtevant force-pushed the capi-windows-proposal branch 3 times, most recently from d62ba76 to 0c741f2 Compare October 13, 2020 22:13
@randomvariable
Copy link
Member

@jsturtevant @michmike

For domain joining a Windows node I thought using pre/post kubeadm commands could be an option. @randomvariable at sig-windows meeting today you mentioned you have some ideas?

I think for the initial implementation, this is fine. I don't think we need to say much more than this right now.

There's going to be a security improvement that we would want to do, and that's where I think maybe having a separate controller that provides one-time domain join accounts and pre-created AD computer resources could be useful and what I'm thinking about implementing. I would however like to treat this as a subset of #3761 rather than blocking on this CAEP. This CAEP does not prohibit doing domain joins via the use of pre/post kubeadm commands, which is the main point.

@jsturtevant jsturtevant force-pushed the capi-windows-proposal branch from 01e696c to 4050406 Compare October 14, 2020 18:00
@jsturtevant
Copy link
Contributor Author

@CecileRobertMichon @randomvariable @ncdc This is ready for final review. Based on the meeting today I've removed the retry logic since it should be addressed by recent updates to kubeadm in k8s 1.19 (and backported to 1.17)

- [Infrastructure provider implementation](#infrastructure-provider-implementation)
- [User Stories](#user-stories)
- [As an operator, I would like to create Windows OS worker nodes with the CAPI API.](#as-an-operator-i-would-like-to-create-windows-os-worker-nodes-with-the-capi-api)
- [As an operator, I would like to manage Windows OS worker nodes with the CAPI API.](#as-an-operator-i-would-like-to-manage-windows-os-worker-nodes-with-the-capi-api)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like some type active directory awareness would be worth diving into.

- [As an operator, i'd like to federate my windows OS worker nodes with active directory]

or is that out of scope? Just a thought from what we think our customers would like to be able to do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(feel free to merge if its ready , dont wanna hold things up, maybe can iterate if this is in scope)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes definitely. I think we decided to move forward with this and address the AD via #3761

ref: #3616 (comment)

@benmoss
Copy link

benmoss commented Oct 19, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2020
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, devigned

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2020
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Oct 19, 2020

@jsturtevant: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-cluster-api-verify-external-links 7147dcb link /test pull-cluster-api-verify-external-links

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@benmoss
Copy link

benmoss commented Oct 19, 2020

/test pull-cluster-api-test-main

@k8s-ci-robot k8s-ci-robot merged commit c164748 into kubernetes-sigs:master Oct 19, 2020
@CecileRobertMichon
Copy link
Contributor

I didn't realize my lgtm would auto-apply the approve label, didn't mean for this to merge right away :/

@vincepri
Copy link
Member

/milestone v0.4.0

No worries @CecileRobertMichon, have to investigate why that happened, might be a new behavior that we should take into account. If folks have more outstanding questions on this PR, feel free to comment or amend in a different PR.

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Oct 19, 2020
@benmoss
Copy link

benmoss commented Oct 19, 2020

It looks like you might have used the "Approve" feature of the pull request review? I also didn't know that was the same as an /approve

Yeah, looks like that's it:
https://github.com/kubernetes/test-infra/blob/master/config/prow/plugins.yaml#L96
https://github.com/kubernetes/test-infra/blob/b08b9c59e07ba5c83677186300fb3afdea984812/prow/plugins/approve/approve.go#L560-L571

@CecileRobertMichon
Copy link
Contributor

That looks like new behavior, didn't work that way in the past, eg. #3730 (review)

@CecileRobertMichon
Copy link
Contributor

we're seeing the same thing in capz: kubernetes-sigs/cluster-api-provider-azure#1002 (review) (also new behavior) cc @devigned

@CecileRobertMichon
Copy link
Contributor

The default changed 4 days ago because kubernetes/test-infra#19579 removed the deprecated option review_acts_as_approve (defaulted to false) in favor of ignore_review_state which defaults to false (but opposite behavior)

Copy link

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for delayed review @jsturtevant. I asked for some clarifications let me know if you want to discuss them on slack.

- [netbios names](#netbios-names)
- [Infrastructure provider implementation](#infrastructure-provider-implementation)
- [User Stories](#user-stories)
- [As an operator, I would like to create Windows OS worker nodes with the CAPI API.](#as-an-operator-i-would-like-to-create-windows-os-worker-nodes-with-the-capi-api)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Another example is the the [sig-windows-tools](https://github.com/kubernetes-sigs/sig-windows-tools) which provide scripts for image configuration when using Kubeadm.

Although the Linux implementation in image-builder uses Ansible for configuration, Windows isn't going to share
the same configuration because [Ansible](https://docs.ansible.com/ansible/latest/user_guide/windows.html) requires [Windows specific modules](https://docs.ansible.com/ansible/latest/modules/list_of_windows_modules.html) to do the configuration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate? Which windows specific modules are needed? We used ansible to configure Windows nodes in the past and we had to install modules on the host from which ansible is being run, a linux host in our case but nothing on the Windows except for enabling WinRM.

By leveraging cloudbase-init, an infrastructure provider implementation will require only a few changes which include:

- Make changes to their provider api to enable Windows OS infra machines ([example](https://github.com/ionutbalutoiu/cluster-api-provider-azure/commit/9c8daedac75959b141fec7ea909c2c1fd0bd484b))
- Ensuring cloudbase-init is configured properly to read UserData which will contain the cloud-init script. Users must configure

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoided infrastructure changes as on AWS as userData is automatically executed even on Windows VMs whereas in case of azure, we have a carry patch

But you're right in the sense we need to have consistency across cloud providers however we're a bit worried about the support of cloudbase-init across all the providers. What is the support statement like? Can we get some changes in if we need to immediately or in the long-term should we fork and attribute?


Wins.exe is currently the [recommended way to use kubeadm](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/).
Limiting access to the named pipes that are required is one way to mitigate access. Wins is currently
required during and after provisioning for running kube-proxy and the CNI daemonset.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated question: How are CNI and kube-proxy DS's container images being built for Windows and who owns building and supporting them?


## Upgrade Strategy

Nodes that use this pattern will require the infrastructure to be immutable as specified by CAPI documentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Windows does not support Docker in Docker so end to end test can not be added for Windows specific behavior.
If changes are required during development unit tests will be required.

For infrastructure providers the testing plan is left up to each infrastructure provider. It is recommended to leverage

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a link to jobs that need to be run for every infrastructure provider?

In the future, when support for [Privileged Containers for Windows containers](https://github.com/kubernetes/enhancements/issues/1981) is merged, we might be able to revisit this proposal
and use privileged containers in place of wins.exe enabled containers.

Each infrastructure providers must provide their own `PreKubeadmCommands`/`PostKubeadmCommands` scripts that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on what each infrastructure provider should do? What would be a sample PreKubeadmCommand for azure? Downloading Windows base image like server-core or something like that?

@CecileRobertMichon CecileRobertMichon mentioned this pull request Apr 5, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.