This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

WIP: Use KVO to install cluster #1969

Closed
wants to merge 5 commits

Conversation

derekparker
Contributor

This patch installs core Kubernetes components using the KVO.

This mode in the KVO is triggered by specifying a "DesiredVersion" without a "CurrentVersion".
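
Roughly, the shape of the trigger (a sketch only: the apiVersion, namespace, and exact field names here are illustrative, not the KVO's actual schema):

# Illustrative only -- see the KVO repo for the real AppVersion schema.
apiVersion: tco.coreos.com/v1
kind: AppVersion
metadata:
  name: kubernetes
  namespace: kube-system
spec:
  desiredVersion: "1.7.5"   # set by the installer from cluster-config
status:
  currentVersion: ""        # empty on a fresh cluster, so the KVO installs rather than upgrades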

@derekparker
Contributor Author

derekparker commented Sep 21, 2017

Going to need KVO to create the AppVersion on startup for this to work.

@derekparker
Contributor Author

Not quite ready for review; this needs some more coordination with KVO. KVO must create the AppVersion from the new cluster-config object, which now holds the initial version.

@derekparker
Contributor Author

Alright, so this is good enough for initial review. It won't work end-to-end yet, because I need to build a new dev KVO image after https://github.com/coreos-inc/kube-version-operator/pull/316 merges; however, it's ready for at least the Terraform changes to be reviewed.

Just to explain this new bootstrap method:

All control plane manifests have been moved out of this repo and into the KVO upgrade spec. When the master node starts, it will run KVO via container image in "render" mode, which will render all assets, including bootstrap assets, onto disk using the cluster-config for templating. From there, the rest of the process continues as normal. Bootkube will start and kick the rest off.
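
To make the render step concrete, the new unit looks roughly like this (a sketch: the image tag, flags, and paths are illustrative rather than the exact unit in this PR):

[Unit]
Description=Render Kubernetes assets via the KVO
Wants=kubelet.service
After=kubelet.service

[Service]
Type=oneshot
# Renders all bootstrap and self-hosted assets onto disk, templated from cluster-config.
ExecStart=/usr/bin/docker run --rm \
  -v /opt/tectonic:/assets \
  quay.io/coreos/kube-version-operator:dev \
  render --asset-dir=/assets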

@aaronlevy
Contributor

@sym3tri @s-urbaniak @alexsomesan -- could you take a review pass at this (mostly for terraform stuff). It's getting pretty close - and to keep the timezone delays to a minimum, hoping to start getting some feedback.

@derekparker
Contributor Author

With this last WIP commit the AWS smoke tests are passing when run on my machine (the CI test seems to be failing immediately with no actual output as to why... most likely because I need to rebase and pull in other changes to make Jenkins happy).

I can rebase first thing tomorrow, however I would really appreciate some feedback on this new approach. I've outlined the new bootstrap method in a comment above, however I can expand on that if need be. I know that I will need to port my Terraform changes to other platforms, as they are all AWS specific at the moment, but initial feedback on the AWS changes and overall direction would be greatly appreciated.

cc @sym3tri @s-urbaniak @alexsomesan

Contributor

@s-urbaniak left a comment

It's great moving forward on getting rid of manifests from the installer, thanks! General concerns from my side (which should not prevent us from moving forward with this):

  • This adds yet another service to the bootstrap pipeline, the kvo.service. I am aware this is not really avoidable right now, but we should rethink the bootstrap service architecture in the future in order not to introduce yet more complexity and (more importantly) possible failure domains.

  • The next general concern is actually connected with the above: How are we going to proceed with further operators? KVO is one, next will most probably be the tectonic-network-operator which will deploy network assets. Ideally I'd envision only two render points in the installer:

  1. Manifests rendered "before" the control plane starts.
  2. Manifests rendered "after" the control plane starts.

variable "pull_secret_path" {
type = "string"
description = "Path on disk to your Tectonic pull secret. Obtain this from your Tectonic Account: https://account.coreos.com."
default = "/Users/coreos/Desktop/config.json"
Contributor

I don't think we should leave this default value here. Let's leave it unset to enforce setting a value when declaring the module.
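
i.e. something like:

variable "pull_secret_path" {
  type        = "string"
  description = "Path on disk to your Tectonic pull secret. Obtain this from your Tectonic Account: https://account.coreos.com."
}

That way Terraform fails fast (or prompts) when no value is provided, instead of silently using a developer-local path.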

Wants=kubelet.service
After=kubelet.service
Wants=kvo.service
After=kvo.service
Contributor

Please also update the documentation in https://github.com/coreos/tectonic-installer/blob/master/Documentation/dev/node-bootstrap-flow.md as this introduces a new service, kvo.service, and a new dependency between kvo.service and bootkube.service.

Contributor Author

Done

After=kubelet.service

[Service]
Type=oneshot
Contributor

I suggest not modeling this as a oneshot service, but rather as a simple service type using Restart=on-failure. You need to move the docker run statement to ExecStartPre to express restartability in case the docker download/run fails. Please see https://github.com/coreos/tectonic-installer/blob/0cf916e/modules/ignition/resources/services/k8s-node-bootstrap.service#L7-L33 as a reference point.
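
A minimal sketch of that shape, reusing whatever docker invocation the unit currently has (the image and flags below are placeholders):

[Service]
Type=simple
Restart=on-failure
RestartSec=10
# Doing the real work in ExecStartPre means a failed image pull/run fails
# the unit, and Restart=on-failure retries it.
ExecStartPre=/usr/bin/docker run --rm -v /opt/tectonic:/assets ${kvo_image} render --asset-dir=/assets
ExecStart=/usr/bin/true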

@@ -7,7 +7,7 @@

   "tectonic_etcd_servers": [""],

-  "tectonic_base_domain": "tectonic.dev.coreos.systems",
+  "tectonic_base_domain": "dparker.k8s.work",
Contributor

I guess these are going away ;-)

Contributor Author

Heh, yes. Had to modify for local testing.

: join(",", formatlist("https://%s:2379", var.etcd_endpoints))
}"

master_count = "${var.master_count}"
Contributor

Let's group the single-line vars, sort them alphabetically, and separate only the multi-line etcd_servers variable with a blank line, like so:

vars {
   advertise_address      = "${var.advertise_address}"
   cloud_config_path      = "${var.cloud_config_path}"
   cloud_provider_profile = "${var.cloud_provider != "" ? "${var.cloud_provider}" : "metal"}"
   cluster_cidr           = "${var.cluster_cidr}"
...
   etcd_servers = "${
      var.experimental_enabled
        ? format("https://%s:2379", cidrhost(var.service_cidr, 15))
        : var.etcd_ca_cert_pem == ""
          ? join(",", formatlist("http://%s:2379", var.etcd_endpoints))
          : join(",", formatlist("https://%s:2379", var.etcd_endpoints))
      }"
  }
}

We've found this is easier to diff than logical grouping.

@@ -20,7 +20,9 @@ When a cluster node is being bootstrapped from scratch, it goes through several

 Additionally, only on one of the master nodes the following kubernetes bootstrapping happens:

 1. `bootkube.service` is started after `kubelet.service` start
+1. `kvo.service` is started after `kubelet.service` start
+1. `kvo.service` renders all bootstrap and self-hosted control plane assets
Contributor

Wondering if we should just squash the kvo.service step into bootkube.service? Really it's a "bootstrap" service - and it should only ever run once on a single node anyway (I believe, as this stands, kvo.service would run once on every master).
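
i.e. a single combined unit, something like this (a sketch; the paths and image are placeholders):

[Service]
Type=simple
Restart=on-failure
RestartSec=10
# Render assets first; if this fails, the whole unit retries.
ExecStartPre=/usr/bin/docker run --rm -v /opt/tectonic:/assets ${kvo_image} render --asset-dir=/assets
ExecStart=/opt/tectonic/bootkube.sh

That gives us a single bootstrap unit to gate on, and it only lives on the one bootstrapping master.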

@derekparker
Contributor Author

Looks like the last CI run failed because of a Jenkins issue... can someone kick it over? @s-urbaniak

@derekparker
Contributor Author

Since I'll no longer be maintaining this PR, here are a few notes for whoever comes next:

  • The KVO render step has been moved to its own binary (https://github.com/coreos-inc/kube-version-operator/tree/master/cmd/render-manifests)
  • A new image & repo has been created for it: https://quay.io/repository/coreos/kube-version-renderer
  • The KVO service needs to be changed to use this new image instead of the actual operator (see the sketch after this list)
  • We no longer need to put docker creds in place, as the renderer image is not private
  • Most likely want to put the render & bootkube step into a single bootstrap service file
  • The only platform fully updated for this new flow is AWS, so other platforms will have to be finished as well
  • Must ensure that vanilla k8s continues to work (the discussion happened ~3 weeks ago, so I forget exactly what's involved)
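
To make the service change concrete, the renderer-based unit could end up looking something like this (a sketch: the flags are guesses, only the repo/image URLs above are real):

[Service]
Type=oneshot
# The renderer image is public, so no docker creds step is needed.
ExecStart=/usr/bin/docker run --rm -v /opt/tectonic:/assets \
  quay.io/coreos/kube-version-renderer \
  --asset-dir=/assets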

I think that's it. @diegs is most likely the next person to pick this up, or at least the point of contact until somebody else does.

@diegs
Contributor

diegs commented Nov 3, 2017

Thanks @derekparker for all your hard work. For posterity this is on ice for the time being and will be picked up for the January release at the earliest.

@diegs self-assigned this Nov 3, 2017
@diegs
Contributor

diegs commented Jan 11, 2018

Resolved in #2702

@diegs closed this Jan 11, 2018