-
Notifications
You must be signed in to change notification settings - Fork 266
Conversation
4fb40f1
to
44a6b27
Compare
Going to need KVO to create the AppVersion on startup for this to work. |
44a6b27
to
7d77b53
Compare
Not quite ready for review, need some more coordination with KVO. KVO must create the app version from this new cluster configure object which now holds the initial version. |
c70bebd
to
737cdb6
Compare
75365c1
to
cb0ce13
Compare
Alright, so this is good enough for initial review. It won't work exactly because I need to build a new dev KVO image after https://github.com/coreos-inc/kube-version-operator/pull/316 merges, however it's ready for at least the Terraform changes to be reviewed. Just to explain this new bootstrap method: All control plane manifests have been moved out of this repo and into the KVO upgrade spec. When the master node starts, it will run KVO via container image in "render" mode, which will render all assets, including bootstrap assets, onto disk using the cluster-config for templating. From there, the rest of the process continues as normal. Bootkube will start and kick the rest off. |
@sym3tri @s-urbaniak @alexsomesan -- could you take a review pass at this (mostly for terraform stuff). It's getting pretty close - and to keep the timezone delays to a minimum, hoping to start getting some feedback. |
With this last WIP commit the AWS smoke tests are passing when run on my machine (the CI test seems to be failing immediately with no actual output as to why... most likely because I need to rebase and pull in other changes to make Jenkins happy). I can rebase first thing tomorrow, however I would really appreciate some feedback on this new approach. I've outlined the new bootstrap method in a comment above, however I can expand on that if need be. I know that I will need to port my Terraform changes to other platforms, as they are all AWS specific at the moment, but initial feedback on the AWS changes and overall direction would be greatly appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great moving forward on getting rid of manifests from the installer, thanks! General concerns from my side (which should not prevent us from moving forward with this):
-
This adds yet another service to the bootstrap pipeline, the
kvo.service
. I am aware this is not really avoidable right now, but we should rethink the bootstrap service architecture in the future in order not to introduce yet more complexity and (more importantly) possible failure domains. -
The next general concern is actually connected with the above: How are we going to proceed with further operators? KVO is one, next will most probably be the tectonic-network-operator which will deploy network assets. Ideally I'd envision only two render points in the installer:
- Manifests rendered "before" the control plane starts.
- Manifests rendered "after" the control plane starts.
modules/bootkube/variables.tf
Outdated
variable "pull_secret_path" { | ||
type = "string" | ||
description = "Path on disk to your Tectonic pull secret. Obtain this from your Tectonic Account: https://account.coreos.com." | ||
default = "/Users/coreos/Desktop/config.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont' think we should leave this default value here. Let's leave it unset to enforce setting a value when declaring the module.
Wants=kubelet.service | ||
After=kubelet.service | ||
Wants=kvo.service | ||
After=kvo.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also update the documentation in https://github.com/coreos/tectonic-installer/blob/master/Documentation/dev/node-bootstrap-flow.md as this introduces a new service, kvo.service
, and a new dependency between kvo.service
and bootkube.service
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
After=kubelet.service | ||
|
||
[Service] | ||
Type=oneshot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest not to model this as a oneshot
service, but rather as a simple
service type using Restart=on-failure
. You need to move the docker run
statement to ExecStartPre
to express restartability in case the docker download/run fails. Please see https://github.com/coreos/tectonic-installer/blob/0cf916e/modules/ignition/resources/services/k8s-node-bootstrap.service#L7-L33 as a reference point.
@@ -7,7 +7,7 @@ | |||
|
|||
"tectonic_etcd_servers": [""], | |||
|
|||
"tectonic_base_domain": "tectonic.dev.coreos.systems", | |||
"tectonic_base_domain": "dparker.k8s.work", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these are going away ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, yes. Had to modify for local testing.
modules/bootkube/assets.tf
Outdated
: join(",", formatlist("https://%s:2379", var.etcd_endpoints)) | ||
}" | ||
|
||
master_count = "${var.master_count}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's group the single line vars and sort them alphabetically and only let the etcd_server
variable have a newline between them like so:
vars {
advertise_address = "${var.advertise_address}"
cloud_config_path = "${var.cloud_config_path}"
cloud_provider_profile = "${var.cloud_provider != "" ? "${var.cloud_provider}" : "metal"}"
cluster_cidr = "${var.cluster_cidr}"
...
etcd_servers = "${
var.experimental_enabled
? format("https://%s:2379", cidrhost(var.service_cidr, 15))
: var.etcd_ca_cert_pem == ""
? join(",", formatlist("http://%s:2379", var.etcd_endpoints))
: join(",", formatlist("https://%s:2379", var.etcd_endpoints))
}"
}
}
We found this is easier for diff'ing than with logical grouping.
Update KVO image and last failing smoke test.
@@ -20,7 +20,9 @@ When a cluster node is being bootstrapped from scratch, it goes through several | |||
|
|||
Additionally, only on one of the master nodes the following kubernetes bootstrapping happens: | |||
|
|||
1. `bootkube.service` is started after `kubelet.service` start | |||
1. `kvo.service` is started after `kubelet.service` start | |||
1. `kvo.service` renders all bootstrap and self-hosted control plane assets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we should just squash the kvo.service step as part of bootkube.service? Really it's "bootstrap" service - and it should only ever run once on a single node anyway (I believe as this stands kvo.service would run once on every master)
Looks like the last CI run failed because of a Jenkins issue... can someone kick it over? @s-urbaniak |
Since I'll no longer be maintaining this PR, here's a few notes for whoever comes next:
I think that is it. I think @diegs is most likely the next person to pick this up, or at least the point of contact until somebody else does. |
Thanks @derekparker for all your hard work. For posterity this is on ice for the time being and will be picked up for the January release at the earliest. |
Resolved in #2702 |
This patch installs core Kubernetes components using the KVO.
This mode in the KVO is triggered by specifying a "DesiredVersion" without a "CurrentVersion".