Skip to content
This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

Add setup for deploying to DigitalOcean #604

Closed
wants to merge 2 commits into from

Conversation

aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented May 9, 2017

This is a port of Tectonic Installer to DigitalOcean.

@coreosbot
Copy link

Can one of the admins verify this patch?

1 similar comment
@coreosbot
Copy link

Can one of the admins verify this patch?

--initial-advertise-peer-urls=http://${var.cluster_name}-etcd-${count.index}.${var.base_domain}:2380 \
--listen-client-urls=http://0.0.0.0:2379 \
--listen-peer-urls=http://0.0.0.0:2380
EOF
Copy link

@glevand glevand May 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you should just use the etcd environment variables here and not redefine ExecStart, so something like:

Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"

See: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/configuration.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this configuration from modules/aws/etcd/ignition.tf. I guess the same suggestion holds for that implementation?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ExecStart command in the CoreOS provided etcd-member.service ever changes you won't get those updates if you redefine ExecStart in the drop-in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glevand I mean, should we make the same change in the AWS module, since my configuration was copied from there?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crawford What do you think?

Environment="ETCD_IMAGE=${var.container_image}"
ExecStart=
ExecStart=/usr/lib/coreos/etcd-wrapper \
--name=etcd \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you'll need to set --name=etcd to hostname. Otherwise all members of the cluster will attempt to register with their name etcd

@s-urbaniak
Copy link
Contributor

@aknuds1 sorry, that this fell under the radar. I actually thing this is a really great effort and you made quite an impressive progress on that. Thanks a lot for the contribution! I picked up this PR locally and got pretty far with it, finalizing the last bits. Are you interested in finalizing this PR?

}

data "template_file" "etcd-cluster" {
template = "${file("${path.module}/resources/etcd-cluster.tpl")}"
Copy link
Contributor

@s-urbaniak s-urbaniak Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aknuds1 👍 That is a great idea, using a generic template for this. I actually went a step further and did the same for 40-etcd-cluster.conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to do more "generification" of the whole project in order to minimize code duplication between the different platforms. An idea I have in mind is to generate Terraform files from templates, through e.g. a Python script (I do this with all my Kubernetes manifests). Making generic configurations with Terraform gets pretty painful at some point I find.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, doing this in TF is not adorable, but unfortunately we can't sprinkle in another language in here, because it is used in GUI based installer. I have been thinking of introducing custom Terraform data sources for other reasons, and it could solve some of the generification problems. At least we could then use Go code then to set up more complex content data structures.

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jul 14, 2017 via email

@s-urbaniak
Copy link
Contributor

@aknuds1 great! let me finalize my local branch to a working state. I squashed everything into one commit to be able to move forward. You could cherry-pick it and re-author it under your name, such that the credit won't get lost. Hang on, while I am finalizing this.

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jul 16, 2017

@s-urbaniak Thanks! Let me know when it is ready please :)

@s-urbaniak
Copy link
Contributor

@aknuds1 I have a working cluster using the following (squashed) commit based on your PR: s-urbaniak@d79cd1c

Admittedly it leaves out a few details you had, i.e. enabling swap, the sshguard, but this is to align it more with the current setup we have at the other cloud providers.

Regarding DNS my commit also assumes you configured a domain already yourself, that is also what we assume in the other platforms.

There's a few tweaks I would like to make, namely exposing masters/workers behind LBs and factor the DNS setup out in their own package. Once done, we can finalize this PR.

@s-urbaniak
Copy link
Contributor

having said that, a LB costs quite some money, so if we are targeting DO for simple try-out setups, we can certainly also live with the current solution ;-)

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jul 19, 2017

@s-urbaniak Could you base your branch on mine? I can't just cherry-pick it since there are tons of conflicts between the two.

@coreosbot
Copy link

Can one of the admins verify this patch?

@aknuds1 aknuds1 force-pushed the digitalocean branch 2 times, most recently from 55ac711 to c46dc20 Compare August 13, 2017 17:09
@coreosbot
Copy link

Can one of the admins verify this patch?

@aknuds1
Copy link
Contributor Author

aknuds1 commented Aug 22, 2017

@s-urbaniak @alekssaul Could you help me in figuring out how to write acceptance tests for this platform, according to the current model, so that I can finish it and submit for review? I'm just not sure how it's supposed to be done currently.

Eventually, you could point out how to test by hand so I can root out remaining issues that way :)

@coreosbot
Copy link

Can one of the admins verify this patch?

1 similar comment
@coreosbot
Copy link

Can one of the admins verify this patch?

@cpanato
Copy link
Contributor

cpanato commented Sep 8, 2017

hello @aknuds1, thanks for the PR, can you please rebase with our current master, then I can run our smoke tests for other platforms just to validate if this change do not break others.

thanks

@cpanato
Copy link
Contributor

cpanato commented Sep 8, 2017

Thanks, @aknuds1, we will prioritize this PR for our next sprint :)

set -o pipefail

register_floating_ip() {
curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer ${1}" -d \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the curl request will almost always return 0, unless you use --fail, you can check the exit code with echo $? so the until loop won't work as expected. Also after deploying a cluster the load balancer show no droplets so I can't access to the console

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enxebre, I have missed the --fail option.

@@ -0,0 +1,7 @@
resource "digitalocean_record" "etcd_nodes" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the reasoning behind it is to have ability to easily swap dns approach

@enxebre
Copy link
Member

enxebre commented Jan 4, 2018

Hey @aknuds1 thanks! I dropped a few comments but I'm in favour of getting this in as it is and functional improvement can be done in follow up PR, just please make sure everything worthy is included in the readme, e.g as per my last deployment, load balancer has no droplets.

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 5, 2018

@enxebre Hey, sorry I couldn't respond until now, I've been busy moving stuff between apartments :/ I also see that load balancer and floating IP creation are now broken. Sorry, I'll try to fix it. If you would happen to spot the reason, please let me know :)

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 5, 2018

Maybe I see now how my register-master.sh script is broken, in the middle of testing my fixes...

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 5, 2018

@enxebre I think I fixed the issue with registering the master droplet with the load balancer and the floating IP, it was just a stupid bash programming mistake. Can you please test again? I've just created a cluster, and I can't tell yet if it's working or not except that the LB and the floating IP both have the droplet assigned to them.

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 5, 2018

@enxebre Good news! After my shell script fixes, I could successfully bootstrap a cluster!

This extends Tectonic Installer to also support the DigitalOcean cloud
provider.
@enxebre
Copy link
Member

enxebre commented Jan 10, 2018

lgtm cc @squat

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 10, 2018

Thanks @enxebre!!!

@nreisbeck
Copy link

nreisbeck commented Jan 22, 2018

@enxebre

I managed to deploy a cluster but the console is not available.

This was fixed in a recent PR

Also as mentioned in you comment above when running terraform destroy I always get: "digitalocean_floating_ip.master: Error Unassigning FloatingIP (159.203.159.11) from the droplet: POST https://api.digitalocean.com/v2/floating_ips/159.203.159.11/actions: 500 Server was unable to give you a response."

This is a known issue, I'm working on a fix for this now, but will require some rework of the bootstrapping process due to dependency cycles.

nreisbeck added a commit to nreisbeck/tectonic-installer that referenced this pull request Jan 22, 2018
Consolidate DNS records in master/dns.tf to align with db1c71e, in
advance of moving to modules/dns at
[request](coreos#604 (comment))
of @enxebre
nreisbeck added a commit to nreisbeck/tectonic-installer that referenced this pull request Jan 22, 2018
Consolidate DNS records in master/dns.tf to align with db1c71e, in
advance of moving to modules/dns at
[request](coreos#604 (comment))
of @enxebre
nreisbeck added a commit to nreisbeck/tectonic-installer that referenced this pull request Jan 23, 2018
Consolidate DNS records in master/dns.tf to align with db1c71e, in
advance of moving to modules/dns at
[request](coreos#604 (comment))
of @enxebre
nreisbeck added a commit to nreisbeck/tectonic-installer that referenced this pull request Jan 23, 2018
platforms/digitalocean: update descriptions for variables

modules/digitalocean/(master|worker): remove duplicate dns record.

modules/digitalocean: remove duplicate digitalocean_droplet.etcd_node

94df14e added etcd.tf to align with master/worker definitions.

modules/digitalocean/master: add droplet_id to floating_ip resource

Fixes #1, #2.

Remove `local-exec` provisioner from master_node. No need to do this in
a posthook, Terraform will wait for the resource to be provisioned
before assigning the Droplet to the Floating IP.

There are no dependencies on this to be pre-computed and used in a
template file that would block them from using the Terraform lifecycle &
dependency built-ins.

Disable `depends_on` for `tectonic_assets` and `kubeconfig` null
resources temporarily.

modules/digitalocean/master: move api dns record to master/dns.tf

Consolidate DNS records in master/dns.tf to align with db1c71e, in
advance of moving to modules/dns at
[request](coreos#604 (comment))
of @enxebre

modules/digitalocean/(etcd|master|worker): clean up variables.tf

platforms/digitalocean: clean up `droplet_image`, use $var.cl_channel

platforms/digitalocean: update default droplet sizes to use newer slugs

2018-01-16: [New price points for higher capacity
hardware](https://blog.digitalocean.com/new-droplet-plans/)

modules/digitalocean/(etcd|master|worker): use specific role count vars.

fix 19b8b8d
@tsal
Copy link

tsal commented Jan 24, 2018

This thread is a nail-biter. Any updates on when this might make it to mainline?

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 24, 2018

@tsal According to CoreOS, it's undergoing testing at the moment with the goal of integrating the PR. I'm also waiting for them to let me know if I should keep this branch in sync with coreos/master or not (and also integrate PRs from others).

@sym3tri
Copy link
Contributor

sym3tri commented Jan 26, 2018

@aknuds1 Sorry for the delay. I think this is good to merge. Can you just do one last rebase to fix the merge conflicts?

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 29, 2018

@sym3tri Sorry for the delayed reply, I've got the flu :( I've rebased locally, but there are a couple of changes since my last rebase that I can't reconcile. A couple of files that I've modified in order to control the secure port of the API server have disappeared, modules/bootkube/resources/bootstrap-manifests/bootstrap-apiserver.yaml and modules/bootkube/resources/manifests/kube-apiserver.yaml, where do I configure the API server secure port now?

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 30, 2018

@enxebre Do you know where to find these two API server manifests that have disappeared since my last rebase: modules/bootkube/resources/bootstrap-manifests/bootstrap-apiserver.yaml and modules/bootkube/resources/manifests/kube-apiserver.yaml? At some point I needed to modify these to configure the secure port for the API server to use.

@sym3tri
Copy link
Contributor

sym3tri commented Jan 30, 2018

@aknuds1 yeah we're changing up the way things work in master. Can you close this PR and re-open it against the track-1 branch instead of master?

@klausenbusk
Copy link

@enxebre Do you know where to find these two API server manifests that have disappeared since my last rebase: modules/bootkube/resources/bootstrap-manifests/bootstrap-apiserver.yaml and modules/bootkube/resources/manifests/kube-apiserver.yaml? At some point I needed to modify these to configure the secure port for the API server to use.

They was removed in: #2702

@aknuds1
Copy link
Contributor Author

aknuds1 commented Jan 30, 2018

@sym3tri Thanks for the heads up, will try.

@aknuds1 aknuds1 closed this Jan 30, 2018
@sym3tri
Copy link
Contributor

sym3tri commented Jan 30, 2018

For those following along, this PR was moved to #2862

size = "${var.droplet_size}"
ssh_keys = ["${var.ssh_keys}"]
tags = ["${var.extra_tags}"]
user_data = "${data.ignition_config.main.rendered}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add private_networking = true (and for node.tf too) to create private network?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ditansu It's a good point, I wasn't aware of DigitalOcean's support for this until recently. Maybe it should be an option whether to create private nodes or not?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be an option whether to create private nodes or not?

Why not, but I think a privet net need be always, because cluster it is always two or more nodes and it's much "cheaper" and secure use privet net. Especially if we take in account privet network is don't exclude public IP e.g. node always will be have public IP in any case - enable or disable privet IP.

Copy link
Contributor Author

@aknuds1 aknuds1 Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ditansu Pardon my ignorance, but what's the benefit of creating nodes with both private and public IPs, will it not be as insecure so long as nodes have public IPs? I do see though that Typhoon creates nodes with both private and public IPs, and was wondering what the reasoning is behind this. Is the benefit that cluster nodes will communicate with each other over the private net?

I was wondering if it might be an idea to introduce an option to create nodes with private IPs only, for security purposes? Kops has an option on AWS to create nodes with private IPs and a single bastion host to allow SSH-ing into the cluster. In the case of DigitalOcean I'm not sure if there's a need for a bastion host as it doesn't have VPCs, and I would think one might as well just use any other node in one's account as the bastion. Thoughts?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aknuds1, I guess the benefit of public IP is only for deploy stage without bastion. So you could rollout a cluster from your laptop directly. After deploy you could wrap the cluster in firewall. By the way https://github.com/kubicorn/kubicorn/ can up cluster with DO's firewall out-of-box. And sure you could use any DO's server as bastion in deploy time and usage time. But I don't see benefits from bastion it self, because there is a very small opportunity of hacker attack during cluster rollout time, and we have firewall (for tagging droplet group) to protect whole our cluster.

Copy link

@ditansu ditansu Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aknuds1 I know about Typhoon but haven't tried it yet. But judging this https://github.com/poseidon/terraform-digitalocean-kubernetes/blob/master/network.tf they support firewall from out-of-box.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ditansu That's awesome; I have used Typhoon, but didn't notice the firewalling :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aknuds1 you are welcome ))) buy the way, what do you think regarding tectonic's future under RedHat ?

Copy link
Contributor Author

@aknuds1 aknuds1 Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ditansu I am also unsure! But in my opinion it's good to have Typhoon as a second project in this space :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aknuds1 absolutely agree)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.