
Add base terraform config for batch #2

Merged · 17 commits · Aug 19, 2020
Conversation

jashapiro (Member):

This draft PR adds terraform files to build the full stack of resources required for a Nextflow queue.

This should include the necessary groups, roles, policies, compute environments and batch queues.

Of note, the policies grant read/write access to only two buckets: nextflow-ccdl-data and nextflow-ccdl-results (read access is granted to all buckets).

The compute environments use an AMI based on the Amazon ECS-Optimized Amazon Linux AMI, with changes to allow for greater data storage and the installation of aws-cli, as required for Nextflow.

Most of the settings I used were derived from the instructions found here: https://t-neumann.github.io/pipelines/AWS-pipeline/, as well as from looking at the setup created by the CloudFormation stack (https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/) that we had previously done some testing with.

I have not yet attempted to deploy this stack, as I wanted to get some feedback/checks before I did so, especially as it is the first time I have used terraform. 😰

While I think most of the stack is probably mostly correct, I am really flying blind on the VPC settings in nextflow-security.tf. I would love to have those looked at especially closely!

I also have not tagged anything, so if there are particular tagging conventions that I should follow, that would be great to hear about. (If there is a way to apply a set of tags to all resources at once, that would probably make the most sense...)
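For concreteness, the bucket scoping described above might look roughly like this in Terraform (a sketch only; the policy name and the exact action lists in the PR differ):

```hcl
# Sketch of the S3 access scoping; names are illustrative, not the PR's actual code.
data "aws_iam_policy_document" "nf_s3_access" {
  # Read access to all buckets
  statement {
    actions   = ["s3:Get*", "s3:List*"]
    resources = ["*"]
  }

  # Read/write access only to the two Nextflow buckets
  statement {
    actions = ["s3:*"]
    resources = [
      "arn:aws:s3:::nextflow-ccdl-data",
      "arn:aws:s3:::nextflow-ccdl-data/*",
      "arn:aws:s3:::nextflow-ccdl-results",
      "arn:aws:s3:::nextflow-ccdl-results/*",
    ]
  }
}
```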

jashapiro requested a review from kurtwheeler on August 3, 2020
allocation_strategy = "BEST_FIT"
max_vcpus = 20
min_vcpus = 0
image_id = "ami-0a8857ac38c35157f"
Contributor:

I know you noted where this AMI came from and what it is in the PR description, but I think it'd be good to also note that in a comment here.

Contributor:

Oh I just saw aws/setup-log.md, maybe just mention that doc in a comment here?

Member Author:

Added.
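(Roughly along these lines — a reconstruction, not the exact committed comment:)

```hcl
# Custom AMI built from the Amazon ECS-Optimized Amazon Linux AMI, with extra
# data storage and aws-cli installed for Nextflow; see aws/setup-log.md.
image_id = "ami-0a8857ac38c35157f"
```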

aws/setup-log.md (outdated):

Briefly:
1. We used the base AMI _Amazon ECS-Optimized Amazon Linux AMI_. (note, not Amazon Linux 2).
This was launched with an 80GB EBS volume for data (compared to teh default 22) in addition to the base 8GB boot volume.
Contributor:

Suggested change:
- This was launched with an 80GB EBS volume for data (compared to teh default 22) in addition to the base 8GB boot volume.
+ This was launched with an 80GB EBS volume for data (compared to the default 22) in addition to the base 8GB boot volume.
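For reference, reproducing that volume layout in Terraform rather than the console would look roughly like this (resource name, instance type, and base AMI are placeholders):

```hcl
resource "aws_instance" "ami_builder" {
  ami           = "ami-..."    # base ECS-Optimized Amazon Linux AMI (placeholder)
  instance_type = "t2.micro"   # assumed; any small type works for building an AMI

  # the base 8GB boot volume
  root_block_device {
    volume_size = 8
  }

  # 80GB data volume, replacing the AMI's default 22GB Docker storage volume
  ebs_block_device {
    device_name = "/dev/xvdcz"
    volume_size = 80
  }
}
```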

spot_iam_fleet_role = aws_iam_role.nf_ecs_role.arn
bid_percentage = 20
max_vcpus = 100
min_vcpus = 0
Contributor:

Ok so if I understand this correctly, you've got a default queue that uses spot instances and a priority queue that uses on demand instances. They're both going to scale up automatically, default to a max_vcpus of 100 and the priority to 20. Is that all correct?

Member Author:

That is what is supposed to happen, yes. I may well change these in the future, but this seemed a good place to start.
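(In resource terms, the pairing looks roughly like this; queue names and the referenced compute environments are illustrative, not the PR's actual identifiers:)

```hcl
resource "aws_batch_job_queue" "default" {
  name                 = "default"
  state                = "ENABLED"
  priority             = 1
  # spot compute environment, scaling from 0 to 100 vCPUs
  compute_environments = [aws_batch_compute_environment.nf_spot.arn]
}

resource "aws_batch_job_queue" "priority" {
  name                 = "priority"
  state                = "ENABLED"
  priority             = 10
  # on-demand compute environment, scaling from 0 to 20 vCPUs
  compute_environments = [aws_batch_compute_environment.nf_on_demand.arn]
}
```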

Contributor:

Ok sweet. A summary like that might also be useful to have in a comment or doc somewhere.

Member Author:

Yeah I was planning to add a readme before making it into a real PR.

"s3:GetBucketCORS",
"s3:GetBucketLocation",
"s3:ReplicateDelete",
"s3:GetObjectVersion"
Contributor:

Is this everything? I think `s3:*` might be equivalent.

Member Author:

I don't know? It is probably nearly equivalent, but this was one that I copied from the CloudFormation settings, so I didn't want to mess around too much. I doubt changing it to `s3:*` would be a problem.

Contributor:

Word, if you got this from there then it's probably best to leave it as is.


resource "aws_subnet" "nf_subnet" {
vpc_id = aws_vpc.nf_vpc.id
cidr_block = "10.1.1.0/24"
Contributor:

Are you planning on adding additional subnets? If not, you may just want to have this match the VPC's CIDR block.

Member Author:

I don't expect so? This is where I was completely lost, and was really just going off examples. I don't know what the function of multiple subnets would be, but it seemed I needed to make at least one. The CloudFormation-created VPC definitely made more than one, but I don't know why, or how that might be used.

Contributor:

Ok then, this should be fine. It shouldn't really matter much anyway.
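(For reference, making the two blocks match would look like this; the VPC resource shown assumes the subnet's existing CIDR:)

```hcl
resource "aws_vpc" "nf_vpc" {
  cidr_block = "10.1.1.0/24"   # assumed value, matched by the single subnet below
}

resource "aws_subnet" "nf_subnet" {
  vpc_id     = aws_vpc.nf_vpc.id
  cidr_block = aws_vpc.nf_vpc.cidr_block   # one subnet spanning the whole VPC
}
```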

kurtwheeler (Contributor) left a comment:

Ok this all looks pretty good to me. You'll definitely want to test it; it's hard to eyeball all the things that could be wrong, but there's nothing major as far as I can tell.

jashapiro (Member Author):

> Ok this all looks pretty good to me. You'll definitely want to test it; it's hard to eyeball all the things that could be wrong, but there's nothing major as far as I can tell.

That is good to hear.

What is your recommended test procedure? I have been making sure the syntax, etc. works with `terraform plan`, but I wasn't really sure of the best next steps. I assume `terraform apply`, but if things go wrong, what is the rollback? Will `terraform destroy` only affect the resources defined in these files? If I make changes, do I use `apply` again to update things as needed, or is there another step?

kurtwheeler (Contributor):

> Will `terraform destroy` only affect the resources defined in these files?

Yep, it's pretty good about that.

> If I make changes, do I use `apply` again to update things as needed, or is there another step?

There is no other step, other than maybe `terraform plan`. `terraform apply` will figure out what it needs to change and only modify/recreate those resources. Terraform won't really do anything dangerous. The worst it can really do is get a corrupted state, which will make it really hard to clean up, but that's not that scary and you probably won't run into it.
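For reference, the usual loop is just the standard Terraform commands (nothing project-specific here):

```sh
terraform init       # once per checkout, to install providers and set up state
terraform plan       # preview what would change
terraform apply      # make the changes (asks for confirmation first)
# ...edit the .tf files, then plan/apply again; terraform computes the diff...
terraform destroy    # tear down only the resources defined in these files
```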

kurtwheeler (Contributor):

Ok, I added some untested terraform code. It won't work just yet; you'll need to add the public key of your ssh key into the keypair resource definition. Also, https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/batch_compute_environment#ec2_key_pair didn't specify what attribute of the key_pair it wanted, so maybe you'll need to switch that to the id (normal instances use the name, though).

Once this is set up, you'll be able to ssh in with `ssh -i <private key file> <user name>@<instance ip>`. If you get the user name wrong, it won't be obvious that's the issue. It'll depend on your AMI; I've had to use `ec2-user`, `ubuntu`, and `root` in the past.

If any of that gives you any trouble lemme know!
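A sketch of the wiring being described (the resource names and placeholder key are illustrative):

```hcl
resource "aws_key_pair" "nf_key" {
  key_name   = "nextflow-key"
  public_key = "ssh-rsa AAAA..."   # placeholder; paste your real public key here
}

# Then, inside the compute environment's compute_resources block:
#   ec2_key_pair = aws_key_pair.nf_key.key_name
# (or .id, if the name attribute turns out not to be what Batch expects)
```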

jashapiro marked this pull request as ready for review on August 17, 2020
jashapiro (Member Author):

After Kurt successfully got things working by adding in missing gateway settings (who knew? Kurt), I have gone back and tightened things back up. I left in the sections that would allow SSH access to nodes, but that access is currently disabled: there is no public key passed to the nodes, and port 22 is closed.

I have also added more comments in the aws/setup-log.md (which maybe should be renamed README.md?) to describe the setup in a bit more detail.

I have also added an example nextflow.config file that includes minimal settings for running a job on the default batch queue.

jaclyn-taroni (Member) left a comment:

👍 I think these docs are enough to get started, along with the custom AMI section of the Nextflow docs, but we should keep an eye on how to refine them over time!

The spot price threshold is set high (currently 100%!) so it should never fail to run.

To use these files, you will need to install Terraform on your local machine, either from https://www.terraform.io/downloads.html or via Conda, Homebrew, or the like.
(You will also probably want the `aws` cli tools, and to have run `aws configure` to set your access key ID and secret access key.)
Member:

Suggested change:
- (You will also probably want the `aws` cli tools, and to have run `aws configure` to set your access key ID and secret access key.)
+ (You will also probably want the `aws` cli tools, and to have run [`aws configure`](https://docs.aws.amazon.com/cli/latest/reference/configure/index.html) to set your access key ID and secret access key.)

Alternatively link to: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html

Member:

As I'm reading this, I am wondering about a local development section in the documentation, which almost certainly belongs in a different markdown file. This is a future problem.

Member Author:

That will need to come. But the hope is that most development won't require messing with terraform. Other AWS stuff will need to be added for the rest of the repo, though, to get access to required buckets, etc.


Once you have installed terraform, you can run `terraform init` from this directory to get started.
To implement any changes, you will want a copy of the current state file, which is not included in this repository for security reasons.
Member:

Unclear to me how I would ensure that I would have a copy of the current state file, but it may be the case that it has to be this way.

jashapiro (Member Author), Aug 17, 2020:

Right now, you would message me and I would send it to you. Kurt pointed me to Terraform Cloud, but I haven't looked into it yet.

With that file in the directory, you can test any changes with `terraform plan` and then implement them on AWS with `terraform apply`.

Changes to the compute environments do not always seem to go smoothly, as Terraform does not always properly shut down the job queues.
Member:

🆒

kurtwheeler (Contributor) left a comment:

LGTM! The readme seems nicely informative!

jashapiro merged commit f4ccc68 into master on Aug 19, 2020
jashapiro deleted the jashapiro/terraform-batch branch on November 5, 2020