Angel Pizarro edited this page Sep 27, 2012 · 10 revisions

RUM on AWS Using StarCluster

1. Install StarCluster

Install StarCluster on your local workstation by following the directions in its installation guide.

2. Set Up StarCluster

Follow the StarCluster Quick Tutorial to start your first cluster. Please note that you will need to enable experimental features to use the get and put StarCluster commands.

# in the global options, set ENABLE_EXPERIMENTAL to True
ENABLE_EXPERIMENTAL=True

3. Make an EBS Volume with pre-installed RUM tools and indexes

We provide an EBS snapshot (snap-1f73906b) of RUM version 2.0.2_07 along with the pre-compiled RUM genome indexes. You can create a volume from this snapshot in two ways:

3.1.a Via EC2 API tools

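If you have the EC2 API tools installed, you can use them to create the volume. Make sure the availability zone matches the one you will request in the StarCluster config. Note that --region takes a region name (us-east-1), while the zone (us-east-1a) is passed with --availability-zone.

$ ec2-create-volume --private-key pk-XXXX.pem --cert cert-XXXX.pem --region us-east-1 --availability-zone us-east-1a --snapshot snap-1f73906b --size 100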
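
Regions and availability zones are easy to mix up when filling in these options. The Python sketch below derives the region from a zone name by dropping the trailing letter. The function name region_from_zone is hypothetical, and the rule is an assumption that holds for conventional zone names such as us-east-1a.

```python
import re

# Matches conventional zone names: region part plus a single trailing letter.
_ZONE_PATTERN = re.compile(r"^([a-z]+(?:-[a-z]+)+-\d+)([a-z])$")


def region_from_zone(zone):
    """Return the region for a conventional zone name, e.g. 'us-east-1a' -> 'us-east-1'."""
    match = _ZONE_PATTERN.match(zone)
    if match is None:
        raise ValueError("not a conventional availability zone name: %r" % zone)
    return match.group(1)


if __name__ == "__main__":
    # The zone used throughout this page.
    print(region_from_zone("us-east-1a"))
```

Passing a bare region such as us-east-1 raises ValueError, which is a cheap way to catch the two being swapped.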

3.1.b Using a Python script

Here we will use the Python boto library to create a volume pre-populated with RUM v2.0.2_07 and all of the indexes.

First make sure that boto is installed.

# you may need to use sudo to do this
pip install boto

Then use the following script to create a new EBS volume from the public RUM snapshot. Pay particular attention to the availability zone to make sure it matches StarCluster's configured zone. The script also assumes that you have exported your AWS credentials to your shell:

export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
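
If either variable is missing, a script that reads them directly from the environment fails with a bare KeyError. As a minimal sketch, the check below reports which variables are unset. The helper name load_aws_credentials is hypothetical and is not part of boto or of the gist.

```python
import os


def load_aws_credentials(env=None):
    """Return (access_key, secret_key) from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    # Treat empty strings the same as unset variables.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair can be passed as the two arguments to boto.connect_ec2.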

The script is available as a gist on GitHub, and its contents are shown below:

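#!/usr/bin/env python
# Create an EBS volume from the public RUM snapshot using boto.
import boto
from os import environ

# Credentials are read from the environment variables exported above.
conn = boto.connect_ec2(environ["AWS_ACCESS_KEY_ID"], environ["AWS_SECRET_ACCESS_KEY"])
size = 100  # volume size in GiB
# Adjust to the same availability zone as in your StarCluster config.
zone = "us-east-1a"
# Snapshot description: RUMv2.0.2_7
snapshot = "snap-1f73906b"
v = conn.create_volume(size, zone, snapshot)
# Print the new volume ID (vol-...) for use in the StarCluster config.
print(v.id)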

Save this script to your filesystem and run it (in this case we named it create_rum_ebs.py):

python create_rum_ebs.py

3.2 Finishing up

Now add the newly created volume to your StarCluster configuration file, using its actual volume ID in place of the placeholder below.

#############################
## Configuring EBS Volumes ##
#############################
[vol rum]
volume_id = vol-a1b2c3d4
mount_path = /rum

4. Create EBS volumes to store input reads and RUM results

Next, we create and format the volumes for storing input reads and results. RUM needs a lot of scratch space, so we recommend allowing 1 TB per Illumina flowcell analysis. We will use StarCluster for this task:

starcluster createvolume --name=ngsdata -d -m "mkfs.ext4" 1024 us-east-1a
starcluster createvolume --name=rumresults 1024 us-east-1a

NOTE: the --name option tags the volume on AWS. If you are booting multiple clusters, it is good practice to prefix that name with the cluster's ID. For example, with two clusters called "rum" and "physics", the volumes for the "rum" cluster would be created as:

starcluster createvolume --name=rum-ngsdata 1024 us-east-1a
starcluster createvolume --name=rum-rumresults 1024 us-east-1a
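
The naming convention above can be generated rather than typed by hand. The Python sketch below builds the same command lines for a given cluster ID. The function name createvolume_command is hypothetical, and it uses only the --name option and positional arguments shown on this page.

```python
def createvolume_command(cluster, volume, size_gb, zone):
    """Build the argument list for 'starcluster createvolume' with a cluster-prefixed tag."""
    name = "%s-%s" % (cluster, volume)
    return ["starcluster", "createvolume", "--name=" + name, str(size_gb), zone]


if __name__ == "__main__":
    # Print the two commands for the "rum" cluster; nothing is executed here.
    for volume in ("ngsdata", "rumresults"):
        print(" ".join(createvolume_command("rum", volume, 1024, "us-east-1a")))
```

Returning a list rather than a string means the result can be handed to subprocess without shell quoting, should you choose to run the commands from Python.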

Now add those volumes to your configuration:

[vol rumresults]
volume_id = vol-a123bcd5
mount_path = /rumresults

[vol ngsdata]
volume_id = vol-a123bcd6
mount_path = /ngsdata

Last but not least, add all of the volumes to your cluster definition. In the following, we created a cluster template named rum-small:

[cluster rum-small]
# other cluster options ...
VOLUMES = rum, rumresults, ngsdata
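
Every name in VOLUMES needs a matching [vol ...] section, and a mismatch is easy to miss. As a rough sketch, the Python 3 check below parses a config with the standard configparser module and reports volumes that a cluster lists but that are not defined. The function name undefined_volumes and the sample text are hypothetical, and the sketch assumes the config parses as ordinary INI.

```python
import configparser


def undefined_volumes(config_text):
    """Return {cluster name: [volume names]} for volumes listed but not defined."""
    # Interpolation is disabled so literal '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    defined = {s.split(None, 1)[1] for s in parser.sections() if s.startswith("vol ")}
    problems = {}
    for section in parser.sections():
        if not section.startswith("cluster "):
            continue
        # configparser lowercases option names, so VOLUMES is read as "volumes".
        listed = parser.get(section, "volumes", fallback="")
        names = [n.strip() for n in listed.split(",") if n.strip()]
        missing = [n for n in names if n not in defined]
        if missing:
            problems[section.split(None, 1)[1]] = missing
    return problems


# Hypothetical config: the cluster lists "ngstools", but only [vol rum] is defined.
SAMPLE = """
[vol rum]
volume_id = vol-a1b2c3d4
mount_path = /rum

[vol rumresults]
volume_id = vol-a123bcd5
mount_path = /rumresults

[vol ngsdata]
volume_id = vol-a123bcd6
mount_path = /ngsdata

[cluster rum-small]
VOLUMES = ngstools, rumresults, ngsdata
"""

if __name__ == "__main__":
    print(undefined_volumes(SAMPLE))
```

An empty result means every listed volume has a definition. To check a real file, read its contents and pass the text to the function.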

5. Starting Up the Cluster

Finally, we are ready to start the cluster. In this example, we have defined a cluster template called rum-small with two execution nodes.

$ starcluster start rum-small

To use spot pricing, you can either set the SPOT_BID parameter in the config or use the --bid option:

# set a max spot bid price of $1.50
$ starcluster start --bid 1.50 rum-small

6. Connect to Master Node

Once StarCluster reports that the cluster has started, you should be able to ssh to the master:

$ starcluster sshmaster rum-small

7. Optional: Configure RUM Run on Master Node

If you mounted the RUM tools volume under a mount point other than /rum, you will need to edit the RUM configuration files as appropriate to define absolute paths to the indexes and executables.

8. Optional: Testing RUM on your shiny new cluster