Amazon EC2 for beginners

Matt Dowle edited this page Jan 7, 2015 · 18 revisions

This explains in a minimal way how to start and use a spot instance on Amazon EC2. There are several good blog articles but I found they didn't cover some aspects in great detail. This is a wiki page so you can easily update and improve it as time passes (please do by pressing Edit in the top right).

r3.xlarge: 30GB RAM, 4 cores, $0.03/hour
r3.8xlarge: 244GB RAM, 32 cores, $0.25/hour

Spot instances are cheap because, should you be outbid by someone else or spare capacity be reallocated, they can be killed at any moment by Amazon with no notice. However, in my experience (so far) that rarely happens. A spot instance is ideal for large data benchmarking and research jobs; i.e. tasks that can simply be restarted should they be killed.
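Because a spot instance can vanish at any moment, a restartable job needs some record of what has already finished. The following is a minimal sketch rather than a definitive implementation: `run_task`, `run_all` and the `done.txt` checkpoint file are hypothetical names, and it assumes the checkpoint is kept somewhere that outlives the instance (for example on your local machine, driving the work from there).

```shell
# Sketch of a restartable batch job. Finished task names are appended to a
# checkpoint file, so a rerun after an interruption skips them.
# Assumption: CHECKPOINT lives on storage that survives the spot instance.
CHECKPOINT="${CHECKPOINT:-./done.txt}"

run_task() {
    # Hypothetical placeholder: replace with the real work for task "$1".
    echo "running $1"
}

run_all() {
    touch "$CHECKPOINT"
    for task in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        if grep -qxF -- "$task" "$CHECKPOINT"; then
            echo "skipping $task"
            continue
        fi
        # Record the task only if it completed successfully
        run_task "$task" && echo "$task" >> "$CHECKPOINT"
    done
}
```

Calling `run_all bench1 bench2 bench3` again after an interruption skips whatever is already listed in the checkpoint file and carries on with the rest.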

What are you waiting for?

  1. Create an account (including your credit card details): http://aws.amazon.com/ec2/

  2. Get to the EC2 Management Console and bookmark it in your browser.

  3. Click Spot Requests in the left hand menu. Your screen should now look like this : TO ADD

  4. Resist the temptation to click the blue "Request Spot Instances" button; click the grey Pricing History button at the top instead.

  5. Change the instance type in the drop down at the top to the one you want; e.g. r3.8xlarge. You have to use another source to know how much RAM and how many cores each instance name corresponds to; e.g. http://www.ec2instances.info/. Observe the price history and the current price. If this isn't acceptable, close the price history and change the region in the drop down box in the black area at the top right of the Management Console. Then click Pricing History again. Keep changing region/type until you find a combination where the price is acceptable. Each region/type combination is priced separately.

  6. Now click the blue "Request Spot Instances" button. Note that this isn't the same as the "Launch instance" button in the Instances view (although that is where we'll view the spot instance in a moment).

Step 1: (Choose an Amazon Machine Image) The Quick Start machine images are selected by default. Choose the Ubuntu one. Currently it's the 4th one down: Ubuntu Server 14.04 LTS (HVM), SSD Volume Type, 64bit. This is a brand new, blank and factory fresh Linux server. Simple. No dependencies. No software or libraries pre-installed that might be out of date.

Step 2: (Choose an Instance Type) Choose r3.8xlarge (244GB RAM and 32 cores).

Step 3: (Configure Instance Details) The maximum bid price is the only field you need to complete. This is the maximum you're prepared to pay per hour. Start with the current spot price from point 5 above and, with knowledge of the history, add some margin; e.g., if the spot price is $0.25 then I tend to bid $0.50. Should you be outbid you have no opportunity to increase your bid ... your instance will just be killed instantly.

Step 4: (Add Storage) Next

Step 5: (Tag Instance) Next

Step 6: (Configure Security Group) SSH (port 22) is already open by default. It's important to add HTTP (80) and HTTPS (443) otherwise R can't download packages. Optional: In the security group name field, change "launch-wizard-1" to "R Server", then next time you can just choose "Existing security group" instead.

  1. Click Review and launch

  2. Click Launch

  3. (Select an existing key pair or create a new key pair) Select "Create a new key pair". The "Key Pair name" field is just the name of the file that will be created on your local machine. A different file is needed for each Amazon region, it seems. So I have "~/mdowle.pem" for N. California, "~/mdowleOregon.pem" etc. Enter the file name (without the .pem extension) into the field, click the "Download Key Pair" button and save the file somewhere within easy reach (I save them in my home directory ~). Next time you can just "choose an existing key pair" and it will find the appropriate .pem file for that Amazon region.

  4. Click "Request Spot Instance" blue button.

Your request will now appear as a new line in the "Spot Requests" view. After at most a minute the status will change to "fulfilled". Switch to the "Instances" view and you'll see a new line there as well. Your instance is now running and you are being charged per hour whether it is idle or not. Don't forget to kill any running instances when you're finished, otherwise you'll get a surprise when the monthly bill arrives in your inbox. There are no time limits or warnings about running instances you may have forgotten to terminate.
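To get a feel for how a forgotten instance adds up, here is a rough sketch of the arithmetic. `estimate_cost` is a hypothetical helper, and it assumes whole-hour billing at a constant price; the actual spot price moves over time, so treat the result as a ballpark only.

```shell
# estimate_cost HOURS PRICE_PER_HOUR
# Rounds HOURS up to whole hours (assumption: billing is per started hour)
# and multiplies by a fixed hourly price.
estimate_cost() {
    awk -v h="$1" -v p="$2" 'BEGIN {
        billed = (h == int(h)) ? h : int(h) + 1
        printf "%.2f\n", billed * p
    }'
}
```

For example, `estimate_cost 720 0.25` prints `180.00`: an r3.8xlarge left running for 30 days at $0.25/hour comes to about $180.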

  1. Select the instance (if not already selected) and click the blue Connect button at the top. This doesn't really connect, it just displays a window showing you how to connect.

  2. You only need to copy one line from this window, for example :

ssh -i mdowle.pem ubuntu@54.67.82.235

  1. Paste this into a shell (I paste it into my editor's shell). Either do this in the directory where you saved the .pem to or include the path to the .pem file. That's why I put the .pem files in ~ to make this easy since the shell opens in the home directory. Obviously the IP address will be different for you. Enter "yes" to accept authentication.

  2. You now have a prompt to a factory fresh large-memory machine. Type free -h. Type lscpu. :-)

  3. I have the following startup script in my editor which I run by pressing F5.

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo add-apt-repository 'deb  http://cran.stat.ucla.edu/bin/linux/ubuntu trusty/'
sudo apt-get update
sudo apt-get -y install r-base-core
sudo apt-get -y install libcurl4-openssl-dev    # for RCurl which devtools depends on
sudo apt-get -y install htop                    # to monitor RAM and CPU
R                                               # start R; the lines below run at the R prompt
options(repos = "http://cran.stat.ucla.edu")
install.packages("devtools")
library(devtools)
# Use R as normal ...

  1. Start another shell, paste in the same ssh command to connect and type htop. Leave this running to monitor RAM and CPU usage on the remote instance.

  2. Type df -h and observe that the disk is not large. However, you have 244GB of RAM. Use a RAM disk by writing to and reading from /dev/shm, which also gives you very fast disk access. Even if you use 100GB of RAM disk, you'll still have over 140GB of RAM. Anything in /dev/shm is lost when the instance is killed, so transfer any results you want to keep from the server to your local machine.

  3. To transfer files to and from the server :

# To copy to EC2 (final colon needed):
scp -i ~/mdowle.pem localFile.csv ubuntu@54.183.161.72:

# To copy from EC2 (final space then dot is needed):
scp -i ~/mdowle.pem ubuntu@54.183.161.72:~/remoteFile.csv .
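The RAM disk and the transfer step fit together naturally: work in /dev/shm, then pack the results into a single archive so there is only one file to copy back. The following is a sketch under assumptions, not a fixed recipe: `stage_dir` and `bundle_results` are hypothetical helper names, and the fallback to the default temp directory is only there for machines without /dev/shm.

```shell
# stage_dir: create a scratch directory on the RAM disk (/dev/shm),
# falling back to the default temp dir where /dev/shm is not available.
stage_dir() {
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        mktemp -d /dev/shm/job.XXXXXX
    else
        mktemp -d
    fi
}

# bundle_results DIR ARCHIVE: pack everything in DIR into one gzipped tarball.
# ARCHIVE should live outside DIR.
bundle_results() {
    tar -czf "$2" -C "$1" .
}

# Typical use on the instance (commented out so nothing runs on paste):
#   WORK=$(stage_dir)
#   ... write results into "$WORK" ...
#   bundle_results "$WORK" ~/results.tar.gz
```

Then, from your local machine, the same scp form as above fetches the archive, e.g. `scp -i ~/mdowle.pem ubuntu@54.183.161.72:~/results.tar.gz .` (with your own IP address).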

Once you're used to it, you can get to this point in about 2 minutes.