updated readme with quick start instructions

dwatrous · Oct 1, 2015 · ae3245a
1 parent ea21f98
Showing 1 changed file with 19 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -1,8 +1,22 @@
-# hadoop-multi-server-ansible
+# Hadoop multi-node cluster with Ansible
 Multi-server deployment of Hadoop using Ansible
 
-The hadoop installation anticipates that hadoop binary release is available in 
-roles/common/templates/hadoop-2.7.1.tar.gz
+This repository contains a set of Vagrant and Ansible scripts that make it fast and easy to build a fully functional Hadoop cluster, including HDFS, on a single computer using VirtualBox. To run the scripts unmodified, you will probably need about 16 GB of RAM and at least 4 CPUs.
 
-This can be downloaded here:
-http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
+## Quick Start
+
+ - Clone this repository
+ - Download a binary release of Hadoop (e.g. http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz) and save it to `roles/common/templates/hadoop-2.7.1.tar.gz`
+ - Open a command prompt to the directory where you cloned the code
+ - Run `vagrant up`
+ - Use the commented lines in `bootstrap-master.sh` to do the following:
+   - Run the Ansible playbook: `ansible-playbook -i hosts-dev playbook.yml`
+   - Format the HDFS namenode
+   - Start DFS and YARN
+   - Run an example job: `hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 30`   
+
+## Additional Details and Explanation
+
+I wrote up a detailed article about how to understand and run these scripts. This includes the expected output and instructions to modify the process to accommodate proxy environments and low RAM environments. You can find that here:
+
+http://software.danielwatrous.com/install-and-configure-a-multi-node-hadoop-cluster-using-ansible/
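
The sub-steps listed under `bootstrap-master.sh` in the Quick Start could look roughly like the sketch below. This is not the content of `bootstrap-master.sh`, which is not shown in this diff. It assumes Hadoop is installed under `/usr/local/hadoop` (inferred from the example jar path in the README) and uses the standard Hadoop 2.x commands `hdfs namenode -format`, `start-dfs.sh`, and `start-yarn.sh`. The `run` and `quick_start` helpers and the `DRY_RUN` variable are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch of the Quick Start sub-steps. NOT the actual bootstrap-master.sh.
# Assumption: Hadoop lives in /usr/local/hadoop (taken from the README's jar path).
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Hypothetical helper: print each command instead of running it
# unless DRY_RUN is explicitly set to 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the four sub-steps from the README.
quick_start() {
  # 1. Run the Ansible playbook against the dev inventory
  run ansible-playbook -i hosts-dev playbook.yml
  # 2. Format the HDFS namenode (destroys any existing HDFS metadata)
  run "$HADOOP_HOME/bin/hdfs" namenode -format
  # 3. Start DFS and YARN
  run "$HADOOP_HOME/sbin/start-dfs.sh"
  run "$HADOOP_HOME/sbin/start-yarn.sh"
  # 4. Run the example job from the README
  run "$HADOOP_HOME/bin/hadoop" jar \
    "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar" pi 10 30
}
```

Calling `quick_start` after sourcing the script prints the commands without executing them. `DRY_RUN=0 quick_start` would execute them, presumably on the master VM and as whichever user the real `bootstrap-master.sh` expects. Check the commented lines in that file before relying on this ordering.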