siel-iiith/hadoop-cookbook

hadoop Cookbook

Installs and configures Hadoop on Ubuntu images that already have Hadoop installed (using the official .deb package).

Requirements

This cookbook is part of HadoopStack. To avoid the time needed to install Hadoop on each instance, we use an image with Hadoop pre-installed. The cookbook currently supports Ubuntu with Hadoop pre-installed from the official .deb package, available at:

http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/

Attributes

hadoop::default

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| ['hadoop']['mapred_user'] | String | User on whose behalf the jobtracker/tasktracker daemons run | mapred |
| ['hadoop']['hdfs_user'] | String | User on whose behalf the namenode/datanode daemons run | hdfs |
| ['hadoop']['group'] | String | Common system group for the Hadoop daemons | hadoop |
| ['hadoop']['jobtracker'] | String | IP of the jobtracker | |
| ['hadoop']['namenode'] | String | IP of the namenode | |
| ['hdfs_replication'] | Integer | Replication factor | 2 |
| ['hadoop']['dfs_dir'] | String | Parent directory of the namenode/datanode directories | /mnt/dfs |
| ['hadoop']['namenode_dir'] | String | Namenode directory | /mnt/dfs/nn |
| ['hadoop']['datanode_dir'] | String | Datanode directory | /mnt/dfs/dn |
| ['hadoop']['mapred_local_dir'] | String | Mapred local directory | /mnt/mapred/local |
| ['hadoop']['mapred_system_dir'] | String | Mapred system directory | /mnt/mapred/system |
| ['hadoop']['log_dir'] | String | Log directory for the Hadoop daemons | /mnt/log/hadoop |
| ['hadoop']['pid_dir'] | String | PID directory for the Hadoop daemons | /var/run/hadoop |
| ['hadoop']['role'] | String | Hadoop role of the instance | |
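
As a rough illustration of how these defaults combine with role attributes, the sketch below models part of the attribute table as a plain Ruby hash and deep-merges role-level values over it. It is not the cookbook's attributes file and does not use Chef: `COOKBOOK_DEFAULTS` and `deep_merge` are hypothetical names, and the IP addresses are placeholders. It only imitates the Chef precedence rule that role default attributes override attribute-file defaults.

```ruby
# Subset of the defaults from the table above, modelled as a plain hash
# (hypothetical constant; the real cookbook sets these through Chef).
COOKBOOK_DEFAULTS = {
  'hadoop' => {
    'mapred_user'  => 'mapred',
    'hdfs_user'    => 'hdfs',
    'group'        => 'hadoop',
    'dfs_dir'      => '/mnt/dfs',
    'namenode_dir' => '/mnt/dfs/nn',
    'datanode_dir' => '/mnt/dfs/dn'
  }
}.freeze

# Recursively merge `override` into `base`: nested hashes are merged,
# any other value in `override` replaces the one in `base`.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Values a role would typically supply (placeholder IPs).
role_attributes = {
  'hadoop' => {
    'jobtracker' => '10.0.0.10',
    'namenode'   => '10.0.0.11',
    'role'       => 'jobtracker'
  }
}

node = deep_merge(COOKBOOK_DEFAULTS, role_attributes)
puts node['hadoop']['namenode']     # value supplied by the role
puts node['hadoop']['namenode_dir'] # value kept from the defaults
```

The attributes without a default in the table (jobtracker, namenode, role) are exactly the ones the role has to provide.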

Usage

Create roles for the appropriate services: jobtracker, tasktracker, namenode and datanode. Update the run_list and set at least two attributes, ['hadoop']['namenode'] and ['hadoop']['jobtracker'].

If you are using traditional HDFS:

name "jobtracker"
description "Role to initiate jobtracker"
run_list [
    "recipe[hadoop::default]"
]
default_attributes("hadoop" => {
    # Replace the placeholders with the actual IP addresses.
    "jobtracker" => "<jobtracker_ip>",
    "namenode" => "<namenode_ip>",
    "role" => "jobtracker"
})

If you are using S3 as the storage backend:

name "tasktracker"
description "Role to initiate tasktracker"
run_list [
    "recipe[hadoop::default]"
]
default_attributes("hadoop" => {
    # Replace the placeholders with the actual IP addresses and bucket name.
    "jobtracker" => "<jobtracker_ip>",
    "namenode" => "<namenode_ip>",
    "role" => "tasktracker",
    "dfs" => {
        "uri" => "s3://"
    },
    "s3" => {
        "bucket" => "<bucket_name>"
    }
})
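
The two role variants above differ only in the optional "dfs" and "s3" keys. A minimal sketch of a helper that builds either attribute hash, which can help keep several role files consistent, might look like the following. `hadoop_role_attributes` is a hypothetical name and is not part of the cookbook; the IP addresses and bucket name are placeholders.

```ruby
require 'pp'

# Build the "hadoop" attribute hash for a role (hypothetical helper).
# Pass s3_bucket to get the S3 variant; omit it for traditional HDFS.
def hadoop_role_attributes(role:, jobtracker:, namenode:, s3_bucket: nil)
  attrs = {
    'jobtracker' => jobtracker,
    'namenode'   => namenode,
    'role'       => role
  }
  unless s3_bucket.nil?
    attrs['dfs'] = { 'uri' => 's3://' }
    attrs['s3']  = { 'bucket' => s3_bucket }
  end
  { 'hadoop' => attrs }
end

# Traditional HDFS variant.
pp hadoop_role_attributes(role: 'jobtracker',
                          jobtracker: '10.0.0.10',
                          namenode: '10.0.0.11')

# S3 variant with a placeholder bucket name.
pp hadoop_role_attributes(role: 'tasktracker',
                          jobtracker: '10.0.0.10',
                          namenode: '10.0.0.11',
                          s3_bucket: 'example-bucket')
```

The returned hash has the same shape as the argument passed to default_attributes in the role files.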

hadoop::default

The default recipe creates the configuration files

  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml

in the /etc/hadoop directory, using the ERB templates available in templates/.
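
To give a sense of how an ERB template turns node attributes into one of these files, here is a minimal sketch using Ruby's standard `erb` library. The template text is a hypothetical stand-in, not the cookbook's actual core-site.xml template; `render_core_site` and the port 8020 are assumptions made for illustration. `fs.default.name` is the Hadoop 1.x property for the default filesystem URI.

```ruby
require 'erb'

# Hypothetical stand-in for a core-site.xml ERB template.
CORE_SITE_TEMPLATE = <<~XML
  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://<%= hadoop['namenode'] %>:<%= namenode_port %></value>
    </property>
  </configuration>
XML

# Render the template; `hadoop` and `namenode_port` are visible to the
# template through this method's binding. The port is an assumed value.
def render_core_site(hadoop, namenode_port = 8020)
  ERB.new(CORE_SITE_TEMPLATE).result(binding)
end

# Placeholder namenode IP.
puts render_core_site({ 'namenode' => '10.0.0.11' })
```

In a Chef run the template resource performs the equivalent rendering and writes the result to disk; this sketch only prints it.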

hadoop::prepare

This recipe is included by the default recipe. It creates the Hadoop directories and sets the appropriate permissions on them.
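
As a sketch of the kind of bookkeeping this involves, the snippet below derives a list of local directories and their owners from the attributes. The ownership choices (HDFS directories owned by the HDFS user, the mapred local directory owned by the mapred user) are assumptions for illustration and are not taken from the recipe; `directory_plan` is a hypothetical name, and nothing is created on disk.

```ruby
# Map each local Hadoop directory to the user assumed to own it.
# The ownership choices here are assumptions, not read from the cookbook.
def directory_plan(hadoop)
  hdfs_dirs   = %w[dfs_dir namenode_dir datanode_dir]
  mapred_dirs = %w[mapred_local_dir]

  plan = []
  hdfs_dirs.each do |key|
    plan << { path: hadoop[key], owner: hadoop['hdfs_user'], group: hadoop['group'] }
  end
  mapred_dirs.each do |key|
    plan << { path: hadoop[key], owner: hadoop['mapred_user'], group: hadoop['group'] }
  end
  plan
end

# Default values from the attribute table.
attrs = {
  'hdfs_user'        => 'hdfs',
  'mapred_user'      => 'mapred',
  'group'            => 'hadoop',
  'dfs_dir'          => '/mnt/dfs',
  'namenode_dir'     => '/mnt/dfs/nn',
  'datanode_dir'     => '/mnt/dfs/dn',
  'mapred_local_dir' => '/mnt/mapred/local'
}

directory_plan(attrs).each do |dir|
  puts "#{dir[:path]} #{dir[:owner]}:#{dir[:group]}"
end
```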

hadoop::jobtracker

This recipe enables and starts the jobtracker service.

hadoop::tasktracker

This recipe enables and starts the tasktracker service.

Contributing

  1. Fork the repository on GitHub
  2. Create a named feature branch (like add_component_x)
  3. Write your change
  4. Test it thoroughly
  5. Submit a Pull Request using GitHub

License and Authors

Authors: Shashank Sahni <shredder12@gmail.com>
