Data Manipulation using Distributed Apache Hadoop's Map-Reduce Architecture

Synopsis

A Hadoop-based project that consolidates and manipulates contextual music data (locations, artists and songs).

Prerequisites

VirtualBox / Ubuntu Linux OS
JDK 1.7
JRE 1.7
Apache Hadoop 2.7.2

Before we start

Register yourself as a member of the hadoop user group and make sure Hadoop is running. Then run the following commands from your terminal.

Task0 - Merging the given files:

hadoop jar Table0.jar org.Table0 /input0 /output0

WARNING : the first path is the input directory, which must contain all three input files (the file names matter); the second path is the output directory for Task0

Output format : <artist_name><artist_id>.....<artist_id>............
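
The output format above lists several artist IDs after one artist name, which means the merge keeps every value seen for a key. The following is a minimal sketch of that grouping in plain Java, without any Hadoop classes. The class `MergeSketch`, the method `groupByKey` and the sample records are hypothetical and are not taken from Table0.jar; the real job may group its data differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the code shipped in Table0.jar.
final class MergeSketch {

    /** Groups values by key and keeps every value, in arrival order. */
    static Map<String, List<String>> groupByKey(String[][] pairs) {
        // TreeMap keeps keys sorted, similar to how reducers see sorted keys.
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (String[] pair : pairs) {
            List<String> values = grouped.get(pair[0]);
            if (values == null) {
                values = new ArrayList<String>();
                grouped.put(pair[0], values);
            }
            values.add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up (artist name, artist id) pairs.
        String[][] pairs = {
            {"Artist A", "id-1"},
            {"Artist B", "id-2"},
            {"Artist A", "id-3"}
        };
        for (Map.Entry<String, List<String>> entry : groupByKey(pairs).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

In a real MapReduce job the framework performs this grouping between the map and reduce phases, so the reducer only has to iterate over the values it receives for each key.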

Task1 - Consolidation of locations, artists and songs:

hadoop jar Table1.jar org.Table1 /output0 /output1

WARNING : the first path is the output directory from Task0; the second path is the output directory for Task1

Output format : <artist_name1><artist_name2>................

Task2 - Analysis on the dataset :

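hadoop jar Table2.jar org.Table2 /output1 /output0 /output2

WARNING : the paths are, in order, the output directory from Task1, the output directory from Task0 and the output directory for Task2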

Output format : <artist_id>.....<artist_id>............
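
Because Task2 depends on the order of its three paths, it can help to name and check them before the job starts. The sketch below shows one way to do that in plain Java. `Task2Args` and its field names are hypothetical and are not part of Table2.jar; only the argument order comes from this README.

```java
import java.util.Arrays;

// Hypothetical helper; not the argument handling used inside Table2.jar.
final class Task2Args {
    final String task1Output; // args[0]: output directory of Task1
    final String task0Output; // args[1]: output directory of Task0
    final String task2Output; // args[2]: directory Task2 writes to

    private Task2Args(String task1Output, String task0Output, String task2Output) {
        this.task1Output = task1Output;
        this.task0Output = task0Output;
        this.task2Output = task2Output;
    }

    /** Accepts exactly three paths and rejects anything else. */
    static Task2Args parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Expected <task1-output> <task0-output> <task2-output>, got "
                    + Arrays.toString(args));
        }
        return new Task2Args(args[0], args[1], args[2]);
    }

    public static void main(String[] args) {
        Task2Args parsed = parse(new String[] {"/output1", "/output0", "/output2"});
        System.out.println("Task1 output: " + parsed.task1Output);
        System.out.println("Task0 output: " + parsed.task0Output);
        System.out.println("Task2 output: " + parsed.task2Output);
    }
}
```

Failing fast on a wrong argument count gives a clearer error than a job that starts and then reads from the wrong directory.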

Run the commands above in sequence, because each task reads the output of the previous ones. Each task writes its output in the format listed for it.

Assumptions Considered :

There can be multiple locations, artist IDs and song titles, so Task0 includes all of them when merging.

Task2 takes three command-line arguments: args[0] is Task1's output directory, args[1] is Task0's output directory and args[2] is the output directory for Task2's result.

Locations and artist names are captured with regular expressions.
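
As an illustration of regular-expression capture, here is a minimal sketch using `java.util.regex` with named groups (available since Java 7). The line layout `<artist name> (<location>)`, the class `NameLocationCapture` and the sample line are assumptions made for this example. The actual input files and the patterns used in the jars may look quite different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; the patterns in the project's jars are not shown here.
final class NameLocationCapture {
    // Assumed line layout: "<artist name> (<location>)".
    private static final Pattern LINE =
        Pattern.compile("(?<artist>.+?)\\s*\\((?<location>[^()]+)\\)\\s*");

    /** Returns {artist, location}, or null when the line does not match. */
    static String[] capture(String line) {
        Matcher matcher = LINE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group("artist"), matcher.group("location").trim()};
    }

    public static void main(String[] args) {
        String[] fields = capture("Example Artist (Example City, Country)");
        System.out.println(fields[0] + " | " + fields[1]);
    }
}
```

Named groups keep the calling code readable, and returning null for lines that do not match lets a mapper skip malformed records instead of failing.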

Note : To use different input files (for example, shorter files of your own), you may have to change the code or overwrite the existing files under input0/

References:

http://stackoverflow.com/questions/2072222/regular-expression-for-positive-and-a-negative-decimal-value-in-java

http://regexr.com/

Boiler-plate code from https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Copyright © 2015 Vaikunth Sridharan
