Titanic-Data-Analysis-Using-Map-Reduce
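| Column | Field |
|---|---|
| 1 | Passenger ID |
| 2 | Status (0 = died, 1 = survived) |
| 3 | Passenger class |
| 4 | Name |
| 5 | Sex |
| 6 | Age |
| 7 | SibSp (number of siblings/spouses aboard) |
| 8 | Parch (number of parents/children aboard) |
| 9 | Ticket |
| 10 | Amount (fare) |
| 11 | CabinNumber |
| 12 | Embarked (port of embarkation) |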
The input file is a comma-separated (CSV) file with one passenger per row.
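Both mappers depend on the split step, sketched minimally below. It is written in plain Python rather than Hadoop Java so it runs without a cluster. The constants are zero-based positions derived from the 12-column order above. The sample row and the names `split_row`, `STATUS`, `PCLASS`, `SEX` and `AGE` are hypothetical. Like any plain comma split, the sketch assumes that no field (for example Name) contains an embedded comma.

```python
from typing import List

# Zero-based positions, following the 12-column order listed above.
STATUS, PCLASS, SEX, AGE = 1, 2, 4, 5


def split_row(line: str) -> List[str]:
    """Split one input line on commas, with no handling of quoted fields."""
    return line.split(",")


# Hypothetical sample row in the 12-column layout (Cabin left empty).
row = split_row("1,0,3,Example Passenger,male,22,1,0,TICKET1,7.25,,S")
print(len(row), row[STATUS], row[PCLASS], row[SEX], row[AGE])  # 12 0 3 male 22
```

One difference to keep in mind: Python's `str.split(",")` keeps trailing empty fields, but Java's `String.split(",")` drops them. A Hadoop mapper can therefore see fewer columns for rows that end in empty values.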
Problem 1: average age of the males and females who died.
The dataset has 12 columns. A row is used in the analysis only if it has more than 6 columns.
- Mapper input: the key is the byte offset of the line in the file (LongWritable) and the value is the line itself (Text).
- Mapper output: the key is the passenger's gender (Text) and the value is that passenger's age (IntWritable). The reducer computes the average.
- The mapper converts the input value to a String and splits it on commas.
- If the row has more than 6 columns and the passenger died, the mapper emits the gender as the key and the age as the value.
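The mapper described above can be sketched as a plain-Python generator. This is not Hadoop code. Two details are assumptions that the text does not state:

- A passenger who died is taken to have Status `"0"`, the encoding used by the public Kaggle Titanic data.
- Rows whose Age is blank or not a whole number are skipped.

`died_age_mapper` and `DIED` are hypothetical names.

```python
from typing import Iterator, Tuple

STATUS, SEX, AGE = 1, 4, 5  # zero-based column positions
DIED = "0"  # assumption: Kaggle encoding; change it if your file differs


def died_age_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Emit (gender, age) for one passenger who died.

    `offset` stands in for the LongWritable key and is not used.
    """
    fields = line.split(",")
    # Assumption: skip rows whose Age is blank or fractional (e.g. "0.83").
    if len(fields) > 6 and fields[STATUS] == DIED and fields[AGE].isdigit():
        yield fields[SEX], int(fields[AGE])


print(list(died_age_mapper(0, "1,0,3,Example Passenger,male,22,1,0,T1,7.25,,S")))
# [('male', 22)]
```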
Reducer: for each gender, the reducer sums the ages it receives and divides the total by the number of entries. The result is the average age of the passengers of that gender who died.
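This reducer can be sketched in plain Python; the name `average_age_reducer` is hypothetical. It receives one gender together with every age the mappers emitted for it and returns the mean. The sketch uses true division. If the job writes the result as an `IntWritable`, the average would be truncated to a whole number.

```python
from typing import Iterable, Tuple


def average_age_reducer(gender: str, ages: Iterable[int]) -> Tuple[str, float]:
    """Return (gender, mean age) for all ages grouped under one key."""
    total = 0
    count = 0
    for age in ages:  # one pass, so a streaming iterator works too
        total += age
        count += 1
    # MapReduce only calls a reducer for keys that have at least one value,
    # so count is never zero here.
    return gender, total / count


print(average_age_reducer("male", [22, 38, 30]))  # ('male', 30.0)
```

An average, unlike a sum, cannot be reused as a combiner. Averaging partial averages gives the wrong answer when the partitions hold different numbers of rows, so a combiner would have to emit (sum, count) pairs instead.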
Problem 2: number of people who survived in each passenger class, broken down by gender.
The dataset has 12 columns. A row is used in the analysis only if it has more than 6 columns.
- Mapper input: the key is the byte offset of the line in the file (LongWritable) and the value is the line itself (Text).
- Mapper output: the key is a composite of gender and passenger class (Text) and the value is 1 (IntWritable).
- The mapper converts the input value to a String and splits it on commas.
- If the row has more than 6 columns and the passenger survived, the mapper counts it as one entry by emitting "gender + class" as the key and 1 as the value.
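The survivor mapper can be sketched in plain Python as well. As before, the encodings are assumptions:

- Status `"1"` is taken to mean survived, following the Kaggle convention.
- Gender and class are joined with a single space to form the composite key, because the text only says "Gender + class".

`survivor_mapper` and `SURVIVED` are hypothetical names.

```python
from typing import Iterator, Tuple

STATUS, PCLASS, SEX = 1, 2, 4  # zero-based column positions
SURVIVED = "1"  # assumption: Kaggle encoding


def survivor_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Emit ("<gender> <class>", 1) for one passenger who survived.

    `offset` stands in for the LongWritable key and is not used.
    """
    fields = line.split(",")
    if len(fields) > 6 and fields[STATUS] == SURVIVED:
        yield fields[SEX] + " " + fields[PCLASS], 1


print(list(survivor_mapper(0, "2,1,1,Example Passenger,female,38,1,0,T2,71.28,C85,C")))
# [('female 1', 1)]
```

The separator keeps the keys readable in the job output. Without it, "male" and "3" would become "male3".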
| Reducer input key | Reducer input value | Reducer output key | Reducer output value |
|---|---|---|---|
| Gender + Class | 1 (one per survivor) | Gender + Class | Count of survivors |
For each unique combination of gender and class, the reducer sums the 1s it receives and returns the total number of survivors.
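The reducer, together with a tiny local stand-in for Hadoop's shuffle step, can be sketched in plain Python. `survivor_count_reducer` and `run_locally` are hypothetical names, and the mapper output fed in is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def survivor_count_reducer(key: str, ones: Iterable[int]) -> Tuple[str, int]:
    """Sum the 1s emitted for one "<gender> <class>" key."""
    return key, sum(ones)


def run_locally(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group mapper output by key (the shuffle), then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop hands keys to a reducer in sorted order; mimic that here.
    return dict(survivor_count_reducer(k, v) for k, v in sorted(groups.items()))


mapped = [("female 1", 1), ("male 3", 1), ("female 1", 1)]
print(run_locally(mapped))  # {'female 1': 2, 'male 3': 1}
```

The reducer sums the values rather than counting them. That choice keeps it correct if it is also registered as a combiner, because partial sums of 1s add up to the same total.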