Letter Frequency Count with Hadoop MapReduce

This project counts letter frequencies in text with Hadoop MapReduce. It implements the job with two techniques, an in-mapper combiner and a classic combiner, and then analyzes text data in multiple languages. The project is organized into the components described below.

📁 Project Structure

Java MapReduce Code:
- LetterFrequencyMapReduce.java: Implements the MapReduce job that counts letter frequencies.
  - InMapperCombiner: Aggregates counts inside the mapper to reduce intermediate output.
  - Classic Combiner: Aggregates map output with a separate combiner.
Python Analysis:
- performance_evaluation: Analyzes Hadoop application performance under different HDFS and YARN parameter settings.
- linguistic_analysis: Compares a Latin text with its counterparts in Romance languages to determine linguistic affinities and differences.

💻 Java MapReduce Code

`LetterFrequencyMapReduce.java`

This file contains two implementations of the MapReduce job for counting letter frequencies:

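InMapperCombiner:
- Purpose: Reduces the volume of intermediate data during the map phase, so fewer key-value pairs are written and shuffled.
- Usage: The mapper aggregates letter counts itself and emits combined counts instead of one pair per letter.
Classic Combiner:
- Purpose: Aggregates each mapper's intermediate results locally before they reach the reduce phase.
- Usage: The combiner is a separate step that Hadoop runs on map output between map and reduce. Hadoop treats it as an optimization and may run it zero, one, or several times.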
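
The difference between the two approaches shows up in how many records the mapper emits. The following is a minimal plain-Python sketch of that idea, not the project's Java code and not the Hadoop API; the function names `map_plain` and `map_in_mapper_combining` are hypothetical, and treating every alphabetic character (lowercased) as a letter is an assumption.

```python
from collections import Counter

def map_plain(line):
    """Emit one (letter, 1) pair per letter; a separate combiner would merge them later."""
    return [(ch, 1) for ch in line.lower() if ch.isalpha()]

def map_in_mapper_combining(line):
    """Aggregate counts in memory first, then emit one (letter, n) pair per distinct letter."""
    local = Counter(ch for ch in line.lower() if ch.isalpha())
    return sorted(local.items())

line = "Arma virumque cano"
plain = map_plain(line)
combined = map_in_mapper_combining(line)

# Both variants carry the same total count, but the second emits fewer records.
print(len(plain), len(combined))
```

Both variants produce the same final totals; in-mapper combining trades a little mapper memory for less intermediate output.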

How It Works

Mapper: Reads the input text and emits a count for each letter it finds.
Combiner: Aggregates those counts locally to reduce the volume of data passed to the reducers.
Reducer: Sums the counts for each letter and outputs the total letter frequencies.
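
The three stages above can be simulated end to end in a few lines. This is a sketch under stated assumptions: it runs in a single Python process, a dictionary stands in for Hadoop's shuffle-and-sort, and the names `mapper`, `combiner`, `reducer`, and `run_job` are illustrative rather than taken from `LetterFrequencyMapReduce.java`.

```python
from collections import Counter, defaultdict

def mapper(split):
    """Emit (letter, 1) for every alphabetic character in one input split."""
    for ch in split.lower():
        if ch.isalpha():
            yield ch, 1

def combiner(pairs):
    """Sum the counts produced by a single mapper (local aggregation)."""
    local = Counter()
    for letter, n in pairs:
        local[letter] += n
    return list(local.items())

def reducer(letter, counts):
    """Sum the partial counts for one letter across all mappers."""
    return letter, sum(counts)

def run_job(splits):
    grouped = defaultdict(list)  # stands in for shuffle-and-sort
    for split in splits:
        for letter, n in combiner(mapper(split)):
            grouped[letter].append(n)
    return dict(reducer(letter, counts) for letter, counts in sorted(grouped.items()))

totals = run_job(["Arma virumque cano", "Troiae qui primus ab oris"])
print(totals)
```

Because summation is associative and commutative, the reducer produces the same totals whether or not the combiner runs.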

📊 Python Analysis

`performance_evaluation`

This script evaluates the performance of the Hadoop MapReduce job under different configuration parameters:

HDFS Parameters: Analyzes how Hadoop Distributed File System settings affect job efficiency.
YARN Parameters: Assesses the impact of YARN settings on job performance.

Objective: Identify performance bottlenecks and optimize configuration settings.
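
One simple way to compare configurations is to average repeated run times per setting and express them as a speedup over a baseline. The sketch below assumes that approach; the source does not state which metrics the script uses. The function `summarize_runs`, the configuration labels, and the timings are all hypothetical placeholders, not measurements from this project.

```python
from statistics import mean

def summarize_runs(runs, baseline):
    """runs maps a configuration label to a list of wall-clock times in seconds."""
    base = mean(runs[baseline])
    summary = {}
    for label, times in runs.items():
        avg = mean(times)
        # speedup > 1 means the configuration ran faster than the baseline
        summary[label] = {"mean_s": avg, "speedup": base / avg}
    return summary

# Placeholder timings invented for illustration only.
example_runs = {
    "reducers=1": [120.0, 124.0],
    "reducers=4": [60.0, 62.0],
}
summary = summarize_runs(example_runs, baseline="reducers=1")
for label, stats in summary.items():
    print(label, stats)
```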

`linguistic_analysis`

This script performs a comparative analysis of texts in Latin and Romance languages:

Text Analysis: Uses data from the text of the Aeneid in Latin and in various Romance languages.
Statistical Comparison: Identifies similarities and differences between the languages and determines which Romance language is closest to Latin.

Objective: Provide insights into linguistic evolution and affinities over time.
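
A comparison of this kind can be sketched by turning each text into a relative letter-frequency distribution and scoring every candidate against the reference. The source does not say which statistic the script uses; cosine similarity is an assumption here, as are the function names and the toy input strings, which are not excerpts from the Aeneid. The sketch also assumes non-empty texts.

```python
import math
from collections import Counter

def letter_distribution(text):
    """Return relative letter frequencies (accented letters count as distinct letters)."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors; 1.0 means identical shape."""
    letters = set(p) | set(q)
    dot = sum(p.get(l, 0.0) * q.get(l, 0.0) for l in letters)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

def closest_language(reference, candidates):
    """Score each candidate text against the reference and return the best label."""
    ref = letter_distribution(reference)
    scores = {
        name: cosine_similarity(ref, letter_distribution(text))
        for name, text in candidates.items()
    }
    return max(scores, key=scores.get), scores

# Toy strings for illustration only.
best, scores = closest_language("aab", {"sample_1": "aabb", "sample_2": "zzz"})
print(best, scores)
```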


nikisetti01/Hadoop-MapReduce-LetterFrequency-Analysis