Letter Frequency Count with Hadoop MapReduce

This project counts letter frequencies in text with Hadoop MapReduce, using two techniques (in-mapper combining and a classic combiner), and then analyzes the results for texts in multiple languages. It is organized into the components described below.

📁 Project Structure

Java MapReduce Code:
- LetterFrequencyMapReduce.java: Implements the MapReduce job that counts letter frequencies.
  - InMapperCombiner: Uses an in-mapper combiner to reduce intermediate data.
  - Classic Combiner: Uses the classic combiner technique.

Python Analysis:
- performance_evaluation: Analyzes the performance of the Hadoop application under various HDFS and YARN parameter settings.
- linguistic_analysis: Compares a Latin text with texts in Romance languages to determine linguistic affinities and differences.

💻 Java MapReduce Code

`LetterFrequencyMapReduce.java`

This file contains two implementations of the MapReduce job for counting letter frequencies:

InMapperCombiner:
- Purpose: Reduces the volume of intermediate data during the map phase, so less data has to be shuffled to the reducers.
- Usage: The mapper combines intermediate results itself, within the map function, rather than relying on a separate combiner.

Classic Combiner:
- Purpose: Aggregates each mapper's intermediate results before the reduce phase.
- Usage: The combiner runs as a separate step on the map output, between map and reduce.
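
The difference between the two techniques can be sketched in plain Python, without Hadoop. This is a minimal simulation, not the code in `LetterFrequencyMapReduce.java`: the function names are hypothetical, and lowercasing the input and keeping only alphabetic characters are assumptions about how letters are selected.

```python
from collections import Counter

def plain_map(line):
    """Classic mapper: emit one (letter, 1) pair per letter."""
    return [(ch, 1) for ch in line.lower() if ch.isalpha()]

def combine(pairs):
    """Classic combiner: sum the counts of pairs that share a key."""
    totals = Counter()
    for letter, count in pairs:
        totals[letter] += count
    return sorted(totals.items())

def in_mapper_combining_map(lines):
    """In-mapper combining: keep running totals in memory, emit them once."""
    totals = Counter()
    for line in lines:
        for ch in line.lower():
            if ch.isalpha():
                totals[ch] += 1
    return sorted(totals.items())

lines = ["Arma virumque cano", "Troiae qui primus ab oris"]

# Classic: the mapper emits every pair, then the combiner shrinks them.
mapper_output = [pair for line in lines for pair in plain_map(line)]
classic = combine(mapper_output)

# In-mapper: the mapper only ever emits the already aggregated pairs.
in_mapper = in_mapper_combining_map(lines)

print(len(mapper_output), len(classic), len(in_mapper))
```

Both paths end with the same partial counts. The classic path first materializes one pair per letter, while the in-mapper path never emits more than one pair per distinct letter, at the cost of holding the totals in the mapper's memory.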

How It Works

Mapper: Reads the input text and emits a count for each letter it finds.
Combiner: Sums counts on the map side to reduce the volume of data sent to the reducers.
Reducer: Sums the partial counts for each letter and outputs its total frequency.
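
The three stages above can be traced end to end with a small plain-Python simulation. It is a sketch under assumptions, not the project's Java implementation: `run_job` and the phase functions are hypothetical names, each inner list stands in for one input split, and the `shuffle` step mimics how the framework groups values by key before the reducer sees them.

```python
from collections import defaultdict

def map_phase(split):
    """Mapper: emit (letter, 1) for every alphabetic character in the split."""
    for line in split:
        for ch in line.lower():
            if ch.isalpha():
                yield ch, 1

def combine_phase(pairs):
    """Combiner: sum counts per letter within a single mapper's output."""
    partial = defaultdict(int)
    for letter, count in pairs:
        partial[letter] += count
    return list(partial.items())

def shuffle(all_pairs):
    """Group the partial counts from all mappers by letter."""
    grouped = defaultdict(list)
    for letter, count in all_pairs:
        grouped[letter].append(count)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the partial counts of each letter into a final total."""
    return {letter: sum(counts) for letter, counts in sorted(grouped.items())}

def run_job(splits):
    combined = []
    for split in splits:  # one simulated map task per split
        combined.extend(combine_phase(map_phase(split)))
    return reduce_phase(shuffle(combined))

splits = [["abba"], ["cab"]]
result = run_job(splits)
print(result)  # {'a': 3, 'b': 3, 'c': 1}
```

Because addition is associative and commutative, the combiner can reuse the same summing logic as the reducer without changing the final totals.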

📊 Python Analysis

`performance_evaluation`

This script evaluates the performance of the Hadoop MapReduce job based on various parameters:

HDFS Parameters: Analyzes the efficiency of Hadoop Distributed File System settings.
YARN Parameters: Assesses the impact of YARN settings on job performance.

Objective: Identify performance bottlenecks and optimize configuration settings.
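
One simple way to compare configurations is to average the run times recorded for each one and express them relative to a baseline. The sketch below assumes the measurements are available as `(label, seconds)` pairs. The function names, the labels, and the numbers are placeholders for illustration, not results from this project or the actual interface of `performance_evaluation`.

```python
from statistics import mean

def summarize_runs(runs):
    """Group run times by configuration label and return the mean per label."""
    by_config = {}
    for label, seconds in runs:
        by_config.setdefault(label, []).append(seconds)
    return {label: mean(times) for label, times in by_config.items()}

def speedup(summary, baseline):
    """Express each configuration's mean run time relative to the baseline."""
    base = summary[baseline]
    return {label: base / seconds for label, seconds in summary.items()}

# Placeholder values, NOT real measurements.
runs = [
    ("baseline", 100.0),
    ("baseline", 120.0),
    ("tuned", 50.0),
    ("tuned", 60.0),
]

summary = summarize_runs(runs)
ratios = speedup(summary, "baseline")
print(summary)
print(ratios)
```

A ratio above 1 means the configuration ran faster than the baseline on average. Repeating each run several times, as the placeholder data does, helps separate the effect of a setting from run-to-run noise.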

`linguistic_analysis`

This script performs a comparative analysis of texts in Latin and Romance languages:

Text Analysis: Uses letter-frequency data from the text of the Aeneid in Latin and in several Romance languages.
Statistical Comparison: Identifies similarities and differences, determining which Romance language is closest to Latin.

Objective: Provide insights into linguistic evolution and affinities over time.
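
As an illustration of how such a comparison could work, the sketch below turns each text into a relative letter-frequency profile and ranks candidates by cosine similarity to a reference profile. Cosine similarity is an assumed choice here, since the statistic used by `linguistic_analysis` is not stated. All function names are hypothetical, and the strings are toy inputs rather than linguistic data.

```python
import math
from collections import Counter

def relative_frequencies(text):
    """Map each letter to its share of all letters in the text."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine of the angle between two frequency profiles (1.0 = same shape)."""
    letters = set(p) | set(q)
    dot = sum(p.get(l, 0.0) * q.get(l, 0.0) for l in letters)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

def closest_to(reference, candidates):
    """Return the candidate name most similar to the reference, plus all scores."""
    ref = relative_frequencies(reference)
    scores = {
        name: cosine_similarity(ref, relative_frequencies(text))
        for name, text in candidates.items()
    }
    return max(scores, key=scores.get), scores

# Toy strings standing in for full texts.
toy_candidates = {"sample_1": "aabb", "sample_2": "zzz"}
best, scores = closest_to("aab", toy_candidates)
print(best, scores)
```

The sketch expects every text to contain at least one letter, because an empty profile would cause a division by zero. With real data, the profiles would come from the letter counts produced by the MapReduce job rather than from raw strings.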
