# apache-hadoop-framework

Here are 5 public repositories matching this topic...

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

  • Updated Aug 17, 2024
  • Python

Apache Hadoop. Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for co…

  • Updated Oct 7, 2021
  • Java
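
The MapReduce programming model mentioned in the description above can be illustrated with a minimal single-process sketch. This is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustrative, in-memory only)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

Word counting is used here only because it is the conventional introductory example; the same three-step shape applies to any computation that can be expressed as per-record emission followed by per-key aggregation.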
