Apache Spark - From installation to performing awesome operations in Apache Spark Stack
Updated May 8, 2017 - Python
Group 10 Project, Fall 2020, CS 6240: Large-Scale Parallel Data Processing, Khoury College of Computer Sciences, Northeastern University
Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In this assignment we build an application that calculates, for every pair of words in the Google n-gram corpus, the probability of each word following that pair.
Frontend for a distributed electronic health records system
Developing a Flow with EMR and Airflow
A powerful CLI tool and API for managing Spark jobs on Amazon EMR clusters.
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving the data from Twitter Streams API
Scripts for provisioning data science tools
The EMR Helper library helps with setting up and managing an EMR cluster.