workshops-spark_intro

Introductory workshops for beginners in Apache Spark with Python (PySpark) and SQL (Spark SQL). The repository includes Jupyter notebooks (IPYNB) and data.

Note: file paths in the notebooks will need to be updated to match your environment.
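One way to limit how many paths need editing is to define the data location once at the top of a notebook and build every file path from it. The sketch below is only an illustration: the `WORKSHOP_DATA_DIR` environment variable, the `data_path` helper, and the `data` default folder are assumptions, not names used by the notebooks.

```python
import os
from pathlib import Path

# Hypothetical setting: read the data folder from an environment variable,
# falling back to a relative "data" folder (an assumed default).
DATA_DIR = Path(os.environ.get("WORKSHOP_DATA_DIR", "data"))


def data_path(filename: str) -> str:
    """Return the path to a data file as a string, ready to pass to a reader."""
    return str(DATA_DIR / filename)


# "example.csv" is a placeholder file name, not a file from the repository.
example_file = data_path("example.csv")
```

With this pattern, moving the data only requires changing one value rather than every read call.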

I - Intro

Covers some core concepts of using Spark for data analysis.

Demonstrates the concept of "Tidy Data" using example code in Apache Spark, tidying five common types of untidy data.
