- PySpark is the Python API for Spark.
- The purpose of this PySpark tutorial is to present basic distributed algorithms using PySpark.
- PySpark supports two types of data abstractions (a short sketch of both follows this list):
  - RDDs
  - DataFrames
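
A minimal sketch of the two abstractions (the SparkSession setup, variable names, and sample data below are illustrative assumptions, not code from the tutorial itself):

```python
from pyspark.sql import SparkSession

# Build a local SparkSession; "abstractions-demo" is just an example app name.
spark = SparkSession.builder.appName("abstractions-demo").getOrCreate()
sc = spark.sparkContext

# RDD: a low-level distributed collection of Python objects
rdd = sc.parallelize([("alice", 34), ("bob", 41)])
print(rdd.collect())

# DataFrame: a distributed table with named columns
df = spark.createDataFrame(rdd, ["name", "age"])
df.show()

spark.stop()
```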
-
- PySpark Interactive Mode: an interactive shell (`$SPARK_HOME/bin/pyspark`) for basic testing and debugging; it is not intended for production use.
- PySpark Batch Mode: use the `$SPARK_HOME/bin/spark-submit` command to run PySpark programs (suitable for both testing and production environments). A small example of both modes follows.
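
For example, the following tiny program (the file name `count.py` and its data are hypothetical) can either be pasted into the interactive shell or submitted in batch mode:

```python
# count.py -- an illustrative minimal PySpark program.
# Interactive mode: paste the body into $SPARK_HOME/bin/pyspark
# Batch mode:       $SPARK_HOME/bin/spark-submit count.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("count-demo").getOrCreate()
sc = spark.sparkContext

# Sum a small distributed collection of numbers.
numbers = sc.parallelize([1, 2, 3, 4, 5])
print("sum =", numbers.reduce(lambda x, y: x + y))

spark.stop()
```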
- PySpark Examples: RDDs
- PySpark Examples: DataFrames
- DNA Base Counting
- Classic Word Count
- Find Frequency of Bigrams
- Join of Two Relations R(K, V1), S(K, V2)
- Basic Mapping of RDD Elements
- How to add all RDD elements together
- How to multiply all RDD elements together
- Find Top-N and Bottom-N
- Find average by using combineByKey()
- How to filter RDD elements
- How to find average
- Cartesian Product: rdd1.cartesian(rdd2)
- Sort By Key: sortByKey() ascending/descending
- How to Add Indices
- Map Partitions: mapPartitions() by Examples
- Monoid: Design Principle
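
To give a taste of the RDD examples listed above, here is a minimal word-count sketch (an illustrative version, not the exact code from the tutorial; `input.txt` is a placeholder path):

```python
# wordcount.py -- illustrative word count with RDDs; "input.txt" is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
sc = spark.sparkContext

counts = (sc.textFile("input.txt")                    # read lines
            .flatMap(lambda line: line.split())       # split lines into words
            .map(lambda word: (word, 1))              # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b))         # sum the counts per word

for word, count in counts.collect():
    print(word, count)

spark.stop()
```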
- Getting started with PySpark - Part 1
- Getting started with PySpark - Part 2
- A really really fast introduction to PySpark
- PySpark
- Basic Big Data Manipulation with PySpark
- Working in Pyspark: Basics of Working with Data and RDDs
- View Mahmoud Parsian's profile on LinkedIn
- Please send me an email: mahmoud.parsian@yahoo.com
- Twitter: @mahmoudparsian
Thank you!
Best regards,
Mahmoud Parsian