A Tale of Two Environments
An objective comparison of running Spark on Scala vs. Python in both development and production environments.
- Clone this repository
Run the following commands to:
- navigate the presentation directory
- install dependencies
- run the presentation
$ cd presentation/reveal.js
$ npm install
$ npm start
Run the following commands to set up a Jupyter Notebook running on a Spark cluster:
- SET UP JUPYTER NOTEBOOK WITH PYSPARK
-- Install Jupyter Notebook using pip
$ pip install jupyter
-- Make PySpark available to Jupyter
$ pip install findspark
-- Configure Notebook password
$ jupyter-notebook password
-- Launch Notebook
$ jupyter-notebook --notebook-dir /path/to/a/local/directory