Life of PySpark

A Tale of Two Environments

An objective comparison of running Spark on Scala vs. Python in both development and production environments.

Index

Introduction

Presentation

  • Clone this repository

Run the following commands to:

  • navigate to the presentation directory
  • install dependencies
  • run the presentation

Commands

$ cd presentation/reveal.js
$ npm install
$ npm start

Demo

Run the following commands to set up a Jupyter Notebook running on a Spark cluster:

Commands

  • SET UP JUPYTER NOTEBOOK WITH PYSPARK

-- Install Jupyter Notebook from PyPI using pip

$ pip install jupyter

-- Make PySpark available to Jupyter (findspark locates an existing Spark installation)

$ pip install findspark

-- Configure a Notebook password

$ jupyter-notebook password

-- Launch Notebook

$ jupyter-notebook --notebook-dir /path/to/a/local/directory
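
Once the notebook is running, the findspark package installed above can be used in the first cell to put PySpark on the Python path. The following is a minimal sketch, not part of this repository: it assumes Spark is already installed and that SPARK_HOME points at it, and the function name `start_local_session`, the app name, and the `local[*]` master are illustrative choices. For a real cluster, the master URL would need to be replaced.

```python
# Hypothetical first notebook cell. Assumes Spark is installed and that
# SPARK_HOME is set (findspark also checks a few common install locations).
def start_local_session(app_name="life-of-pyspark-demo"):
    """Return a SparkSession running in local mode (illustrative helper)."""
    # Imported lazily so this cell can be defined even before the
    # packages are installed; the imports run only when it is called.
    import findspark
    findspark.init()  # adds the pyspark package to sys.path

    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")   # assumption: local mode, all available cores
        .appName(app_name)
        .getOrCreate()
    )


# Usage inside the notebook (requires a working Spark installation):
# spark = start_local_session()
# spark.range(5).show()
# spark.stop()
```

If `findspark.init()` cannot locate Spark, it raises an error; in that case, passing the installation path explicitly as `findspark.init("/path/to/spark")` is one option.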