
ms-shankar/life-of-pyspark


Life of PySpark


A Tale of Two Environments


An objective comparison of running Spark on Scala vs. Python in both development and production environments.

Index

Introduction

Presentation

  • Clone this repository

Run the following commands to:

  • navigate to the presentation directory
  • install dependencies
  • run the presentation

Commands

$ cd presentation/reveal.js
$ npm install
$ npm start

Demo

Run the following commands to set up a Jupyter Notebook running on a Spark cluster.

Commands

  • Set up Jupyter Notebook with PySpark

-- Install Jupyter Notebook from PyPI using pip

$ pip install jupyter

-- Install findspark, which adds PySpark to the Python path at runtime so Jupyter can import it

$ pip install findspark

-- Configure a notebook password

$ jupyter-notebook password

-- Launch the notebook server

$ jupyter-notebook --notebook-dir /path/to/a/local/directory
