Adapted from "Answering Questions with Data", an invited lecture presented in Dr. Marques' Introduction to Data Science class in Fall 2020. The lecture is about bridging the gap between technical analysis and the stakeholders' point of view with Jupyter notebooks. It covers two topics:
- How to write well-structured, understandable, resilient, flexible Jupyter notebooks
- How to present the results of our investigations to the people who asked the questions, the stakeholders
We start with a Jupyter notebook that produces the right result but lacks good structure and proper coding practices, and then transform it into a good notebook.
What is a good notebook?
- The overall organization is logical
- Important assumptions and decisions are spelled out
- Code is easy to understand
- Code is flexible (easy to modify)
- Code is resilient (hard to break)
We will transform the original notebook into a good one, step by step. Each step addresses a set of related items.
- Step 1: the original notebook, the one that lacks structure and proper coding practices.
- Step 2: add a description, organize into sections, add exploratory data analysis.
- Step 3: make data clean-up more explicit, and explain why certain numbers were chosen (the assumptions behind them).
- Step 4: make the code more flexible with constants, and make the code more difficult to break (more resilient).
- Step 5: make the graphs easier to read.
- Step 6: describe the limitations of the conclusion.
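As a rough illustration of what Step 4 aims for, the sketch below replaces hard-coded numbers with named constants and adds checks that fail early with a clear message. It is not taken from the notebooks in this repository: the column name, the age limits, and the `filter_valid_ages` function are hypothetical, chosen only to show the idea.

```python
# Hypothetical constants: named once at the top, with the assumption behind each.
AGE_COLUMN = "age"
MIN_AGE = 18    # assumption: the analysis only covers adults
MAX_AGE = 100   # assumption: larger values are data-entry errors


def filter_valid_ages(rows):
    """Keep the rows whose age is within [MIN_AGE, MAX_AGE]."""
    # Resilience: fail early, with a clear message, if the data changed shape
    for row in rows:
        if AGE_COLUMN not in row:
            raise KeyError(f"Missing expected column: {AGE_COLUMN!r}")

    kept = [row for row in rows if MIN_AGE <= row[AGE_COLUMN] <= MAX_AGE]

    # Resilience: an empty result usually means the limits are wrong
    if not kept:
        raise ValueError("No rows left after filtering; check MIN_AGE and MAX_AGE")
    return kept


# Made-up sample data, for illustration only
sample = [{"age": 25}, {"age": 17}, {"age": 130}, {"age": 64}]
clean = filter_valid_ages(sample)
print(len(clean))
```

Changing an assumption now means editing one constant instead of hunting for a number scattered across cells, and a renamed column stops the notebook right away instead of silently producing a wrong result.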
Reworked sections are marked with this note:
The presentation used in the class is in this file.
This blog post is a written, simplified version of the presentation.
- Clone this repository
- cd <folder for the cloned repository>
- Create a Python environment:
python3 -m venv env
- Activate the environment:
source env/bin/activate
(Mac and Linux), or
env\Scripts\activate.bat
(Windows)
- Update pip:
python -m pip install --upgrade pip
- Install dependencies (only once):
pip install -r requirements.txt
- Run the notebooks:
jupyter lab