eODP Data Processing

Scripts to process eODP data.

Setup

We are using Python 3.6.8 on the production server.

For the dev environment, we are using pyenv to run Python 3.6.8, pip to manage packages, and pyenv-virtualenv to manage virtual environments. We are also using JupyterLab to process the raw data.

(1) Install pyenv, pyenv-virtualenv, and JupyterLab

(2) Install Python 3.6.8

pyenv install 3.6.8

Set this directory to use Python 3.6.8

pyenv local 3.6.8

(3) Set up the virtual environment

Create the virtual environment

pyenv virtualenv 3.6.8 <venv-name>

Set this directory to use the <venv-name> virtual environment

pyenv local <venv-name>
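
Once the directory points at the virtual environment, it can be worth confirming that the interpreter matches the production version. The following is a minimal sketch, not part of this repository; the helper names are illustrative and the expected version is taken from the Setup notes above.

```python
import sys

# Assumption: 3.6.8 is the version named in this README; adjust if it changes.
EXPECTED_VERSION = (3, 6, 8)


def version_matches(actual, expected=EXPECTED_VERSION):
    """Return True when the major, minor, and micro parts all match."""
    return tuple(actual[:3]) == tuple(expected)


def describe(actual, expected=EXPECTED_VERSION):
    """Build a one-line status message for the given interpreter version."""
    found = ".".join(str(part) for part in actual[:3])
    wanted = ".".join(str(part) for part in expected)
    status = "OK" if version_matches(actual, expected) else "MISMATCH"
    return "{}: running Python {}, expected {}".format(status, found, wanted)


# Report on the interpreter that is running this snippet.
print(describe(sys.version_info))
```

Running this from the project directory reports whether the active interpreter is the one pyenv was asked to use.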

(4) Install packages

pip install -r requirements.txt

(5) Configure the .env file

Copy .env-example to .env.

Fill in the missing environment variables.
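
To spot variables that were left blank after copying the example file, a small check like the one below may help. It is a sketch that assumes the file uses plain KEY=VALUE lines; the variable names in the sample are hypothetical, and the real ones are listed in .env-example.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the sorted names of variables that still have no value."""
    return sorted(key for key, value in values.items() if not value)


# Hypothetical contents; the real variable names live in .env-example.
sample = """
# database settings
DB_HOST=localhost
DB_PASSWORD=
"""

print(missing_keys(parse_env(sample)))
```

To check the real file, read .env with open() and pass its text to parse_env.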

(6) Install additional packages

pip install <package>

Update requirements.txt to record the new package

pip freeze > requirements.txt

Run Scripts

Start JupyterLab to run data processing scripts. We are using JupyterLab instead of plain Jupyter notebooks because JupyterLab lets you easily browse the data files while working.

cd notebooks
jupyter lab

Testing

Run the tests.

pytest
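
For anyone adding tests, the sketch below shows the general shape pytest expects. The helper function and the sample names are hypothetical and do not come from this repository; only the naming convention is pytest's own.

```python
# Hypothetical helper, used only to illustrate the test layout.
def normalize_taxon_name(name):
    """Collapse repeated whitespace and trim the ends of a name."""
    return " ".join(name.split())


# pytest collects functions whose names start with "test_" from
# files named test_*.py or *_test.py.
def test_normalize_taxon_name_collapses_whitespace():
    result = normalize_taxon_name("  Globigerina   bulloides ")
    assert result == "Globigerina bulloides"


def test_normalize_taxon_name_keeps_clean_input():
    result = normalize_taxon_name("Globigerina bulloides")
    assert result == "Globigerina bulloides"
```

Plain assert statements are enough; pytest reports the compared values when one fails.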

Run the linter (flake8) and the code formatter (Black).

python scripts/linter.py
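
The contents of scripts/linter.py are not shown here. As a rough idea of how a wrapper script could chain the two tools, here is a sketch under stated assumptions: the function names and target folders are hypothetical, and Black is run with --check so that it only reports problems, which may differ from what the real script does.

```python
import subprocess
import sys


def build_commands(paths):
    """Return the flake8 and Black command lines for the given paths."""
    return [
        [sys.executable, "-m", "flake8"] + list(paths),
        # Assumption: --check reports files Black would change, without editing.
        [sys.executable, "-m", "black", "--check"] + list(paths),
    ]


def run_all(paths):
    """Run each tool in turn and return the highest exit code seen."""
    worst = 0
    for command in build_commands(paths):
        worst = max(worst, subprocess.call(command))
    return worst


# Show the commands that would run for two hypothetical folders.
for command in build_commands(["scripts", "tests"]):
    print(" ".join(command[1:]))
```

Calling run_all(["scripts", "tests"]) would execute both tools, provided flake8 and Black are installed in the active environment.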

Deploy

We use rsync to sync the files to the live server.

./deploy.sh user@live.server.host
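
deploy.sh itself is not reproduced in this README. To illustrate the kind of rsync call such a script might wrap, the sketch below builds a command line with a dry-run switch; the remote path, the exclude list, and the function name are all assumptions made for illustration.

```python
def build_rsync_command(target, remote_path, dry_run=True, excludes=()):
    """Return an rsync argument list that syncs the current directory."""
    command = ["rsync", "-avz"]
    if dry_run:
        command.append("--dry-run")  # list changes without copying anything
    for pattern in excludes:
        command.extend(["--exclude", pattern])
    command.extend(["./", "{}:{}".format(target, remote_path)])
    return command


# Hypothetical remote path and exclude list, for illustration only.
print(" ".join(build_rsync_command(
    "user@live.server.host",
    "/path/to/app/",
    excludes=[".env", ".git"],
)))
```

Previewing with --dry-run before a real sync is a common precaution, since it lists what would change on the live server without touching it.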