A Machine Learning Pipeline for Climate Science

This repository is an end-to-end pipeline for the creation, intercomparison and evaluation of machine learning methods in climate science.

The pipeline carries out a number of tasks to create a unified-data format for training and testing machine learning methods.

These tasks are split across the classes defined in the src folder.

NOTE: using this pipeline requires a basic working knowledge of Python, but nothing advanced.

Using the Pipeline

There are three entrypoints to the pipeline:

A blog post describing the goals and design of the pipeline can be found here.

View the initial presentation of our pipeline here.

Setup

Anaconda running Python 3.7 is used as the package manager. To set up an environment, install Anaconda and (from this directory) run

conda env create -f environment.yml

This will create an environment named esowc-drought with all the necessary packages to run the code. To activate this environment, run

conda activate esowc-drought

Docker can also be used to run this code. To do this, first start Docker, either by launching the Docker Desktop app or by configuring docker-machine:

# on macOS
brew install docker-machine docker

docker-machine create --driver virtualbox default
docker-machine env default

See here for help on all machines or here for macOS.

Then build the docker image:

docker build -t ml_drought .

Then, use it to run a container, mounting the data folder to the container:

docker run -it \
--mount type=bind,source=<PATH_TO_DATA>,target=/ml_drought/data \
ml_drought /bin/bash
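
The <PATH_TO_DATA> placeholder in the command above is easiest to get right as an absolute path. As a minimal sketch (the helper name docker_run_command is hypothetical and not part of this repository), the same command can be assembled in Python from a relative or ~-prefixed data folder:

```python
import shlex
from pathlib import Path
from typing import List


def docker_run_command(data_dir: str, image: str = "ml_drought") -> List[str]:
    """Build the `docker run` argument list shown above for a local data folder.

    Hypothetical helper for illustration; it only assembles the command
    and does not start a container.
    """
    # Expand "~" and resolve to an absolute path for the bind mount source.
    source = Path(data_dir).expanduser().resolve()
    return [
        "docker", "run", "-it",
        "--mount", f"type=bind,source={source},target=/ml_drought/data",
        image, "/bin/bash",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the ./data folder.
    print(" ".join(shlex.quote(arg) for arg in docker_run_command("data")))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run once Docker is running.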

You will also need to create a .cdsapirc file with the following information:

url: https://cds.climate.copernicus.eu/api/v2
key: <INSERT KEY HERE>
verify: 1
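
A quick way to catch a malformed .cdsapirc before starting a long export is to parse it and check that the expected keys are present. The sketch below is an illustration only: the function names are hypothetical, the file is assumed to live in your home directory, and the parser is a simple "name: value" line reader rather than the CDS API client's own loader.

```python
from pathlib import Path
from typing import Dict, List

# Keys assumed to be required, based on the example file above.
REQUIRED_KEYS = ("url", "key")


def parse_cdsapirc(text: str) -> Dict[str, str]:
    """Parse "name: value" lines into a dict, skipping blanks and comments."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs and keys keep their colons.
        name, value = line.split(":", 1)
        config[name.strip()] = value.strip()
    return config


def missing_keys(config: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    rc_path = Path.home() / ".cdsapirc"  # assumed location
    if not rc_path.exists():
        print(f"{rc_path} not found")
    else:
        missing = missing_keys(parse_cdsapirc(rc_path.read_text()))
        print("missing keys:", ", ".join(missing) if missing else "none")
```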

Testing

This pipeline can be tested by running pytest. flake8 is used for linting.

We use mypy for type checking. To check the src directory, run mypy src.

We use black for code formatting.
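
The checks above can be chained into one helper so that a single call reports everything that failed. This is a sketch under assumptions, not a script shipped with the repository: the name run_checks is hypothetical, flake8 is run with no arguments, and black is run as black --check . so that it reports formatting problems without rewriting files.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands named in this section; the flake8 and black arguments are assumptions.
CHECKS: List[List[str]] = [
    ["pytest"],
    ["flake8"],
    ["mypy", "src"],
    ["black", "--check", "."],
]


def run_checks(run: Callable[[Sequence[str]], int] = subprocess.call) -> List[str]:
    """Run every check and return the commands that exited non-zero.

    `run` takes a command and returns its exit code; it defaults to
    subprocess.call and can be swapped for a stub when testing.
    """
    failed: List[str] = []
    for command in CHECKS:
        if run(command) != 0:
            failed.append(" ".join(command))
    return failed
```

Calling run_checks() from the repository root, inside the activated environment, returns an empty list when every tool passes.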

Team: @tommylees112, @gabrieltseng

For updates, follow @tommylees112 on Twitter or look out for our blog posts!

Acknowledgements

This project was completed as part of the ECMWF Summer of Weather Code Challenge #12. The challenge was set up to use ECMWF/Copernicus open datasets to evaluate machine learning techniques for the prediction of droughts.

Huge thanks to @ECMWF for making this project possible!
