alan-turing-institute/clean-air-infrastructure

UrbanAir API - Including London COVID-19 Busyness (Odysseus)

License: MIT

Azure Infrastructure for the Clean Air project.

Provides 48h high-resolution air pollution forecasts over London via the UrbanAir-API.

Previously repurposed to assess busyness in London during the COVID-19 pandemic - providing busyness data via the ip-whitelisted API.

Contents

See the old README file!

Contributors 👯

A list of key developers on the project. A good place to start if you wish to contribute.

| Name | GitHub ID | Email | Admin |
| --- | --- | --- | --- |
| James Brandreth | @jamesbrandreth | jbrandreth@turing.ac.uk | |
| Oscar Giles | @OscartGiles | ogiles@turing.ac.uk | Infrastructure, Prod Database, Kubernetes Cluster |
| Oliver Hamelijnck | @defaultobject | ohamelijnck@turing.ac.uk | |
| Chance Haycock | @chancehaycock | chaycock@turing.ac.uk | |
| Christy Nakou | @ChristyNou | cnakou@turing.ac.uk | |
| Patrick O'Hara | @PatrickOHara | pohara@turing.ac.uk | |
| Harry Moss | @harryjmoss | h.moss@ucl.ac.uk | |
| David Perez-Suarez | @dpshelio | d.perez-suarez@ucl.ac.uk | |
| James Robinson | @jemrobinson | jrobinson@turing.ac.uk | Infrastructure, Prod Database, Kubernetes Cluster |
| Tim Spain | @timspainUCL | t.spain@ucl.ac.uk | |
| Edward Thorpe-Woods | @TeddyTW | ethorpe-woods@turing.ac.uk | |

Contents

Setting up a development environment

Accessing Production database

Entry points

Running the UrbanAir API

Processing Scoot Data

Method

Developer guide

Researcher guide

Infrastructure


Contributing guide

Azure account

To contribute to the Turing deployment of this project you will need to be on the Turing Institute's Azure active directory. In other words you will need a turing email address <someone>@turing.ac.uk. If you do not have one already contact an infrastructure administrator.

If you are deploying the CleanAir infrastructure elsewhere you should have access to an Azure account (the cloud-computing platform where the infrastructure is deployed).

Non-infrastructure dependencies

To contribute as a non-infrastructure developer you will need the following:

  • Azure command line interface (CLI) (for managing your Azure subscriptions)
  • Docker (For building and testing images locally)
  • postgreSQL (command-line tool for interacting with db)
  • python (Note that python>=3.8 is currently incompatible with some of our dependencies. We currently recommend python==3.7.8)
  • CleanAir python packages (install python packages)
  • GDAL (For inserting static datasets)
  • eccodes (For reading GRIB files)

The instructions below install the dependencies system-wide. If you would rather keep everything separate from your system, follow the instructions in the Using a Conda environment section instead.

Windows is not supported. However, you may use Windows Subsystem for Linux 2 and then install dependencies with conda.

Azure CLI

If you have not already installed the command line interface for Azure, please follow the procedure here to get started.

Alternatively, a simpler option is to install it in your preferred environment with `pip install azure-cli`.

Docker

Download and install Docker Desktop

PostgreSQL

PostgreSQL and PostGIS.

Setting up a local Postgres instance with PostGIS can be troublesome, so we recommend using a Docker image.

docker run --name database -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 cleanairdocker.azurecr.io/database

If you aren't logged in with access to the cleanairdocker registry, you can build the image yourself and run it with:

docker build -t database:latest -f ./containers/dockerfiles/test_database.dockerfile .
docker run --name database -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 database

! If you have another Postgres install running, it will likely be using port 5432. In this case, map the container to a different host port, for example 5000 (remember to change the port in your local secrets file to match). Run instead with:

docker run --name database -e POSTGRES_HOST_AUTH_METHOD=trust -p 5000:5432 database
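
Before choosing the host port, it can help to check whether something is already listening on 5432. The following is a minimal sketch using only the Python standard library; the helper names `port_in_use` and `pick_host_port` are illustrative and not part of the CleanAir packages.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def pick_host_port(candidates=(5432, 5000)) -> int:
    """Return the first candidate port that nothing is listening on."""
    for port in candidates:
        if not port_in_use(port):
            return port
    raise RuntimeError("all candidate ports are in use")
```

The value returned by `pick_host_port()` is the number to put on the left-hand side of `-p <host-port>:5432` and in the `port` field of your local secrets file.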

Alternatively, you can install Postgres with your package manager, such as Homebrew:

brew install postgresql postgis

GDAL

GDAL can be installed using brew on OSX.

brew install gdal

or any of the binaries provided for different platforms.

Eccodes

brew install eccodes

Development tools

The following are optional as we can run everything on docker images. However, they are recommended for development/testing and required for setting up a local copy of the database.

pip install -r containers/requirements.txt

CleanAir Python packages

To run the CleanAir functionality locally (without a docker image) you can install the package with pip.

For a basic install which will allow you to set up a local database run:

pip install -e 'containers/cleanair[<optional-dependencies>]'

Certain functionality requires optional dependencies. These can be installed by adding the following:

| Option keyword | Functionality |
| --- | --- |
| models | CleanAir GPFlow models |
| traffic | FBProphet Traffic Models |

For getting started we recommend:

pip install -e 'containers/cleanair[models, traffic]'

UATraffic (London Busyness only)

All additional functionality related to the London Busyness project requires:

pip install -e 'containers/odysseus'

UrbanAir FastAPI package

pip install -e 'containers/urbanair'

Infrastructure dependencies

Cloud infrastructure developers will require the following in addition to the non-infrastructure dependencies.

Infrastructure development

  • Access to the deployment Azure subscription
  • Terraform (for configuring the Azure infrastructure)
  • Travis Continuous Integration (CI) CLI (for setting up automatic deployments)

Azure subscription

You need to have access to the CleanAir Azure subscription to deploy infrastructure. If you need access contact an infrastructure administrator

Terraform

The Azure infrastructure is managed with Terraform. To get started download Terraform from their website. If using Mac OS, you can instead use homebrew:

brew install terraform

Travis CI CLI

Ensure you have Ruby 1.9.3 or above installed:

brew install ruby
gem update --system

Then install the Travis CI CLI with:

gem install travis --no-document

On some versions of OSX, this fails, so you may need the following alternative:

ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future gem install --user-install travis -v 1.8.13 --no-document

Verify with

travis version

If this fails ensure Gems user_dir is on the path:

cat << EOF >> ~/.bash_profile
export PATH="\$PATH:$(ruby -e 'puts Gem.user_dir')/bin"
EOF

Using a Conda environment

It is possible to set everything up within a conda environment, which lets you keep different versions of software on your machine. All the steps above can be done with:

# Non-infrastructure dependencies
conda create -n busyness python=3.7.8 --channel conda-forge 
conda activate busyness
conda install -c anaconda postgresql
conda install -c conda-forge gdal postgis uwsgi
pip install azure-cli
pip install azure-nspkg azure-mgmt-nspkg
# The next command reports: ERROR: azure-cli 2.6.0 has requirement azure-storage-blob<2.0.0,>=1.3.1, but you'll have azure-storage-blob 12.3.0 which is incompatible.
# The packages still install despite this error.
pip install -r containers/requirements.txt
pip install -e 'containers/cleanair[models,traffic]'
pip install -e 'containers/odysseus'
pip install -e 'containers/urbanair'

## Infrastructure dependencies

# If you don't get rb-ffi and rb-json, you'll need to install gcc_linux-64 and libgcc to build them before installing travis.
conda install -c conda-forge terraform ruby rb-ffi rb-json
# At least on Linux, you'll need to disable IPv6 for this version of gem to work.
gem install travis --no-document
# Create a soft link of the executables installed by gem into a place seen within the conda env.
conda_env=$(conda info --json | grep -w "active_prefix" | awk '{print $2}'| sed -e 's/,//' -e 's/"//g')
ln -s $(find $conda_env -iname 'travis' | grep bin) $conda_env/bin/
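
The `grep`/`awk`/`sed` pipeline above extracts the active environment prefix from the JSON printed by `conda info --json`. If that pipeline proves fragile on your system, the same value can be read with a JSON parser. This is a sketch only; the function names are illustrative, and it assumes the `active_prefix` key that the shell command above already relies on.

```python
import json
import subprocess
from typing import Optional


def active_prefix(conda_info_json: str) -> Optional[str]:
    """Extract the active environment prefix from `conda info --json` output."""
    info = json.loads(conda_info_json)
    # The key is null (None) when no environment is active
    return info.get("active_prefix")


def read_conda_info() -> str:
    """Run `conda info --json` and return its raw output (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "info", "--json"], check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling `active_prefix(read_conda_info())` inside the activated environment returns the directory whose `bin/` folder should receive the `travis` symlink.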

Login to Azure

To start working with Azure, you must first login to your account from the terminal:

az login

Infrastructure developers:

Infrastructure developers should additionally check which Azure subscriptions they have access to by running:

az account list --output table --refresh

Then set your default subscription to the Clean Air project (if you cannot see it in the output generated from the last line you do not have access):

az account set --subscription "CleanAir"

If you don't have access this is ok. You only need it to deploy and manage infrastructure.

Configure a local database

In production we use a managed PostgreSQL database. However, it is useful to have a local copy to run tests and for development. To set up a local version start a local postgres server:

brew services start postgresql

If you installed the database using conda, set up the server and users first with:

initdb -D mylocal_db
pg_ctl -D mylocal_db -l logfile start
createdb --owner=${USER} myinner_db

When you want to work in this environment again you'll need to run:

pg_ctl -D mylocal_db -l logfile start

You can stop it with:

pg_ctl -D mylocal_db stop

Create a local secrets file

We store database credentials in json files. For production databases you should never store database passwords in these files - for more information see the production database section.

mkdir -p .secrets
echo '{
    "username": "postgres",
    "password": "''",
    "host": "localhost",
    "port": 5432,
    "db_name": "cleanair_test_db",
    "ssl_mode": "prefer"
}' > .secrets/.db_secrets_offline.json

N.B In some cases your default username may be your OS user. Change the username in the file above if this is the case.

createdb cleanair_test_db
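
If you prefer to generate the secrets file from Python, for instance to change the username or port without hand-editing JSON, something like the following would work. It is a sketch that mirrors the fields shown above; `write_offline_secrets` is a hypothetical helper, not a function provided by the CleanAir package.

```python
import json
from pathlib import Path


def write_offline_secrets(path, username="postgres", port=5432):
    """Write local database connection settings as a single JSON object."""
    secrets = {
        "username": username,
        "password": "",
        "host": "localhost",
        "port": port,
        "db_name": "cleanair_test_db",
        "ssl_mode": "prefer",
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # write_text truncates the file, so re-running never leaves two JSON
    # objects in the same file
    path.write_text(json.dumps(secrets, indent=4) + "\n")
    return secrets
```

For example, `write_offline_secrets(".secrets/.db_secrets_offline.json", port=5000)` matches the Docker command that maps the database to host port 5000.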

Create schema and roles

We must now set up the database schema. This also creates a number of roles on the database.

Create a variable with the location of your secrets file and set it as an environment variable:

export DB_SECRET_FILE=$(pwd)/.secrets/.db_secrets_offline.json
python containers/entrypoints/setup/configure_db_roles.py -s $DB_SECRET_FILE -c configuration/database_role_config/local_database_config.yaml

Static data insert

The database requires a number of static datasets. We can now insert static data into our local database. You will need a SAS token to access static data files stored on Azure.

If you have access to Azure you can log in from the command line and run the following to obtain a SAS token:

SAS_TOKEN=$(python containers/entrypoints/setup/insert_static_datasets.py generate)

By default the SAS token will last for 1 hour. If you need a longer expiry time pass --days and --hours arguments to the program above. N.B. It's better to use short expiry dates where possible.

Otherwise you must request a SAS token from an infrastructure developer and set it as a variable:

SAS_TOKEN=<SAS_TOKEN>

You can then download and insert all static data into the database by running the following:

python containers/entrypoints/setup/insert_static_datasets.py insert -t $SAS_TOKEN -s $DB_SECRET_FILE -d rectgrid_100 street_canyon hexgrid london_boundary oshighway_roadlink scoot_detector urban_village

If you would also like to add UKMAP to the database run:

python containers/entrypoints/setup/insert_static_datasets.py insert -t $SAS_TOKEN -s $DB_SECRET_FILE -d ukmap

UKMAP is extremely large and will take ~1h to download and insert. We therefore do not run tests against UKMAP at the moment.

N.B. SAS tokens will expire after a short length of time, after which you will need to request a new one.

Check the database configuration

You can check everything configured correctly by running:

pytest containers/tests/test_database_init --secretfile $DB_SECRET_FILE

Access CleanAir Production Database

To access the production database you will need an Azure account and be given access by one of the database administrators. You should discuss what your access requirements are (e.g. do you need write access). To access the database, first log in to Azure from the terminal.

If you do not have an Azure subscription you must use:

az login --allow-no-subscriptions

You can then request an access token. The token will be valid for between 5 minutes and 1 hour. Set the token as an environment variable:

export PGPASSWORD=$(az account get-access-token --resource-type oss-rdbms --query accessToken -o tsv)
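
Because the token is short-lived, it can be useful to check how long it has left before starting a long job. The sketch below assumes the access token is a standard JSON Web Token whose payload carries an `exp` (expiry) claim in Unix seconds. It only decodes the payload and does not verify the signature, and `token_seconds_remaining` is an illustrative name.

```python
import base64
import json
import time


def token_seconds_remaining(token: str, now: float = None) -> float:
    """Return the number of seconds until the token's `exp` claim."""
    payload_b64 = token.split(".")[1]
    # base64url payloads omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current
```

In a Python session you could call `token_seconds_remaining(os.environ["PGPASSWORD"])` (after `import os`) and request a new token when the result is close to zero.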

Connect using psql

Once your IP has been whitelisted (ask the database administrators), you will be able to access the database using psql:

psql "host=cleanair-inputs-2021-server.postgres.database.azure.com port=5432 dbname=cleanair_inputs_db user=<your-turing-credentials>@cleanair-inputs-2021-server sslmode=require"

replacing <your-turing-credentials> with your turing credentials (e.g. jblogs@turing.ac.uk).

Create secret file to connect using CleanAir package

To connect to the database using the CleanAir package you will need to create another secret file:

echo '{
    "username": "<your-turing-credentials>@cleanair-inputs-2021-server",
    "host": "cleanair-inputs-2021-server.postgres.database.azure.com",
    "port": 5432,
    "db_name": "cleanair_inputs_db",
    "ssl_mode": "require"
}' > .secrets/db_secrets_ad.json

Make sure you then replace <your-turing-credentials> with your Turing email address, so that the full username reads, for example, jblogs@turing.ac.uk@cleanair-inputs-2021-server.
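
The username format is easy to get wrong, so here is a small sketch that composes it from a Turing email address. The server and database names are taken from the commands above; `production_secrets` is a hypothetical helper rather than part of the CleanAir package.

```python
import json

SERVER = "cleanair-inputs-2021-server"


def production_secrets(turing_email: str) -> dict:
    """Build the secrets dictionary for Azure AD authentication (no password)."""
    return {
        # Azure expects <login>@<server-name>, hence the second "@"
        "username": f"{turing_email}@{SERVER}",
        "host": f"{SERVER}.postgres.database.azure.com",
        "port": 5432,
        "db_name": "cleanair_inputs_db",
        "ssl_mode": "require",
    }


example = json.dumps(production_secrets("jblogs@turing.ac.uk"), indent=4)
```

Writing `example` to `.secrets/db_secrets_ad.json` gives the same file as the `echo` command above with the placeholder already filled in.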

Running entry points

The directory containers/entrypoints contains Python scripts which are then built into Docker images in containers/dockerfiles. You can run them locally.

These are scripts which collect and insert data into the database. To see what arguments they take you can call any of the files with the argument -h, for example:

python containers/entrypoints/inputs/input_laqn_readings.py -h

Entry point with local database

The entrypoints will need to connect to a database. To do so you can pass one or more of the following arguments:

  1. --secretfile: Full path to one of the secret .json files you created in the .secrets directory.

  2. --secret-dict: A set of parameters to override the values in --secretfile. For example you could alter the port and ssl parameters as --secret-dict port=5411 ssl_mode=prefer

Entry point with production database

You will notice that the db_secrets_ad.json file we created does not contain a password. To run an entrypoint against a production database you must run:

az login
export PGPASSWORD=$(az account get-access-token --resource-type oss-rdbms --query accessToken -o tsv)

When you run an entrypoint script the CleanAir package will read the PGPASSWORD environment variable. This will also take precedence over any value provided in the --secret-dict argument.
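
The precedence described above (secret file first, then --secret-dict overrides, then PGPASSWORD) can be pictured with the following sketch. It is an illustration of the ordering only, not the CleanAir package's actual implementation; `resolve_db_settings` and its parameters are hypothetical.

```python
import json
import os


def resolve_db_settings(secretfile, secret_dict=(), environ=None):
    """Merge connection settings in increasing order of precedence."""
    environ = os.environ if environ is None else environ
    # 1. Start from the JSON secret file
    with open(secretfile) as handle:
        settings = json.load(handle)
    # 2. Apply key=value overrides; values stay strings in this sketch
    for item in secret_dict:
        key, _, value = item.partition("=")
        settings[key] = value
    # 3. PGPASSWORD wins over any password set in the earlier steps
    if "PGPASSWORD" in environ:
        settings["password"] = environ["PGPASSWORD"]
    return settings
```

With `secret_dict=["port=5411", "ssl_mode=prefer"]` the port and SSL mode from the file are replaced, while a password supplied in either place is overwritten whenever PGPASSWORD is set.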

Docker entry point

To run an entry point in Docker we first need to build a Docker image. Here it is shown for the satellite input entry point:

docker build -t input_satellite:local -f containers/dockerfiles/input_satellite_readings.Dockerfile containers

To run we need to set a few more environment variables. The first is the directory with secret files in:

SECRET_DIR=$(pwd)/.secrets

Now get a new token:

export PGPASSWORD=$(az account get-access-token --resource-type oss-rdbms --query accessToken -o tsv)

Finally you can run the docker image, passing PGPASSWORD as an environment variable (:warning: this writes data into the online database)

docker run -e PGPASSWORD -v $SECRET_DIR:/secrets input_satellite:local -s 'db_secrets_ad.json' -k <copernicus-key>

Here we also provided the Copernicus API key, which is stored in the cleanair-secrets Azure key vault.

If you want to run that example with the local database you can do so by:

COPERNICUS_KEY=$(az keyvault secret show --vault-name cleanair-secrets --name satellite-copernicus-key -o tsv --query value)
# OSX or Windows: change "localhost" to host.docker.internal in your .db_secrets_offline.json
docker run -e PGPASSWORD -v $SECRET_DIR:/secrets input_satellite:local -s '.db_secrets_offline.json' -k $COPERNICUS_KEY
# Linux:
docker run --network host -e PGPASSWORD -v $SECRET_DIR:/secrets input_satellite:local -s '.db_secrets_offline.json' -k $COPERNICUS_KEY

UrbanAir API

The UrbanAir RESTful API is a FastAPI application. To run it locally, complete the following steps:

Configure CleanAir database secrets

Ensure you have configured a secrets file for the CleanAir database, then request an access token:

export PGPASSWORD=$(az account get-access-token --resource-type oss-rdbms --query accessToken -o tsv)

Run the application

On development server

DB_SECRET_FILE=$(pwd)/.secrets/db_secrets_ad.json uvicorn urbanair.urbanair:app --reload

In a docker image

To build the API docker image

docker build -t fastapi:test -f containers/dockerfiles/urbanairapi.Dockerfile 'containers'

Then run the docker image:

DB_SECRET_FILE='db_secrets_ad.json'
SECRET_DIR=$(pwd)/.secrets
docker run -i -p 80:80 -e DB_SECRET_FILE -e PGPASSWORD -e APP_MODULE="urbanair.urbanair:app" -v $SECRET_DIR:/secrets fastapi:test

Developer guide

Style guide

Writing Documentation

Before being accepted into master all code should have well-written documentation.

Please use Google Style Python Docstrings

We would like to move towards adding type hints so you may optionally add types to your code. In which case you do not need to include types in your google style docstrings.

Adding and updating existing documentation is highly encouraged.

Gitmoji

We like gitmoji for an emoji guide to our commit messages. You might consider (entirely optional) using the gitmoji-cli as a hook when writing commit messages.

Working on an issue

The general workflow for contributing to the project is to first choose an issue (or create one) to work on and assign yourself to it.

You can find issues that need work by searching for the Needs assignment label. If you decide to move onto something else or wonder what you've got yourself into, please unassign yourself, leave a comment about why you dropped the issue (e.g. got bored, blocked by something etc.) and re-add the Needs assignment label.

You are encouraged to open a pull request earlier rather than later (either a draft pull request or add WIP to the title) so others know what you are working on.

How you label branches is optional, but we encourage using iss_<issue-number>_<description_of_issue> where <issue-number> is the github issue number and <description_of_issue> is a very short description of the issue. For example iss_928_add_api_docs.
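
As a quick illustration of the naming convention, the sketch below builds and checks branch names of the form iss_<issue-number>_<description_of_issue>. Restricting the description to lowercase letters, digits and underscores is an assumption made for this example, and both helper names are hypothetical.

```python
import re

# Assumed shape: iss_<digits>_<lowercase words joined by underscores>
BRANCH_PATTERN = re.compile(r"^iss_(\d+)_([a-z0-9]+(?:_[a-z0-9]+)*)$")


def branch_name(issue_number: int, description: str) -> str:
    """Turn an issue number and a short description into a branch name."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"iss_{issue_number}_{slug}"


def is_conventional(branch: str) -> bool:
    """Return True if the branch name follows the suggested pattern."""
    return BRANCH_PATTERN.match(branch) is not None
```

For instance, `branch_name(928, "Add API docs")` gives the example name used above, `iss_928_add_api_docs`.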

Running tests

Tests should be written where possible before code is accepted into master. Contributing tests to existing code is highly desirable. Tests will also be run on travis (see the travis configuration).

All tests can be found in the containers/tests/ directory. We already ran some tests to check our local database was set up.

To run the full test suite against the local database run

export DB_SECRET_FILE=$(pwd)/.secrets/.db_secrets_offline.json
pytest containers --secretfile $DB_SECRET_FILE

Writing tests

The following shows an example test:

def test_scoot_reading_empty(secretfile, connection):
    conn = DBWriter(
        secretfile=secretfile, initialise_tables=True, connection=connection
    )

    with conn.dbcnxn.open_session() as session:
        assert session.query(ScootReading).count() == 0

It uses the DBWriter class to connect to the database. In general when interacting with a database we write a class which inherits from either DBWriter or DBReader. Both classes take a secretfile as an argument which provides database connection secrets.

Critically, we also pass a special connection fixture when initialising any class that interacts with the database.

This fixture ensures that all interactions with the database take place within a transaction. At the end of the test the transaction is rolled back leaving the database in the same state it was in before the test was run, even if commit is called on the database.
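
To make the rollback behaviour concrete, here is a self-contained sketch of the same idea using the standard library's sqlite3 module rather than the project's Postgres fixture. The test body's commit only releases a savepoint inside an outer transaction, and that outer transaction is always rolled back. The class and function names, table and inserted value are all hypothetical; the real connection fixture may be implemented differently.

```python
import sqlite3


class SavepointSession:
    """Hypothetical session whose commit() only releases a savepoint."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute("SAVEPOINT test_body")

    def execute(self, sql, params=()):
        return self.conn.execute(sql, params)

    def commit(self):
        # Fold the work into the outer transaction, then open a fresh savepoint
        self.conn.execute("RELEASE SAVEPOINT test_body")
        self.conn.execute("SAVEPOINT test_body")


def run_isolated(conn, test_body):
    """Run test_body inside an outer transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        test_body(SavepointSession(conn))
    finally:
        conn.execute("ROLLBACK")


# isolation_level=None stops sqlite3 from opening transactions implicitly
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE scoot_reading (detector_id TEXT)")


def body(session):
    session.execute("INSERT INTO scoot_reading VALUES (?)", ("detector_a",))
    session.commit()  # looks like a commit to the test, but is not durable


run_isolated(conn, body)
remaining = conn.execute("SELECT COUNT(*) FROM scoot_reading").fetchone()[0]
```

After `run_isolated` returns, `remaining` counts the rows left in the table, which shows whether the insert survived the test's own commit.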

Researcher guide

The following steps provide useful tools for researchers to use, for example setting up jupyter notebooks and running models using a GPU.

Setup notebook

First install jupyter with pip (you can also use conda).

pip install jupyter

You can start the notebook:

jupyter notebook

Alternatively you may wish to use jupyter lab which offers more features on top of the normal notebooks.

jupyter lab

This will require some additional steps for adding jupyter lab extensions for plotly.

For some notebooks you may also want to use Mapbox for visualising spatial data. To do this you will need a Mapbox access token, which you can store inside your .env file (see below).

Environment variables

To access the database, the notebooks need access to the PGPASSWORD environment variable. It is also recommended to set the DB_SECRET_FILE variable. We will create a .env file within your notebook directory path/to/notebook where you will be storing environment variables.

Note: if you are using a shared system or scientific cluster, do not follow these steps and do not store your password in a file.

Run the below command to create a .env file, replacing path/to/secretfile with the path to your db_secrets.

echo '
DB_SECRET_FILE="path/to/secretfile"
PGPASSWORD=
' > path/to/notebook/.env

To set the PGPASSWORD, run the following command. This will create a new password using the Azure CLI and replace the line in .env that contains PGPASSWORD with the new password. Remember to replace path/to/notebook with the path to your notebook directory. The command uses the BSD (macOS) form of sed -i; with GNU sed on Linux, drop the empty '' argument.

sed -i '' "s/.*PGPASSWORD.*/PGPASSWORD=$(az account get-access-token --resource-type oss-rdbms --query accessToken -o tsv)/g" path/to/notebook/.env
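
If you would rather not depend on which flavour of sed is installed, the same update can be done in Python. This is a sketch with a hypothetical helper, `set_env_value`; unlike the sed command it only replaces lines that start with `KEY=`, and it appends the line when the key is missing.

```python
from pathlib import Path


def set_env_value(env_path, key, value):
    """Set KEY=value in a .env file, replacing an existing KEY= line if present."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    for index, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

Pass the access token printed by the `az account get-access-token` command as `value`, with `"PGPASSWORD"` as `key` and the path to your .env file as `env_path`.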

If you need to store other environment variables and access them in your notebook, simply add them to the .env file.

To access the environment variables, include the following lines at the top of your jupyter notebook:

%load_ext dotenv
%dotenv

You can now access the value of these variables as follows:

import os
secretfile = os.getenv("DB_SECRET_FILE", None)

Remember that the PGPASSWORD token will only be valid for ~1h.

Training models

To train a model on your local machine you can run a model fitting entrypoint:

TL;DR

urbanair init production
urbanair model data generate-config --train-source laqn --train-source satellite --pred-source laqn
urbanair model data generate-full-config
urbanair model data download --training-data --prediction-data
urbanair model setup mrdgp
urbanair model fit mrdgp
urbanair model update result mrdgp
urbanair model update metrics INSTANCE_ID

Generate a model config

urbanair model data generate-config --train-source laqn --train-source satellite --pred-source satellite --pred-source laqn --pred-source hexgrid

Validate the config

urbanair model data generate-full-config

Download all data

urbanair model data download --training-data --prediction-data

Export data to directory

urbanair model data save-cache <data-dir-name>

Run model

urbanair model svgp fit <data-directory>

or for deep gp

urbanair model deep-gp fit <data-directory>

Model fitting with docker

Build a model fitting docker image with tensorflow installed:

docker build --build-arg git_hash=$(git show -s --format=%H) -t cleanairdocker.azurecr.io/model_fitting -f containers/dockerfiles/model_fitting.Dockerfile containers

Alternatively you can pull the docker image if you haven't made any changes:

docker pull cleanairdocker.azurecr.io/model_fitting

To fit and predict using the SVGP you can run:

docker run -it --rm cleanairdocker.azurecr.io/model_fitting:latest sh /app/scripts/svgp_static.sh

To fit and predict using the MRDGP run:

docker run -it --rm cleanairdocker.azurecr.io/model_fitting:latest sh /app/scripts/mrdgp_static.sh

If you are running on your local machine you will also need to add -e PGPASSWORD -e DB_SECRET_FILE -v $SECRET_DIR:/secrets after the run command and set the environment variables (see above in the README).

Singularity for HPC

Many scientific clusters will give you access to Singularity. This software means you can import and run Docker images without having Docker installed or being a superuser. Scientific clusters are often a pain to set up, so we strongly recommend using Singularity & Docker to avoid a painful experience.

First login to your HPC and ensure singularity is installed:

singularity --version

Now we will need to pull the Docker image from our Docker container registry on Azure. Since our docker images are private you will need to login to the container registry.

  1. Go to portal.azure.com.
  2. Search for the CleanAirDocker container registry.
  3. Go to Access keys.
  4. The username is CleanAirDocker. Copy the password.

singularity pull --docker-login docker://cleanairdocker.azurecr.io/mf:latest

Then build the singularity image to a .sif file. We recommend you store all of your singularity images in a directory called containers.

singularity build --docker-login containers/model_fitting.sif docker://cleanairdocker.azurecr.io/mf:latest

To test everything has built correctly, spawn a shell and run python:

singularity shell containers/model_fitting.sif
python

Then try importing tensorflow and cleanair:

import tensorflow as tf
tf.__version__
import cleanair
cleanair.__version__

Finally you can run the singularity image, passing any arguments you see fit:

singularity run containers/model_fitting.sif --secretfile $SECRETS

Infrastructure Deployment

💀 The following steps are needed to set up the Clean Air cloud infrastructure. Only infrastructure administrators should deploy.

Login to Travis CLI

Login to Travis with your github credentials, making sure you are in the Clean Air repository (Travis automatically detects your repository):

travis login --pro

Create an Azure service principal using the documentation for the Azure CLI or with Powershell, ensuring that you keep track of the NAME, ID and PASSWORD/SECRET for the service principal, as these will be needed later.

Setup Terraform with Python

Terraform uses a backend to keep track of the infrastructure state. We keep the backend in Azure storage so that everyone has a synchronised version of the state.

You can download the `tfstate` file with `az`, though you won't need it:

cd terraform
az storage blob download -c terraformbackend -f terraform.tfstate -n terraform.tfstate --account-name terraformstorage924roouq --auth-mode key

To enable this, we have to create an initial Terraform configuration by running (from the root directory):

python cleanair_setup/initialise_terraform.py -i $AWS_KEY_ID -k $AWS_KEY -n $SERVICE_PRINCIPAL_NAME -s $SERVICE_PRINCIPAL_ID -p $SERVICE_PRINCIPAL_PASSWORD

Where AWS_KEY_ID and AWS_KEY are the secure key information needed to access TfL's SCOOT data on Amazon Web Services.

AWS_KEY=$(az keyvault secret show --vault-name terraform-configuration --name scoot-aws-key -o tsv --query value)
AWS_KEY_ID=$(az keyvault secret show --vault-name terraform-configuration --name scoot-aws-key-id -o tsv --query value)

The SERVICE_PRINCIPAL_NAME, _ID and _PASSWORD are also available in the terraform-configuration keyvault.

SERVICE_PRINCIPAL_NAME=$(az keyvault secret show --vault-name terraform-configuration --name azure-service-principal-name -o tsv --query value)
SERVICE_PRINCIPAL_ID=$(az keyvault secret show --vault-name terraform-configuration --name azure-service-principal-id -o tsv --query value)
SERVICE_PRINCIPAL_PASSWORD=$(az keyvault secret show --vault-name terraform-configuration --name azure-service-principal-password -o tsv --query value)

This will only need to be run once (by anyone), but it's not a problem if you run it multiple times.
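
The five key vault lookups above follow one pattern, so they can also be scripted. The sketch below builds the same `az keyvault secret show` commands from a table of names taken from this section; `keyvault_command` and `fetch_secrets` are hypothetical helpers, and actually running `fetch_secrets()` requires the Azure CLI and access to the terraform-configuration key vault.

```python
import subprocess

VAULT = "terraform-configuration"
# Shell variable name -> secret name in the key vault
SECRET_NAMES = {
    "AWS_KEY": "scoot-aws-key",
    "AWS_KEY_ID": "scoot-aws-key-id",
    "SERVICE_PRINCIPAL_NAME": "azure-service-principal-name",
    "SERVICE_PRINCIPAL_ID": "azure-service-principal-id",
    "SERVICE_PRINCIPAL_PASSWORD": "azure-service-principal-password",
}


def keyvault_command(secret_name, vault=VAULT):
    """Build the argument list for `az keyvault secret show`."""
    return [
        "az", "keyvault", "secret", "show",
        "--vault-name", vault,
        "--name", secret_name,
        "-o", "tsv", "--query", "value",
    ]


def fetch_secrets(run=subprocess.run):
    """Fetch every secret; `run` can be swapped for a stub when testing."""
    values = {}
    for variable, secret_name in SECRET_NAMES.items():
        result = run(
            keyvault_command(secret_name), check=True, capture_output=True, text=True
        )
        values[variable] = result.stdout.strip()
    return values
```

The returned dictionary holds the values that are passed to `initialise_terraform.py` through the `-i`, `-k`, `-n`, `-s` and `-p` options.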

Building the Clean Air infrastructure with Terraform

To build the Terraform infrastructure go to the terraform directory

cd terraform

and run:

terraform init

If you want to, you can look at the backend_config.tf file, which should contain various details of your Azure subscription. NB. It is important that this file is in .gitignore. Do not push this file to the remote repository.

Then run:

terraform plan

which creates an execution plan. Check this matches your expectations. If you are happy then run:

terraform apply

to set up the Clean Air infrastructure on Azure using Terraform. You should be able to see this on the Azure portal.

Creating A Record for cleanair API (DO THIS BEFORE RUNNING AZURE PIPELINES)

Terraform created a DNS Zone in the kubernetes cluster resource group (RG_CLEANAIR_KUBERNETES_CLUSTER). Navigate to the DNS Zone on the Azure portal and copy the four nameservers in the “NS” record. Send the nameservers to Turing IT Services. Ask them to add the subdomain’s DNS record as an NS record for urbanair in the turing.ac.uk DNS zone record.

  1. When viewing the DNS zone on the Azure Portal, click + Record set
  2. In the Name field, enter urbanair.
  3. Set Alias record set to “Yes” and this will bring up some new options.
  4. We can now set up Azure pipelines. Once the cleanair api has been deployed on kubernetes you can update the alias record to point to the ip address of the cleanair-api on the cluster.

Initialising the input databases

Terraform will now have created a number of databases. We need to add the datasets to the database. This is done using Docker images from the Azure container registry. You will need the username, password and server name for the Azure container registry. All of these will be stored as secrets in the RG_CLEANAIR_INFRASTRUCTURE > cleanair-secrets Azure KeyVault.

Setting up Azure pipelines

These Docker images are built by an Azure pipeline whenever commits are made to the master branch of the GitHub repository. Ensure that you have configured Azure pipelines to use this GitHub repository. You will need to add Service Connections to GitHub and to Azure (the Azure one should be called cleanair-scn). Currently a pipeline is set up here.

To run the next steps we need to ensure that this pipeline runs a build in order to add the Docker images to the Azure container registry created by Terraform. Either push to the GitHub repository, or rerun the last build by going to the Azure pipeline page and clicking Run pipeline on the right-hand context menu. This will build all of the Docker images and add them to the registry.

Now go to Azure and update the A-record to point to the ip address of the cleanair-api on the cluster.

Adding static datasets

To add static datasets follow the Static data insert instructions but use the production database credentials

Adding live datasets

The live datasets (like LAQN or AQE) are populated using regular jobs that create an Azure container instance and add the most recent data to the database. These are run automatically through Kubernetes and the Azure pipeline above is used to keep track of which version of the code to use.

Kubernetes deployment with GPU support

The Azure pipeline will deploy the cleanair helm chart to the Azure Kubernetes cluster we deployed with Terraform. If you deployed GPU-enabled machines on Azure (the current default in the Terraform script) then you need to install the NVIDIA device plugin daemonset. The manifest for this is adapted from the Azure docs. However, as our GPU machines have taints applied, we have to add tolerations to the manifest, otherwise the nodes will block the daemonset. To install the custom manifest run:

kubectl apply -f kubernetes/gpu_resources/nvidia-device-plugin-ds.yaml

Removing Terraform infrastructure

To destroy all the resources created by Terraform run:

terraform destroy

You can check everything was removed on the Azure portal. Then login to TravisCI and delete the Azure Container repo environment variables.