ML Model Design

This document describes the design of a machine learning training and evaluation pipeline for identifying snow cover in satellite imagery, using ground-truth observations as labels.

Goal: produce a model that can identify snow in 3 m PlanetScope satellite imagery with sufficient accuracy for operational use.

Previous Work

Our previous approach used a Gaussian Process to build a pixel-based snow classifier. The results were mixed. We hope to improve on it by building on recent developments in image segmentation networks.

Current Design

We're treating this task as an "image segmentation" problem, which relies on three elements:

Images of known, consistent sizes
Binary masks corresponding to the desired segmentation for each image, with matching sizes
A network architecture (with or without pre-trained weights?) for image segmentation

Input Data

Data	Data Type	Description
PlanetScope Image Tiles	4-band rasters	Using the `../preprocess` tools, we've created a set of 4-band TIFF image tiles, stored in OSM/XYZ tile format, that cover the imagery relevant to the known ground truth.
Ground Truth Tiles	Binary rasters	Again using the `../preprocess` tools, we've created a set of 1-band binary TIFF image tiles (in OSM/XYZ tile format) that serve as our segmentation masks. Note: these tiles cover an extent at least as large as the image tiles. The ground-truth data can come from a variety of sources, including 1) the Airborne Snow Observatory (ASO) and 2) SnowEx.
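
For readers unfamiliar with the OSM/XYZ ("slippy map") layout, the sketch below shows the standard Web Mercator formula that maps a longitude/latitude to a tile index, and how that index turns into a `z/x/y` path. The function names and the `.tif` extension are illustrative assumptions, not names taken from the `../preprocess` tools.

```python
import math

def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon == 180 and latitudes beyond the Mercator limit stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_relpath(zoom, x, y, ext="tif"):
    """Relative path of a tile in a z/x/y directory layout (extension is an assumption)."""
    return f"{zoom}/{x}/{y}.{ext}"
```

An image tile and its mask tile share the same `z/x/y` index, which is what allows the two to be matched up later.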

Model Architectures

TBD, but

Training

We need to figure out how to structure this training task, because the data are laid out in an unusual way. Though we have the input data described above, on disk it is stored as a separate slippy-map tile directory for each Planet image. The options for incorporating these data into a training pipeline are as follows:

Concatenate all tiles from all images into a /image-tiles folder.
- Pros: training code / data loader is easy and straightforward.
- Cons: lose geospatial information in the form of XYZ directory structure, which makes serving results harder? Need to assign unique IDs to each image.
Write a data loader which does this concatenation given a list of folders which themselves are slippy map directories.
- Pros: means we can keep the data stored in the current way (which is produced in the preprocess step)
- Cons: the implementation would need to be precise and would likely be brittle, particularly in matching each image tile to the correct mask tile
Train with checkpoints: train model incrementally using only a single image (e.g. a single XYZ directory) for each training step and using model checkpoints to continue training.
- Pros: allows for maintenance of the current directory structure and for easy train-test split (if we split on images!)
- Cons: training on one image at a time likely introduces some bias (for example, toward the images seen most recently). Difficult train-test split (if we split on tiles!)
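
The tile/mask matching that the second option depends on, and the image-level split mentioned in the third, can be sketched as below. This is a minimal illustration, not the project's data loader: it assumes that all mask tiles live in a single slippy-map directory (`mask_root`), that tiles use a `.tif` extension, and that each image directory's name can serve as its ID. The function names are hypothetical.

```python
from pathlib import Path
import random

def index_tile_pairs(image_dirs, mask_root, ext="tif"):
    """Pair every image tile with the mask tile at the same z/x/y.

    Returns a list of (image_id, image_tile_path, mask_tile_path) tuples.
    Image tiles without a matching mask tile are skipped.
    """
    pairs = []
    for image_dir in map(Path, image_dirs):
        for tile in sorted(image_dir.glob(f"*/*/*.{ext}")):
            zxy = tile.relative_to(image_dir)  # z/x/y.<ext>
            mask = Path(mask_root) / zxy
            if mask.is_file():
                pairs.append((image_dir.name, tile, mask))
    return pairs

def split_by_image(pairs, test_fraction=0.2, seed=0):
    """Hold out whole images so no image contributes to both train and test."""
    image_ids = sorted({image_id for image_id, _, _ in pairs})
    random.Random(seed).shuffle(image_ids)
    n_test = max(1, round(len(image_ids) * test_fraction))
    test_ids = set(image_ids[:n_test])
    train = [p for p in pairs if p[0] not in test_ids]
    test = [p for p in pairs if p[0] in test_ids]
    return train, test
```

Splitting on image IDs rather than on individual tiles keeps neighbouring tiles from the same scene out of both sets at once. Note that with a single image this sketch places everything in the test set.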

I've decided that the best first pass is to concatenate the tiles from all images into a single /image-tiles folder, which has several implementation advantages. The loss of geospatial information isn't terribly important, especially since the .tif files themselves are GeoTIFFs.
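
A minimal sketch of that concatenation step, using only the standard library. The naming scheme `<image_id>_<z>_<x>_<y>.tif` is an assumption made here so that tiles from different images do not collide and the XYZ index stays recoverable from the file name; `flatten_tiles` and `parse_flat_name` are hypothetical helpers, not part of the existing tooling.

```python
from pathlib import Path
import shutil

def flatten_tiles(image_dirs, out_dir, ext="tif"):
    """Copy tiles from several z/x/y directories into one flat folder.

    Each output file is named <image_id>_<z>_<x>_<y>.<ext>.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for image_dir in map(Path, image_dirs):
        image_id = image_dir.name  # assumption: directory names are unique per image
        for tile in sorted(image_dir.glob(f"*/*/*.{ext}")):
            z, x, y = tile.relative_to(image_dir).with_suffix("").parts
            dest = out_dir / f"{image_id}_{z}_{x}_{y}.{ext}"
            shutil.copy2(tile, dest)
            copied.append(dest)
    return copied

def parse_flat_name(path):
    """Recover (image_id, z, x, y) from a name written by flatten_tiles."""
    image_id, z, x, y = Path(path).stem.rsplit("_", 3)
    return image_id, int(z), int(x), int(y)
```

Because the name is split from the right, image IDs that themselves contain underscores are still parsed correctly.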

Operational Use Cases

Ecology: studying changes in alpine phenology (Janneke HilleRisLambers)
Ecology: abiotic environmental variable for species distribution modeling
Avalanche Forecasting: early season snow extent for basal weak layer prediction (Deems)
