4. Training a Model
The following instructions explain how to train and evaluate a region-specific, double-headed LSTM crop/non-crop model. Once trained, the model can be used to generate a cropland mask for a region of interest (ROI).
Prerequisite: Adding labeled data for region of interest
An ROI bounding box is necessary to show the model which region to focus on during training. Specifically, the bounding box causes training data points within the ROI (local points) to be weighted more heavily than data points outside the ROI.
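To make the weighting idea concrete, here is a minimal sketch. It is not the repository's implementation: the `BBox` class, its field names, the `sample_weights` helper, and the weight values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical bounding box in degrees; field names are assumptions."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # A point is "local" when it falls inside the box (edges included)
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def sample_weights(points, bbox, local_weight=1.0, global_weight=0.5):
    """Return one weight per (lat, lon) point; local points get the larger weight.

    The default weight values are placeholders, not values used by the project.
    """
    return [local_weight if bbox.contains(lat, lon) else global_weight
            for lat, lon in points]


# Made-up ROI and two made-up points: one inside the box, one outside
roi = BBox(min_lat=-1.0, max_lat=1.0, min_lon=30.0, max_lon=32.0)
weights = sample_weights([(0.0, 31.0), (10.0, 31.0)], roi)
```

In this sketch the first point lies inside the ROI and receives the local weight, while the second lies outside and receives the smaller one.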
1a. Draw the bounding box in Google Earth Engine: script
1b. Paste the generated BBox string into src/bboxes (example) and commit the change to GitHub.
Navigate to the GitHub train action and click the Run workflow button.
Specify the required arguments:
- Model name: name of the model following the convention
- Evaluation dataset(s): name of the dataset(s) (in datasets.py) which contain(s) evaluation points
- Bounding box name: name of the BBox specified in step 1
Common optional model args include:
- --skip_era5: Trains the model without ERA5 precipitation and temperature data
- --start_month November: Trains the model using a November-November crop growing season (if not specified, February-February is used)
Other arguments can be found in train.py
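The two optional flags above could be parsed along the following lines. This is a hypothetical sketch, not a copy of train.py: the real definitions may differ in types, defaults, and help text. The February default follows the description above, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of optional training args")
    # Boolean switch: when present, ERA5 precipitation/temperature data is not used
    parser.add_argument("--skip_era5", action="store_true")
    # First month of the 12-month growing season; February is the documented default
    parser.add_argument("--start_month", type=str, default="February")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["--skip_era5", "--start_month", "November"])
```

Calling `build_parser().parse_args([])` instead yields the defaults, with ERA5 data in use and a February start month.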
Once the arguments are specified, click Run workflow to begin model training.
Alternative: Training using GitHub Command Line
- To train a model from the master branch and create a PR with the new model:
gh workflow run train.yml -f MODEL_NAME=...
- To train a model from an existing branch and push the new model to that branch:
gh workflow run train.yml --ref branch-name -f MODEL_NAME=...
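For scripted use, the two commands above can be assembled programmatically. The sketch below only builds the argument list and does not invoke the GitHub CLI. `build_train_command`, `my_model`, and `branch-name` are hypothetical placeholders, and only the `MODEL_NAME` input shown above is included.

```python
from typing import List, Optional


def build_train_command(model_name: str, ref: Optional[str] = None) -> List[str]:
    """Build a `gh workflow run` argument list for train.yml."""
    cmd = ["gh", "workflow", "run", "train.yml"]
    if ref is not None:
        # --ref selects the branch (or tag) containing the workflow file version to run
        cmd += ["--ref", ref]
    # -f passes a workflow input as key=value
    cmd += ["-f", f"MODEL_NAME={model_name}"]
    return cmd


# Placeholder values; substitute a real model name and branch
print(" ".join(build_train_command("my_model", ref="branch-name")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the GitHub CLI is installed and authenticated.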
Logs of the Train GitHub Action can be viewed by clicking the link to the run and then the square labeled train; this shows the current status of the run.
Once at the Train model step, live model training and validation curves can be viewed on Weights and Biases: https://wandb.ai/nasa-harvest/crop-mask
Once the training run is complete, a pull request will be opened automatically with model metrics and wandb logs in data/models.json.
Hannah Kerner, Gabriel Tseng, Inbal Becker-Reshef, Catherine Nakalembe, Brian Barker, Blake Munshell, Madhava Paliyam, and Mehdi Hosseini. 2020. Rapid Response Crop Maps in Data Sparse Regions. KDD ’20: ACM SIGKDD Conference on Knowledge Discovery and Data Mining Workshops, August 22–27, 2020, San Diego, CA. Link