ICESat-2 ML tutorial - photon classification on ATL07 sea ice data (#17)
* Initial jupytext notebook with outline and ATL07 to geopandas script

First draft with a rough layout of sections for the ICESat-2 ML photon classification tutorial. Included learning objectives, and some initial code to read ATL07 sea ice data from HDF5 to a geopandas.GeoDataFrame. Deciding to do a reimplementation of the Koo et al., 2023 paper with code at https://github.com/YoungHyunKoo/IS2_ML.

* Save ATL07 photon data to GeoParquet file with ZSTD compression

Show how to save geopandas.GeoDataFrame to a GeoParquet file, and load it back again. Also put down some notes about compression codecs.

* Write up sub-section on moving data from CPU to GPU

Some quick code to convert the geopandas.GeoDataFrame to a torch.Tensor and put it in a torch DataLoader. Showing how to move data from CPU to GPU using the `.to` method. Might modify this section's title/subtitle later depending on how the code goes.
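
A minimal sketch of the CPU-to-GPU step, assuming PyTorch and NumPy are installed. The random array is a stand-in for the numeric columns of the GeoDataFrame, and the names `features`, `device` and the batch size are illustrative rather than taken from the notebook.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for numeric columns pulled out of the GeoDataFrame (hypothetical data).
features = np.random.default_rng(seed=42).random(size=(16, 3)).astype(np.float32)

# Wrap the array in a CPU tensor and serve it in mini-batches of 8 rows.
tensor = torch.from_numpy(features)
dataloader = DataLoader(dataset=TensorDataset(tensor), batch_size=8)

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batches = []
for (batch,) in dataloader:
    # `.to` returns a tensor on the target device (unchanged if already there).
    batch = batch.to(device)
    batches.append(batch)
```

On a CPU-only machine the loop still runs, because `device` falls back to `"cpu"`.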

* Add ICESat-2 photon classification tutorial to index table

One more entry in the tutorial index page. Putting down Machine Learning and PyTorch as the topics, and ATL07 as the dataset used for now.

* Add pytorch (cpu build) to conda environment

* Refactor to get ATL07 using earthaccess instead of icepyx

Less boilerplate s3fs code to manage, and not using icepyx means this should run on the Pangeo pytorch-notebook docker image too!

* Search for Sentinel-2 imagery captured at same time as ATL07 track

Looking for a coincident alignment of two satellites (ICESat-2 and Sentinel-2) capturing data at the same time! Managed to find a coincident capture on 2019-02-24, though haven't checked if the spatial extent matches yet. Can improve the search algorithm later by expanding the search time window (+/- X minutes) and using a more exact bounding box search in the STAC API query.
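
The "+/- X minutes" idea can be sketched with the standard library alone. This is not the STAC query itself, only the time-window filter; the function name `within_time_window` and all acquisition times below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def within_time_window(track_time, candidate_times, minutes):
    """Return the candidate times that fall within +/- `minutes` of `track_time`."""
    window = timedelta(minutes=minutes)
    return [t for t in candidate_times if abs(t - track_time) <= window]


# Hypothetical acquisition times, for illustration only.
atl07_time = datetime(2019, 2, 24, 12, 0, tzinfo=timezone.utc)
sentinel2_times = [
    datetime(2019, 2, 24, 11, 50, tzinfo=timezone.utc),  # 10 minutes before
    datetime(2019, 2, 24, 14, 30, tzinfo=timezone.utc),  # 150 minutes after
]

matches = within_time_window(atl07_time, sentinel2_times, minutes=30)
```

Widening `minutes` trades a stricter coincidence for a better chance of finding any match at all.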

* Second exact spatial intersection search using ATL07 line track

Temporal match wasn't enough, so adding the spatial match as well. Metadata on ICESat-2 was lacking unfortunately, so need to open the ATL07 HDF5 file to get the xy coordinates and build a linestring from it to pass to the STAC query. Managed to find a lucky coincident match on 2019-10-31, and have verified that the crossover is valid.
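
A rough sketch of building the line geometry, assuming the track coordinates are longitude/latitude pairs and that the STAC API accepts a GeoJSON geometry through its `intersects` search parameter. The helper `track_to_linestring`, its `step` argument and the coordinates are hypothetical.

```python
def track_to_linestring(lons, lats, step=1):
    """Build a GeoJSON LineString from track coordinates, keeping every `step`-th point."""
    coords = [[float(x), float(y)] for x, y in zip(lons, lats)][::step]
    if len(coords) < 2:
        raise ValueError("A LineString needs at least two positions")
    return {"type": "LineString", "coordinates": coords}


# Hypothetical track coordinates, for illustration only.
lons = [-50.0, -50.1, -50.2, -50.3, -50.4]
lats = [-70.0, -70.5, -71.0, -71.5, -72.0]

# Thin the track so the search request body stays small.
geometry = track_to_linestring(lons, lats, step=2)
```

The resulting dictionary can then be handed to a STAC client as the search geometry.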

* Add more columns to GeoDataFrame and filter out cloudy points

Add the `x_atc`, `layer_flag` and `height_segment_ssh_flag` data variables to the GeoDataFrame which will be useful for plotting/filtering later. Using `height_segment_ssh_flag` to remove points that might be affected by clouds.

* Plot ATL07 tracks on top of Sentinel-2 image

Get the Sentinel-2 RGB image, reproject the ATL07 points and subset to the image's bounding box, then plot them both using PyGMT! The plot colors sea ice points as blue, and sea surface (water) points as orange.

* Label surface type of ATL07 points using Sentinel-2 Red band pixel value

Use PyGMT's grdtrack to get the Sentinel-2 Red band's pixel values sampled at every ATL07 xy point, and then apply a simple threshold to classify into water (dark), thin ice (gray) and thick ice (white).
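
The thresholding step might look like the sketch below. It only covers the classification, not the `grdtrack` sampling, and the threshold values `water_max` and `thin_ice_max` are placeholders chosen for illustration, not the values used in the notebook.

```python
def classify_surface(red_value, water_max=0.1, thin_ice_max=0.5):
    """Map a Red band pixel value to a surface type using two thresholds.

    The default thresholds are hypothetical and assume values scaled to 0-1.
    """
    if red_value <= water_max:
        return "water"  # dark pixels
    if red_value <= thin_ice_max:
        return "thin_ice"  # gray pixels
    return "thick_ice"  # bright (white) pixels


# Hypothetical pixel values sampled at three ATL07 points.
labels = [classify_surface(value) for value in [0.05, 0.3, 0.9]]
```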

* Rename Part 2 to DataLoader and Model architecture

Reorganizing some content so Part 2 is focused on preparing the DataLoader and neural network model architecture. Have now moved the dataloader for-loop to Part 3 'Training' and commented out the move-to-CUDA parts. Also calculated the "hist_mean_median_h_diff" column, which is the actual variable we want to use in training.

* Architect PhotonClassificationModel and write up ML model choices

Writing up section about choosing a machine learning algorithm, including ML models with different levels of complexity from decision trees to neural networks and state-of-the-art models. Also implemented a simple multi-layer perceptron model based on the description in Koo et al., 2023's paper (but without the tanh activation).

* Construct main training loop for ML model

Finally got to the actual neural network model training! Now properly splitting the mini-batch data into input and target tensors, passing the input into the model to get the prediction, and minimizing the loss between prediction and target. Needed to do some ugly dtype casting to prevent `RuntimeError`s. Trying to keep this fairly basic without train/validation splits, and only ran this for 3 epochs. Have shifted some markdown blocks up where they belong too.
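
A generic sketch of such a loop, assuming PyTorch is installed. The synthetic data, the small `torch.nn.Sequential` model, the cross-entropy loss and the Adam optimizer are all assumptions for illustration and are not the notebook's PhotonClassificationModel or its settings. The two `.to(...)` casts show the kind of dtype conversion that avoids a `RuntimeError`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples, 4 input features, 3 surface classes.
inputs = torch.rand(64, 4, dtype=torch.float64)  # float64, as often comes from pandas
targets = torch.randint(low=0, high=3, size=(64,), dtype=torch.int32)
dataloader = DataLoader(dataset=TensorDataset(inputs, targets), batch_size=16)

# Hypothetical small multi-layer perceptron, loss and optimizer.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3)
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for x, y in dataloader:
        x = x.to(torch.float32)  # match the float32 weights of the Linear layers
        y = y.to(torch.int64)  # CrossEntropyLoss expects int64 class indices
        optimizer.zero_grad()
        prediction = model(x)
        loss = loss_fn(prediction, y)
        loss.backward()  # backpropagate gradients
        optimizer.step()  # update model weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(dataloader))
```

There is no train/validation split here, so the per-epoch loss only indicates that the loop runs, not that the model generalizes.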

* Add instructions to install pytorch in first code cell

Default CryoCloud docker image won't have PyTorch, so will need to install it at the first step.

* Save geoparquet schema version 1.1.0 and reword note on zstd compression

Default CryoCloud image now has Geopandas 1.x, so can save to a non-beta version of GeoParquet schema now.

* Add overview flowchart to top of notebook and minor edits

Adding an overview diagram of the ATL07 + Sentinel-2 processing pipeline (illustrated using Excalidraw) to the start of the notebook. Made some minor edits to some of the markdown cells to include more references and explanatory text.

* Pre-render Jupyter notebook and move files to machine-learning folder

Pushing the photon_classifier Jupyter Notebook with pre-rendered cells that was run on CryoCloud. Putting the files under a 'machine-learning' folder, to be consistent with the other tutorials using subfolders.
weiji14 authored Aug 18, 2024
1 parent 3958f02 commit 2aa88b8
Showing 10 changed files with 15,628 additions and 438 deletions.
1 change: 1 addition & 0 deletions book/_config.yml
@@ -39,6 +39,7 @@ execute:
- "**/geospatial-advanced.ipynb"
- "cloud-computing/04-cloud-optimized-icesat2.ipynb"
- "cloud-computing/atl08_parquet_files/atl08_parquet.ipynb"
- "machine-learning/photon_classifier.ipynb"
allow_errors: false
# Per-cell notebook execution limit (seconds)
timeout: 300
2 changes: 1 addition & 1 deletion book/_toc.yml
@@ -38,6 +38,7 @@ parts:
- file: tutorials/cloud-computing/atl08_parquet_files/atl08_parquet
options:
- titlesonly: true
- file: tutorials/machine-learning/photon_classifier.ipynb
- caption: Projects
chapters:
- file: projects/index
@@ -47,4 +48,3 @@ parts:
- file: reference/bibliography
- file: reference/IS2-resources
- file: reference/questions

1 change: 1 addition & 0 deletions book/tutorials/index.md
@@ -10,3 +10,4 @@ Below you'll find a table keeping track of all tutorials presented at this event
| [ICESat-2 Mission](./mission-overview/icesat-2-mission-overview.ipynb) | ICESat-2 Mission and Products | n/a | Not recorded |
| [Cloud Computing](./cloud-computing/00-goals-and-outline.ipynb) | Cloud Computing Tutorial | n/a | Not recorded |
| [Notebooks to Packages](./nb-to-package/index.md) | All about Python classes to packages | n/a | Not recorded |
| [ICESat-2 photon classification](./machine-learning/photon_classifier.ipynb) | Machine Learning, PyTorch | ATL07 | Not recorded |
14,286 changes: 14,286 additions & 0 deletions book/tutorials/machine-learning/photon_classifier.ipynb

Large diffs are not rendered by default.
