ICESat-2 ML tutorial - photon classification on ATL07 sea ice data #17
Commits on Aug 6, 2024
Initial jupytext notebook with outline and ATL07 to geopandas script
First draft with a rough layout of sections for the ICESat-2 ML photon classification tutorial. Included learning objectives, and some initial code to read ATL07 sea ice data from HDF5 to a geopandas.GeoDataFrame. Deciding to do a reimplementation of the Koo et al., 2023 paper with code at https://github.com/YoungHyunKoo/IS2_ML.
Commit c8b1690
Commits on Aug 7, 2024
Save ATL07 photon data to GeoParquet file with ZSTD compression
Show how to save geopandas.GeoDataFrame to a GeoParquet file, and load it back again. Also put down some notes about compression codecs.
Commit 0c892ed
Writeup sub-section on moving data from CPU to GPU
Some quick code to convert the geopandas.GeoDataFrame to a torch.Tensor and put it in a torch DataLoader. Showing how to move data from CPU to GPU using the `.to` method. Might modify this section's title/subtitle later depending on how the code goes.
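The CPU-to-GPU step might look roughly like the sketch below. A plain pandas DataFrame with made-up column values stands in for the numeric columns of the GeoDataFrame, and the batch size is arbitrary. The code falls back to the CPU when no CUDA device is available.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the numeric columns of the ATL07 GeoDataFrame.
df = pd.DataFrame(
    {
        "photon_rate": [1.0, 2.0, 3.0, 4.0],
        "hist_mean_median_h_diff": [0.1, 0.0, -0.1, 0.2],
    }
)

# Copy the numeric columns into a float32 tensor of shape (rows, columns).
tensor = torch.tensor(df.to_numpy(dtype="float32"))
dataloader = DataLoader(TensorDataset(tensor), batch_size=2, shuffle=False)

# Use the GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batches = []
for (batch,) in dataloader:
    # `.to` returns a copy of the mini-batch on the target device.
    batches.append(batch.to(device))
```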
Commit 18336b9
Add ICESat-2 photon classification tutorial to index table
One more entry in the tutorial index page. Putting down Machine Learning and Pytorch as the topics, and ATL07 as the dataset used for now.
Commit 5db2d77
Commit 561529c
Commit 6836bcd
Commits on Aug 8, 2024
Refactor to get ATL07 using earthaccess instead of icepyx
Less boilerplate s3fs code to manage, and not using icepyx means this should run on the Pangeo pytorch-notebook docker image too!
Commit d616d77
Commits on Aug 10, 2024
Search for Sentinel-2 imagery captured at same time as ATL07 track
Looking for a coincident alignment of two satellites (ICESat-2 and Sentinel-2) capturing data at the same time! Managed to find a coincident capture on 2019-02-24, though haven't checked if the spatial extent matches yet. Can improve the search algorithm later by expanding the search time window (+/- X minutes) and using a more exact bounding box search in the STAC API query.
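The "+/- X minutes" window mentioned above could be expressed as in this sketch, which uses only the standard library. The function names, the 10-minute default, and the two timestamps are illustrative assumptions rather than values from the notebook. The second helper formats the window as the `start/end` interval string that STAC API `datetime` filters accept.

```python
from datetime import datetime, timedelta, timezone


def is_coincident(t_a, t_b, window_minutes=10.0):
    """Return True if two timezone-aware datetimes differ by at most the window."""
    return abs(t_a - t_b) <= timedelta(minutes=window_minutes)


def stac_datetime_interval(t, window_minutes=10.0):
    """Format a 'start/end' RFC 3339 interval centred on t, expressed in UTC."""
    delta = timedelta(minutes=window_minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = (t - delta).astimezone(timezone.utc).strftime(fmt)
    end = (t + delta).astimezone(timezone.utc).strftime(fmt)
    return f"{start}/{end}"


# Made-up acquisition times, for illustration only.
t_icesat2 = datetime(2019, 2, 24, 12, 0, 0, tzinfo=timezone.utc)
t_sentinel2 = datetime(2019, 2, 24, 12, 7, 30, tzinfo=timezone.utc)
```

Widening `window_minutes` trades more candidate matches against more sea ice drift between the two acquisitions.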
Commit 1858273
Commits on Aug 12, 2024
Second exact spatial intersection search using ATL07 line track
Temporal match wasn't enough, so adding the spatial match as well. Metadata on ICESat-2 was lacking unfortunately, so need to open the ATL07 HDF5 file to get the xy coordinates and build a linestring from it to pass to the STAC query. Managed to find a lucky coincident match on 2019-10-31, and have verified that the crossover is valid.
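One way to turn track coordinates into a geometry for a STAC `intersects` query is to build a GeoJSON LineString, sketched below with the standard library only. The helper name, the subsampling step, and the synthetic coordinates are assumptions, and the sketch presumes the coordinates are already longitude/latitude pairs, which is what GeoJSON expects.

```python
def track_to_linestring(lons, lats, step=100):
    """Build a GeoJSON LineString from a track, keeping every `step`-th point."""
    if len(lons) != len(lats):
        raise ValueError("lons and lats must have the same length")
    if len(lons) < 2:
        raise ValueError("a LineString needs at least two points")
    indices = list(range(0, len(lons), step))
    # Always keep the final point so the line spans the whole track.
    if indices[-1] != len(lons) - 1:
        indices.append(len(lons) - 1)
    coords = [[float(lons[i]), float(lats[i])] for i in indices]
    return {"type": "LineString", "coordinates": coords}


# Synthetic track, for illustration only.
lons = [-50.0 + 0.001 * i for i in range(250)]
lats = [70.0 + 0.002 * i for i in range(250)]
linestring = track_to_linestring(lons, lats, step=100)
```

Subsampling keeps the query payload small; a dense ATL07 track can contain far more points than a spatial search needs.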
Commit f445e25
Add more columns to GeoDataFrame and filter out cloudy points
Add the `x_atc`, `layer_flag` and `height_segment_ssh_flag` data variables to the GeoDataFrame which will be useful for plotting/filtering later. Using `height_segment_ssh_flag` to remove points that might be affected by clouds.
Commit fce139d
Plot ATL07 tracks on top of Sentinel-2 image
Get the Sentinel-2 RGB image, reproject the ATL07 points and subset to the image's bounding box, then plot them both using PyGMT! The plot colors sea ice points as blue, and sea surface (water) points as orange.
Commit 770bb83
Label surface type of ATL07 points using Sentinel-2 Red band pixel value
Use PyGMT's grdtrack to get the Sentinel-2 Red band's pixel values sampled at every ATL07 xy point, and then apply a simple threshold to classify into water (dark), thin ice (gray) and thick ice (white).
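The thresholding step can be illustrated with a small pure-Python sketch. The two threshold values and the label strings are hypothetical, since the commit message does not give the values used in the notebook, and the sampled Red band values are assumed to have been obtained already (for example via `grdtrack`).

```python
from bisect import bisect_right

# Ordered from darkest to brightest surface.
LABELS = ("water", "thin_ice", "thick_ice")


def classify_surface(red_values, thresholds=(0.2, 0.6)):
    """Label each Red band value by the brightness bin it falls in.

    Values below thresholds[0] are water, values from thresholds[0] up to
    (but excluding) thresholds[1] are thin ice, and the rest are thick ice.
    The default thresholds are hypothetical placeholders.
    """
    return [LABELS[bisect_right(thresholds, value)] for value in red_values]


# Made-up reflectance samples, for illustration only.
surface_types = classify_surface([0.05, 0.2, 0.45, 0.6, 0.9])
```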
Commit 1a11529
Commits on Aug 13, 2024
Rename Part 2 to DataLoader and Model architecture
Reorganizing some content so Part 2 is focused on preparing the DataLoader and neural network model architecture. Have now moved the dataloader for-loop to Part 3 'Training' and commented out the parts that move data to CUDA. Also calculated the "hist_mean_median_h_diff" column, which is the actual variable we want to use in training.
Commit 87cf493
Architect PhotonClassificationModel and writeup ML model choices
Writing up section about choosing a machine learning algorithm, including ML models with different levels of complexity from decision trees to neural networks and state-of-the-art models. Also implemented a simple multi-layer perceptron model based on the description in Koo et al., 2023's paper (but without the tanh activation).
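A simple multi-layer perceptron of the kind described might be sketched as follows. This is not the notebook's `PhotonClassificationModel`: the class name, layer sizes, number of input features, and the ReLU activation are all assumptions (the commit only says tanh was left out), and the three output classes mirror the water / thin ice / thick ice labels.

```python
import torch
from torch import nn


class PhotonClassifierSketch(nn.Module):
    """Hypothetical MLP mapping per-segment features to surface class logits."""

    def __init__(self, n_features=6, n_hidden=16, n_classes=3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),  # assumed activation, swapped in for illustration
            nn.Linear(n_hidden, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),  # raw logits, one per class
        )

    def forward(self, x):
        return self.layers(x)


model = PhotonClassifierSketch()
# Forward pass on a random mini-batch of 4 samples with 6 features each.
logits = model(torch.randn(4, 6))
```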
Commit 1a42bcc
Construct main training loop for ML model
Finally got to the actual neural network model training! Now properly splitting the mini-batch data into input and target tensors, passing the input into the model to get the prediction, and minimizing the loss between prediction and target. Needed to do some ugly dtype casting to prevent `RuntimeError`s. Trying to keep this fairly basic without train/validation splits, and only ran this for 3 epochs. Have shifted some markdown blocks up where they belong too.
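The loop described above (split each mini-batch into input and target, predict, minimize the loss, three epochs, no validation split) could look roughly like this sketch. The synthetic data, the stand-in model, the cross-entropy loss, and the Adam optimizer are assumptions made for illustration. The `.to(torch.long)` cast shows the kind of dtype conversion that avoids a `RuntimeError`, since `CrossEntropyLoss` expects integer class indices.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic table: 5 feature columns followed by 1 label column, all float32.
features = torch.randn(64, 5)
labels = torch.randint(0, 3, (64, 1)).to(torch.float32)
data = torch.cat([features, labels], dim=1)
dataloader = DataLoader(TensorDataset(data), batch_size=16, shuffle=True)

# Stand-in model and training components (assumed, not from the notebook).
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for (batch,) in dataloader:
        x = batch[:, :5]  # input features
        y = batch[:, 5].to(torch.long)  # targets cast to int64 class indices
        prediction = model(x)
        loss = loss_fn(prediction, y)
        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()  # backpropagate
        optimizer.step()  # update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(dataloader))
```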
Commit d599a37
Commits on Aug 14, 2024
Commit bdccda1
Commits on Aug 16, 2024
Add instructions to install pytorch in first code cell
Default CryoCloud docker image won't have Pytorch, so will need to install it at the first step.
Commit 395982e
Save geoparquet schema version 1.1.0 and reword note on zstd compression
Default CryoCloud image now has Geopandas 1.x, so can save to a non-beta version of GeoParquet schema now.
Commit 2a0ae41
Commit c3cace6
Commits on Aug 18, 2024
Add overview flowchart to top of notebook and minor edits
Adding an overview diagram of the ATL07 + Sentinel-2 processing pipeline (illustrated using Excalidraw) to the start of the notebook. Made some minor edits to some of the markdown cells to include more references and explanatory text.
Commit db55b60
Pre-render Jupyter notebook and move files to machine-learning folder
Pushing the photon_classifier Jupyter Notebook with pre-rendered cells that was run on CryoCloud. Putting the files under a 'machine-learning' folder, to be consistent with the other tutorials using subfolders.
Commit aac4747