ICESat-2 ML tutorial - photon classification on ATL07 sea ice data #17

Merged: 21 commits, Aug 18, 2024

Commits on Aug 6, 2024

  1. Initial jupytext notebook with outline and ATL07 to geopandas script

    First draft with a rough layout of sections for the ICESat-2 ML photon classification tutorial. Included learning objectives, and some initial code to read ATL07 sea ice data from HDF5 to a geopandas.GeoDataFrame. Deciding to do a reimplementation of the Koo et al., 2023 paper with code at https://github.com/YoungHyunKoo/IS2_ML.
    weiji14 committed Aug 6, 2024
    c8b1690

Commits on Aug 7, 2024

  1. Save ATL07 photon data to GeoParquet file with ZSTD compression

    Show how to save geopandas.GeoDataFrame to a GeoParquet file, and load it back again. Also put down some notes about compression codecs.
    weiji14 committed Aug 7, 2024
    0c892ed
  2. Write up sub-section on moving data from CPU to GPU

    Some quick code to convert the geopandas.GeoDataFrame to a torch.Tensor and put it in a torch DataLoader. Showing how to move data from CPU to GPU using the `.to` method. Might modify this section's title/subtitle later depending on how the code goes.
    weiji14 committed Aug 7, 2024
    18336b9
  3. Add ICESat-2 photon classification tutorial to index table

    One more entry in the tutorial index page. Putting down Machine Learning and PyTorch as the topics, and ATL07 as the dataset used for now.
    weiji14 committed Aug 7, 2024
    5db2d77
  4. 561529c
  5. 6836bcd
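
The CPU-to-GPU step described in the "moving data from CPU to GPU" commit above could look roughly like the sketch below. It is not the notebook's actual code: the random tensor stands in for numeric columns taken from the GeoDataFrame, and the batch size and feature count are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for numeric columns pulled out of the GeoDataFrame
# (random values, not real ATL07 data; 8 rows x 3 features is an assumption)
features = torch.rand(8, 3)

dataset = TensorDataset(features)
dataloader = DataLoader(dataset, batch_size=4, shuffle=False)

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for (batch,) in dataloader:
    # `.to` returns a copy of the tensor on the target device
    batch = batch.to(device)
    print(batch.shape, batch.device)
```

Falling back to the CPU keeps the sketch runnable on machines without a CUDA device.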

Commits on Aug 8, 2024

  1. Refactor to get ATL07 using earthaccess instead of icepyx

    Less boilerplate s3fs code to manage, and dropping icepyx means this should also run on the Pangeo pytorch-notebook Docker image!
    weiji14 committed Aug 8, 2024
    d616d77

Commits on Aug 10, 2024

  1. Search for Sentinel-2 imagery captured at same time as ATL07 track

    Looking for a coincident alignment of two satellites (ICESat-2 and Sentinel-2) capturing data at the same time! Managed to find a coincident capture on 2019-02-24, though haven't checked if the spatial extent matches yet. Can improve the search algorithm later by expanding the search time window (+/- X minutes) and using a more exact bounding box search in the STAC API query.
    weiji14 committed Aug 10, 2024
    1858273
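
The "+/- X minutes" time window mentioned in the commit above can be sketched in plain Python. This is only an illustration of the idea, not the STAC query used in the notebook: the function name, the scene records, and the acquisition times are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_coincident_scenes(track_time, scenes, window_minutes=30):
    """Return scenes acquired within +/- window_minutes of the track time."""
    window = timedelta(minutes=window_minutes)
    return [s for s in scenes if abs(s["datetime"] - track_time) <= window]


# Hypothetical acquisition times, for illustration only
track_time = datetime(2019, 2, 24, 12, 0, tzinfo=timezone.utc)
scenes = [
    {"id": "scene_a", "datetime": datetime(2019, 2, 24, 12, 10, tzinfo=timezone.utc)},
    {"id": "scene_b", "datetime": datetime(2019, 2, 24, 15, 0, tzinfo=timezone.utc)},
]

matches = find_coincident_scenes(track_time, scenes, window_minutes=30)
print([s["id"] for s in matches])  # ['scene_a']
```

Widening `window_minutes` trades temporal coincidence for more candidate scenes, which matters because sea ice drifts between acquisitions.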

Commits on Aug 12, 2024

  1. Second exact spatial intersection search using ATL07 line track

    Temporal match wasn't enough, so adding the spatial match as well. Unfortunately the ICESat-2 metadata was lacking, so the ATL07 HDF5 file needs to be opened to get the xy coordinates and build a linestring to pass to the STAC query. Managed to find a lucky coincident match on 2019-10-31, and have verified that the crossover is valid.
    weiji14 committed Aug 12, 2024
    f445e25
  2. Add more columns to GeoDataFrame and filter out cloudy points

    Add the `x_atc`, `layer_flag` and `height_segment_ssh_flag` data variables to the GeoDataFrame which will be useful for plotting/filtering later. Using `height_segment_ssh_flag` to remove points that might be affected by clouds.
    weiji14 committed Aug 12, 2024
    fce139d
  3. Plot ATL07 tracks on top of Sentinel-2 image

    Get the Sentinel-2 RGB image, reproject the ATL07 points and subset to the image's bounding box, then plot them both using PyGMT! The plot colors sea ice points as blue, and sea surface (water) points as orange.
    weiji14 committed Aug 12, 2024
    770bb83
  4. Label surface type of ATL07 points using Sentinel-2 Red band pixel value

    Use PyGMT's grdtrack to get the Sentinel-2 Red band's pixel values sampled at every ATL07 xy point, and then apply a simple threshold to classify into water (dark), thin ice (gray) and thick ice (white).
    weiji14 committed Aug 12, 2024
    1a11529
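
The thresholding step in the surface-type labelling commit above can be sketched as follows, assuming the Red band values have already been sampled at each ATL07 point. The threshold values, function name, and sample values are hypothetical and are not taken from the notebook.

```python
# Hypothetical thresholds on sampled Sentinel-2 Red band pixel values;
# the real cut-offs depend on the image and are not given in the commit message
WATER_MAX = 1000      # at or below this value: open water (dark)
THIN_ICE_MAX = 6000   # at or below this value: thin ice (gray); above: thick ice (white)


def label_surface_type(red_value):
    """Classify one sampled Red band value into a surface type."""
    if red_value <= WATER_MAX:
        return "water"
    if red_value <= THIN_ICE_MAX:
        return "thin_ice"
    return "thick_ice"


# Made-up sampled values, one per ATL07 point
sampled_red = [350, 4200, 9100]
labels = [label_surface_type(v) for v in sampled_red]
print(labels)  # ['water', 'thin_ice', 'thick_ice']
```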

Commits on Aug 13, 2024

  1. Rename Part 2 to DataLoader and Model architecture

    Reorganizing some content so Part 2 is focused on preparing the DataLoader and neural network model architecture. Have now moved the dataloader for-loop to Part 3 'Training' and commented out the parts that move data to CUDA. Also calculated the "hist_mean_median_h_diff" column, which is the actual variable we want to use in training.
    weiji14 committed Aug 13, 2024
    87cf493
  2. Architect PhotonClassificationModel and write up ML model choices

    Writing up a section about choosing a machine learning algorithm, covering ML models at different levels of complexity, from decision trees to neural networks and state-of-the-art models. Also implemented a simple multi-layer perceptron model based on the description in the Koo et al., 2023 paper (but without the tanh activation).
    weiji14 committed Aug 13, 2024
    1a42bcc
  3. Construct main training loop for ML model

    Finally got to the actual neural network model training! Now properly splitting the mini-batch data into input and target tensors, passing the input into the model to get the prediction, and minimizing the loss between prediction and target. Needed to do some ugly dtype casting to prevent `RuntimeError`s. Trying to keep this fairly basic without train/validation splits, and only ran this for 3 epochs. Have shifted some markdown blocks up where they belong too.
    weiji14 committed Aug 13, 2024
    d599a37
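
A minimal sketch of the kind of multi-layer perceptron and training loop described in the two commits above is given below. It is not the notebook's `PhotonClassificationModel`: the class name, layer sizes, ReLU activations, cross-entropy loss, Adam optimizer, and random data are all assumptions. Only the 3 epochs and the need for dtype casting come from the commit message.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)


class TinyPhotonClassifier(nn.Module):
    """Hypothetical MLP: 6 input features -> 3 surface-type classes."""

    def __init__(self, n_features=6, n_hidden=32, n_classes=3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),  # activation choice is an assumption
            nn.Linear(n_hidden, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        return self.layers(x)


# Random stand-in data: float64 inputs and int32 labels, mimicking
# dtypes that often come out of a dataframe
inputs = torch.rand(64, 6, dtype=torch.float64)
targets = torch.randint(0, 3, (64,), dtype=torch.int32)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=16, shuffle=True)

model = TinyPhotonClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(3):
    for x, y in loader:
        # Cast to the dtypes the model and loss expect to avoid RuntimeErrors
        x = x.to(torch.float32)
        y = y.to(torch.int64)

        prediction = model(x)
        loss = loss_fn(prediction, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    losses.append(loss.item())  # loss of the last mini-batch in this epoch
```

Without the casts, the float64 inputs would clash with the float32 model weights and the int32 labels would be rejected by `nn.CrossEntropyLoss`, which expects int64 class indices.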

Commits on Aug 14, 2024

  1. bdccda1

Commits on Aug 16, 2024

  1. Add instructions to install PyTorch in first code cell

    Default CryoCloud Docker image won't have PyTorch, so it will need to be installed in the first step.
    weiji14 committed Aug 16, 2024
    395982e
  2. Save GeoParquet schema version 1.1.0 and reword note on ZSTD compression

    Default CryoCloud image now has GeoPandas 1.x, so can save to a non-beta version of the GeoParquet schema now.
    weiji14 committed Aug 16, 2024
    2a0ae41
  3. c3cace6

Commits on Aug 18, 2024

  1. Add overview flowchart to top of notebook and minor edits

    Adding an overview diagram of the ATL07 + Sentinel-2 processing pipeline (illustrated using Excalidraw) to the start of the notebook. Made some minor edits to some of the markdown cells to include more references and explanatory text.
    weiji14 committed Aug 18, 2024
    db55b60
  2. Pre-render Jupyter notebook and move files to machine-learning folder

    Pushing the photon_classifier Jupyter Notebook with pre-rendered cells that was run on CryoCloud. Putting the files under a 'machine-learning' folder, to be consistent with the other tutorials that use subfolders.
    weiji14 committed Aug 18, 2024
    aac4747