Initial play with the ATL06 ICESat-2 product #21

Merged: 8 commits, May 20, 2020

weiji14 (Owner) commented May 6, 2020

The moment you've all been waiting for, modern Exploratory Data Analysis on ICESat-2 ATL06 data with the PyData stack, using intake catalogs to retrieve data and hvplot for plotting! In other words:

[Image: Yet another standard (https://xkcd.com/927)]

The code here should scale better to continent-wide analysis (i.e. big data) than the official scripts, which work better for a smaller region (e.g. a glacier or ice stream). Happy to consider merging this stuff into icepyx once things are a bit more stable.

The atl06_play Jupyter notebook does the following:

  • Use Intake to download and manage the ATL06 data catalogued in catalog.yaml.
  • Read all 6 laser beams from the HDF5 files concurrently (read: no for-loops) via xarray/intake into a Dask-backed xarray Dataset, then tidy them into a Dask/Pandas DataFrame.
  • Plot the point cloud using HvPlot, which produces an interactive figure we can pan around.
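
For readers without the notebook at hand, here is a minimal standard-library sketch of the read-then-tidy step. It assumes the ATL06 layout of six beam groups named gt1l to gt3r, and `read_beam` is a hypothetical per-beam reader returning that beam's columns (e.g. h_li, latitude). The notebook itself does this lazily with xarray/intake and Dask rather than with threads and lists, so treat this as an illustration of the data's shape, not the actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# The six ATL06 ground-track groups: three beam pairs, left and right.
BEAMS = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"]

# One beam's data as named columns, e.g. {"h_li": [...], "latitude": [...]}.
Columns = Dict[str, List[float]]


def read_all_beams(read_beam: Callable[[str], Columns]) -> Dict[str, Columns]:
    """Call the (hypothetical) per-beam reader for all six beams concurrently."""
    with ThreadPoolExecutor(max_workers=len(BEAMS)) as pool:
        results = pool.map(read_beam, BEAMS)  # results come back in BEAMS order
        return dict(zip(BEAMS, results))


def tidy(beams: Dict[str, Columns]) -> List[dict]:
    """Flatten {beam: {column: values}} into long-format rows with a 'laser' column."""
    rows = []
    for laser, columns in beams.items():
        names = list(columns)
        # zip the columns together so each tuple is one point along the track
        for values in zip(*(columns[name] for name in names)):
            rows.append({"laser": laser, **dict(zip(names, values))})
    return rows
```

The long ("tidy") format keeps one row per point plus a `laser` column, which makes it straightforward to group or colour points by beam when plotting.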

[Figure: ATL06 cross section plot]

There are also some old SciPy scripts to produce a DEM from the points, an old attempt to use XrViz to visualize the multi-dimensional data, and some example code that uses the OpenAltimetry API.

TODO:

  • Update from ATL06 v2 to ATL06 v3, and find a way to use intake to download Orbital Segments 10, 11 & 12, not just 11 (3f8c465)
  • Pair the Jupyter notebook with a Python script using jupytext, and lint code with black (2ffdf36)
  • etc

The moment you've all been waiting for, modern Exploratory Data Analysis on ICESat-2 ATL06 data with the PyData stack, using intake catalogs to retrieve data and hvplot for plotting! Sure, it's another standard (see https://xkcd.com/927) but that's part of science (I guess). Note that this notebook was developed several months ago, but for various reasons, the commit has only happened now in a post-covid era.

The Jupyter notebook starts by running through the use of intake to download and manage the ATL06 data catalogued in catalog.yaml. All 6 laser beams are read from the HDF5 files concurrently (read: no for-loops) via xarray/intake into a Dask-backed xarray Dataset, and then tidied into a Dask/Pandas DataFrame. Finally, we plot the points using HvPlot, which produces an interactive figure we can pan around.

Also left in some old SciPy scripts to produce a DEM from the points, an old attempt to use XrViz to visualize the multi-dimensional data, and some example code that uses the OpenAltimetry API.

weiji14 added 5 commits May 7, 2020 21:15
Bring in intake-xarray enabled with fsspec file caching capabilities! Also a newer tqdm version, why not?
About a quarter of the way through downloading ~2 TB (?) of ICESat-2 ATL06 version 3 data all over Antarctica, but let's introduce icesat2atlasdownloader first, shall we? This baby allows us to download any ICESat-2/ATLAS product, for any given date, hardcoded to Orbital Segments 10, 11, 12 (i.e. Antarctica), and oh yeah, it does so by 'caching' the remote data locally using intake/fsspec. Tie that up with a highly parallelized dask task scheduler, complete with tqdm progress bars, and I just need to sit back and wait until everything is downloaded by the next morning.

Again, this code was worked on pre-COVID-19, but there were issues with the intake cache mechanism back then. You won't know it, but switching from the intake-specific cache (deprecated, messy, and puts an unconfigurable 'hash' in the filepath, though with nice dask parallelization abilities) to fsspec's 'simplecache' (more configurable, no hash in the filepath, though it requires writing our own parallelization code) is a delight! It lets us download a list of orbital segments (10, 11, 12) instead of just 11 as before. Downloads are parallelized using dask futures, with progress tracked using tqdm (or in the dask dashboard). The main difference between icesat2atlasdownloader and icesat2atl06 is that the former doesn't read into the laser group but the latter does (and is prone to pandas IndexErrors from duplicated index dates).
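
The commit uses dask futures with tqdm for the parallel downloads. As a dependency-free sketch of the same pattern, here is the equivalent with the standard library's `concurrent.futures` (a swapped-in technique, not the PR's code). `fetch` is a hypothetical callable that would, for example, open a URL through an fsspec cache and return the local path.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def download_all(
    urls: List[str], fetch: Callable[[str], str], max_workers: int = 8
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch every URL in parallel and print progress as each one finishes.

    `fetch` is a hypothetical function that downloads one URL into a local
    cache and returns the local path. Failures are collected, not raised,
    so one bad granule does not stop the whole batch.
    """
    done: Dict[str, str] = {}
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        # as_completed yields futures in finishing order, like a progress bar
        for count, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                done[url] = future.result()
            except Exception:
                failed.append(url)
            print(f"[{count}/{len(futures)}] {url}")
    return done, failed
```

Returning the failed URLs separately makes it easy to retry them or to confirm afterwards that every download completed.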

With version 3 of ATL06, the max date has gone from 2019.11.15 to 2020.03.06, or about 1 cycle more. Changes are documented at https://nsidc.org/data/atl06/versions/3. They seem to have removed some noisy points; will need to do some Exploratory Data Analysis after the downloads are done.
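
A quick check of the "about 1 cycle more" estimate, assuming ICESat-2's 91-day exact-repeat cycle:

```python
import datetime as dt

REPEAT_CYCLE_DAYS = 91  # ICESat-2 exact-repeat orbit period, in days

old_max = dt.date(2019, 11, 15)  # last date available in ATL06 version 2
new_max = dt.date(2020, 3, 6)  # last date available in ATL06 version 3

extra_days = (new_max - old_max).days
extra_cycles = extra_days / REPEAT_CYCLE_DAYS
print(f"{extra_days} days = {extra_cycles:.2f} cycles")  # 112 days = 1.23 cycles
```
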
Not sure why the 2019.12.09 date is missing in ATL06 version 3; it was in version 2! Adding some error management, and checking that all downloads have completed.
Get jupytext to pair notebooks, black to lint code, and also update to newer jupyterlab version!
Pair up the jupyter notebook with a .py script, and lint it with black. Nicer to look at and easier to diff!
@weiji14 weiji14 force-pushed the atl06_play branch 2 times, most recently from 050e479 to 2ffdf36 May 19, 2020 21:07
weiji14 added 2 commits May 20, 2020 09:44
Add xrviz, and keep other dependencies up to date!
Tidy up lots of things left over from early experimentation. The xarray.Dataset now loads way faster by combining with "by_coords" instead of "nested". This change has been implemented in both the catalog.yaml file and the six_laser_beams function (the latter of which still needs a refactor).
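
The two combine modes differ in how they decide the order of the pieces: "nested" concatenates in the order you pass them, while "by_coords" infers the order from the coordinate values themselves. Here is a toy illustration of that difference using plain lists; the function names mirror xarray's `combine_nested` and `combine_by_coords` but this is not xarray, and the real functions also check for overlaps and monotonic coordinates.

```python
from typing import List, Tuple

# A toy chunk: (coordinate values, data values), e.g. (delta_time, h_li).
Chunk = Tuple[List[float], List[float]]


def combine_nested(chunks: List[Chunk]) -> Chunk:
    """Concatenate chunks in the order given; the caller must order them."""
    coords = [c for chunk in chunks for c in chunk[0]]
    data = [d for chunk in chunks for d in chunk[1]]
    return coords, data


def combine_by_coords(chunks: List[Chunk]) -> Chunk:
    """Order chunks by their first coordinate value, then concatenate."""
    return combine_nested(sorted(chunks, key=lambda chunk: chunk[0][0]))
```
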

The catalog.yaml now features a way to load only 1 reference ground track (instead of multiple), provided you also know the date! Made it easier to understand what some of the variables/functions are by using type hints. The quickview plot now plots the coastline of Antarctica too!
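
Selecting a single reference ground track on a known date comes down to matching fields in the granule filename. This sketch assumes the standard ICESat-2 naming convention `ATL06_[yyyymmdd][hhmmss]_[ttttccss]_[vvv]_[rr].h5`; `parse_granule` and `is_track` are hypothetical helpers, not the catalog.yaml mechanism itself, and the filename in the usage note is made up.

```python
import re
from typing import Dict, Optional

# Assumed ICESat-2 granule naming convention:
# ATL06_[yyyymmdd][hhmmss]_[tttt][cc][ss]_[vvv]_[rr].h5
GRANULE = re.compile(
    r"(?P<product>ATL\d{2})_(?P<date>\d{8})(?P<time>\d{6})_"
    r"(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<segment>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})\.h5"
)


def parse_granule(filename: str) -> Optional[Dict[str, str]]:
    """Split a granule filename into its fields, or return None if it doesn't match."""
    match = GRANULE.fullmatch(filename)
    return match.groupdict() if match else None


def is_track(filename: str, date: str, rgt: int) -> bool:
    """True if the file is reference ground track `rgt` acquired on `date` (yyyymmdd)."""
    fields = parse_granule(filename)
    return fields is not None and fields["date"] == date and int(fields["rgt"]) == rgt
```

For a made-up name such as `ATL06_20200306010203_06370611_003_01.h5`, `is_track(name, "20200306", 637)` is true and the parsed `segment` field is `"11"`.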
@weiji14 weiji14 marked this pull request as ready for review May 20, 2020 03:19
@weiji14 weiji14 merged commit 23a6436 into master May 20, 2020
@weiji14 weiji14 deleted the atl06_play branch May 20, 2020 03:33
@weiji14 weiji14 added the enhancement ✨ New feature or request label May 20, 2020
@weiji14 weiji14 added this to the v0.1.0 milestone May 28, 2020
@weiji14 weiji14 added feature 🚀 Brand new feature and removed enhancement ✨ New feature or request labels May 28, 2020
Labels
feature 🚀 Brand new feature
1 participant