Initial play with the ATL06 ICESat-2 product #21
Merged
The moment you've all been waiting for: modern Exploratory Data Analysis on ICESat-2 ATL06 data with the PyData stack, using intake catalogs to retrieve data and hvplot for plotting! Sure, it's another standard (see https://xkcd.com/927), but that's part of science (I guess). Note that this notebook was developed several months ago, but for various reasons it has only been committed now, in a post-covid era. The jupyter notebook starts by using intake to download and manage the ATL06 data catalogued in catalog.yaml. All 6 laser beams are read from the HDF5 files concurrently (read: no for-loops) via xarray/intake into a Dask/Xarray Dataset, and then tidied into a Dask/Pandas DataFrame. Finally, we plot the points using HvPlot, which produces an interactive figure we can pan around. I've also left in some old SciPy scripts to produce a DEM out of the points, an old attempt to use XrViz to visualize the multi-dimensional data, and some example code that uses the OpenAltimetry API.
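A minimal, stdlib-only sketch of what "tidying" six beams into one table means, assuming each beam's variables have already been read into plain lists. In the notebook this happens lazily with xarray and Dask rather than with Python loops; this sketch only shows the shape of the transformation (one row per point, plus a column naming the beam). The beam group names gt1l to gt3r are the standard ATL06 ground tracks; the `laser` column name, the function name and the toy values are illustrative.

```python
from typing import Dict, List

# The six ATL06 ground-track groups: three beam pairs, each with a left and a right beam
BEAMS = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"]


def tidy_beams(per_beam: Dict[str, Dict[str, List[float]]]) -> List[dict]:
    """Flatten {beam: {variable: values}} into one record per point, tagged by beam."""
    records: List[dict] = []
    for beam in BEAMS:  # fixed beam order keeps the output deterministic
        columns = per_beam.get(beam)
        if not columns:  # a beam may be missing (or empty) in a given granule
            continue
        n_points = len(next(iter(columns.values())))
        for i in range(n_points):
            record = {"laser": beam}
            record.update({name: values[i] for name, values in columns.items()})
            records.append(record)
    return records


# Toy values for illustration only; not real ATL06 measurements
toy = {
    "gt1l": {"latitude": [-75.0, -75.1], "h_li": [1500.2, 1501.0]},
    "gt2r": {"latitude": [-76.0], "h_li": [980.5]},
}
rows = tidy_beams(toy)
print(len(rows))  # 3
print(rows[0])  # {'laser': 'gt1l', 'latitude': -75.0, 'h_li': 1500.2}
```

Keeping the beam as a column, rather than as six separate tables, is what lets a single groupby or plot call cover all beams at once.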
Bring in intake-xarray enabled with fsspec file caching capabilities! Also a newer tqdm version, why not?
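Caching remote files locally means choosing a local filename for each URL. As a stdlib-only illustration (not how intake or fsspec compute their paths internally), here are two options: naming the cached file after a hash of its URL, or mirroring the remote path so cached granules stay recognisable on disk. The function names, cache directory and URL are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def hashed_cache_path(url: str, cache_dir: str) -> PurePosixPath:
    """Name the cached file after a SHA-256 hash of its URL: unique, but opaque on disk."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return PurePosixPath(cache_dir) / digest


def mirrored_cache_path(url: str, cache_dir: str) -> PurePosixPath:
    """Hash-free alternative: mirror the remote host and path under cache_dir."""
    parsed = urlparse(url)
    return PurePosixPath(cache_dir) / parsed.netloc / parsed.path.lstrip("/")


# Hypothetical URL, for illustration only
url = "https://example.org/ATLAS/ATL06.003/2020.03.06/ATL06_20200306000000_00010611_003_01.h5"
print(mirrored_cache_path(url, "cache"))
# cache/example.org/ATLAS/ATL06.003/2020.03.06/ATL06_20200306000000_00010611_003_01.h5
print(hashed_cache_path(url, "cache").parent)  # cache
```

The mirrored layout keeps the date folder and granule name visible, which makes it easy to see what has already been cached; the trade-off is that two URLs differing only in their query string would map to the same file.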
About a quarter of the way through downloading ~2TB(?) of ICESat-2 ATL06 version 3 data covering all of Antarctica, but let's introduce icesat2atlasdownloader first, shall we? This baby allows us to download any ICESat-2/ATLAS product for any given date, hardcoded to Orbital Segments 10, 11 and 12 (i.e. Antarctica), and, oh yeah, it does so by 'caching' the remote data locally using intake/fsspec. Tie that up with a highly parallelized dask task scheduler, complete with tqdm progress bars, and I just need to sit back and wait until everything has downloaded by the next morning.

Again, this code was worked on pre-covid19, but there were issues with the intake cache mechanism back then. You wouldn't know it, but switching from the intake-specific cache (deprecated, messy, and puts an unconfigurable 'hash' in the filepath, though it has nice dask parallelization abilities) to fsspec's 'simplecache' (more configurable, no hash in the filepath, though it requires writing our own parallelization code) is a delight! It enables us to download a list of orbital segments (10, 11, 12) instead of just segment 11 as before. Downloads are parallelized using dask futures, with progress tracked using tqdm (or in the dask dashboard). The main difference between icesat2atlasdownloader and icesat2atl06 is that the former doesn't read into the laser groups but the latter does (and is prone to pandas IndexErrors from duplicated index dates).

With version 3 of ATL06, the max date has gone from 2019.11.15 to 2020.03.06, or about 1 cycle more. Changes are documented at https://nsidc.org/data/atl06/versions/3. They seem to have removed some noisy points; I'll need to do some Exploratory Data Analysis after the downloads are done.
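The two ideas in this commit, picking out the Antarctic granules (segments 10, 11 and 12) and downloading them in parallel with progress reporting, can be sketched with the standard library alone. The filename regex assumes NSIDC's ATL06 granule naming convention. The PR itself uses dask futures and tqdm; this sketch swaps in `concurrent.futures`, whose submit/as_completed pattern is similar to dask.distributed's `Client.submit` and `as_completed`. `fetch` is a hypothetical placeholder for whatever actually retrieves and caches a file.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Optional

# Assumed NSIDC granule naming: ATL06_[yyyymmddhhmmss]_[tttt][cc][ss]_[vvv]_[rr].h5,
# with tttt = reference ground track, cc = cycle, ss = region (segment) number
GRANULE = re.compile(r"ATL06_(\d{14})_(\d{4})(\d{2})(\d{2})_(\d{3})_(\d{2})\.h5$")
ANTARCTIC_REGIONS = {10, 11, 12}


def antarctic_granules(filenames: Iterable[str]) -> List[str]:
    """Keep only granules whose region number is 10, 11 or 12."""
    keep = []
    for name in filenames:
        match = GRANULE.search(name)
        if match and int(match.group(4)) in ANTARCTIC_REGIONS:
            keep.append(name)
    return keep


def download_all(
    filenames: List[str],
    fetch: Callable[[str], object],
    max_workers: int = 8,
    progress: Optional[Callable[[int, int], None]] = None,
) -> List[str]:
    """Call fetch() on every filename concurrently and return the ones that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, name): name for name in filenames}
        # as_completed yields each future as soon as it finishes, in any order
        for n_done, future in enumerate(as_completed(futures), start=1):
            if future.exception() is not None:
                failed.append(futures[future])
            if progress is not None:
                progress(n_done, len(futures))  # e.g. update a progress bar here
    return sorted(failed)


def fake_fetch(name: str) -> str:
    """Hypothetical stand-in for a real (cached) download."""
    return "cache/" + name


names = [
    "ATL06_20200306000000_00010611_003_01.h5",  # region 11: kept
    "ATL06_20200306000000_00010603_003_01.h5",  # region 03: dropped
]
selected = antarctic_granules(names)
print(selected)  # ['ATL06_20200306000000_00010611_003_01.h5']
print(download_all(selected, fake_fetch))  # []
```

Returning the failed names, instead of raising on the first error, makes it straightforward to retry just the stragglers on the next run.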
Not sure why the 2019.12.09 date is missing in ATL06 version 3; it was in version 2! Added some error handling, and now check that all downloads have completed.
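A hedged sketch of the two checks this commit describes: spotting dates that exist in one product version but not another, and listing local files that still need (re)downloading. Treating "exists and is non-empty" as complete is only a rough proxy; comparing against the server's reported size or a checksum would be stricter. Apart from 2019.12.09, the dates below are hypothetical, and the function names are illustrative.

```python
from datetime import date
from pathlib import Path
from typing import Iterable, List


def missing_dates(expected: Iterable[date], available: Iterable[date]) -> List[date]:
    """Dates present in `expected` but absent from `available`, oldest first."""
    return sorted(set(expected) - set(available))


def incomplete_downloads(paths: Iterable[Path]) -> List[Path]:
    """Local files that are absent or zero bytes, i.e. downloads to retry."""
    return [p for p in paths if not p.is_file() or p.stat().st_size == 0]


# Hypothetical date listings for two product versions
v2_dates = [date(2019, 12, 8), date(2019, 12, 9), date(2019, 12, 10)]
v3_dates = [date(2019, 12, 8), date(2019, 12, 10)]
print(missing_dates(v2_dates, v3_dates))  # [datetime.date(2019, 12, 9)]
```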
Get jupytext to pair notebooks, black to lint code, and also update to newer jupyterlab version!
Pair up the jupyter notebook with a .py script, and lint it with black. Nicer to look at and easier to diff!
weiji14 force-pushed the atl06_play branch 2 times, most recently from 050e479 to 2ffdf36 (May 19, 2020 21:07)
Add xrviz, and keep other dependencies up to date!
Tidy up lots of things left over from early experimentation. The xarray.Dataset now loads much faster by combining with "by_coords" instead of "nested". This change has been implemented in both the catalog.yaml file and the six_laser_beams function (the latter of which needs a refactor). The catalog.yaml now features a way to load only 1 reference ground track (instead of multiple), provided you also know the date! Made it easier to understand what some of the variables/functions are by using type hints. The quickview plot now plots the coastline of Antarctica too!
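The commit mentions loading a single reference ground track (RGT) when the date is also known. The catalog.yaml mechanism isn't shown here, so as a hedged illustration (written with type hints, in the spirit of the same commit), this sketch builds a filename glob for one RGT on one date and filters a file listing with it. The naming convention is an assumption based on NSIDC's ATL06 granule names; the function names and filenames are hypothetical.

```python
import fnmatch
from datetime import date
from typing import List


def single_track_pattern(day: date, rgt: int, version: int = 3) -> str:
    """Glob pattern matching one reference ground track (RGT) on one date.

    Assumes the NSIDC naming ATL06_[yyyymmddhhmmss]_[ttttccss]_[vvv]_[rr].h5.
    """
    return f"ATL06_{day:%Y%m%d}*_{rgt:04d}????_{version:03d}_*.h5"


def select_track(filenames: List[str], day: date, rgt: int, version: int = 3) -> List[str]:
    """Filter a listing of granule names down to a single RGT on a single date."""
    pattern = single_track_pattern(day, rgt, version)
    return [name for name in filenames if fnmatch.fnmatchcase(name, pattern)]


# Hypothetical filenames, for illustration only
names = [
    "ATL06_20200306000000_00010611_003_01.h5",
    "ATL06_20200306000000_00020611_003_01.h5",
    "ATL06_20200306000000_00010611_002_01.h5",
]
print(single_track_pattern(date(2020, 3, 6), rgt=1))  # ATL06_20200306*_0001????_003_*.h5
print(select_track(names, date(2020, 3, 6), rgt=1))
# ['ATL06_20200306000000_00010611_003_01.h5']
```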
weiji14 added the feature 🚀 (Brand new feature) label and removed the enhancement ✨ (New feature or request) label on May 28, 2020
The moment you've all been waiting for: modern Exploratory Data Analysis on ICESat-2 ATL06 data with the PyData stack, using intake catalogs to retrieve data and hvplot for plotting! In other words:
The code here should scale better to continent-wide analysis (i.e. big data), compared to the official scripts which will work better for a smaller region (e.g. a glacier or ice stream). Happy to consider merging this stuff into icepyx once things get a bit more stable.
The atl06_play jupyter notebook does the following:
There are also some old SciPy scripts to produce a DEM out of the points, an old attempt to use XrViz to visualize the multi-dimensional data, and some example code that uses the OpenAltimetry API.
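The notebook's DEM scripts use SciPy. As a dependency-free stand-in, here is a block-averaging sketch that turns scattered (x, y, elevation) points into a coarse grid by averaging the elevations within each cell. This is a swapped-in technique (binning, not interpolation): cells without any points stay empty. Coordinates are assumed to be in a projected system in metres, and the function name and toy points are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]  # (x, y, elevation), assumed projected metres


def grid_mean(points: Iterable[Point], cell_size: float) -> Dict[Tuple[int, int], float]:
    """Average elevations of the points falling into each square grid cell."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for x, y, z in points:
        # Floor division assigns each point to a (column, row) cell index
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += z
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Toy points for illustration only
pts = [(10.0, 10.0, 100.0), (20.0, 30.0, 110.0), (150.0, 10.0, 50.0)]
print(grid_mean(pts, cell_size=100.0))  # {(0, 0): 105.0, (1, 0): 50.0}
```

Block averaging is cheap and scales to many points, but it leaves gaps between ground tracks; an interpolation step would be needed to fill them.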
TODO: