Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analysis of ICESat-2 crossover track height errors #146

Merged
merged 25 commits into from
Sep 16, 2020
Merged

Conversation

weiji14
Copy link
Owner

@weiji14 weiji14 commented Aug 13, 2020

One person's noise is another person's signal! Increase temporal resolution of our ice surface height change analysis by looking at ICESat-2 track crossing areas. Would be good to get something like Figure 4 of Fricker et al. 2007. Will be using good ol GMT's x2sys_cross to do the heavy lifting, wrapped in PyGMT of course at GenericMappingTools/pygmt#546 and GenericMappingTools/pygmt#591.

The plots below are for an unnamed active subglacial lake at the Whillans Ice Stream upstream area that filled up sometime in December/January 2019/2020. Ice surface height went up by about 6 metres!

crossover_135_1272_2018-10-14_2020-05-13 crossover_area

There's also a spectacular example or a drain/fill pair happening over at Slessor Glacier. See how Slessor 23 has a max change of ~14m observed over just 4 months!

Draining (Upstream) Filling (Downstream)
crossover_many_normalized_slessor_45_2018-10-14_2020-05-13 crossover_many_normalized_slessor_23_2018-10-14_2020-05-13

See also:

TODO:

References:

@weiji14 weiji14 added the feature 🚀 Brand new feature label Aug 13, 2020
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


Powered by ReviewNB

@weiji14 weiji14 changed the title Analysis of crossover track height errors Analysis of ICESat-2 crossover track height errors Aug 13, 2020
@ghost
Copy link

ghost commented Aug 14, 2020

Congratulations 🎉. DeepCode analyzed your code in 2.131 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

Calculate Antarctic ice surface height range (hrange) and rate of ice surface elevation change over time (dhdt) up to 20200513! Supersedes #55. This time, for the whole continent (n=~223 million data points)!! It's nice to see a full map with no blanks, and we'll need full coverage to do crossover error calculations properly later on. Side effect is that it took a couple of hours to run through the ETL data pipeline (like watching paint dry), and I was tempted to get it to work on the GPU but hit into all sorts of I/O slowness and GPU memory limitations once more.

Also spruced up the code here and there. Renamed the Whillans1 and Whillans2 regions to Whillans Downstream and Upstream respectively (and changed the bounding box too). Decided to drop the unneeded min/max_cycle dimensions from the xarray.Dataset (less things to handle). Interactive dhdt_range slider now uses a more fine-grained increment of 0.25m instead of 0.5m. Updated teaser image accordingly on main README.md too.
Performing crossover error analysis on an active subglacial lake at the Whillans Ice Stream downstream area. Specifically Subglacial Lake Conway, as documented by Fricker et al., 2007 at https://doi.org/10.1126/science.1136897. Took a while to get my head around `x2sys_cross` again, and look up somewhat advanced `pandas` table and `pygmt` plotting techniques. This code has gone through several local refactorings, and will likely need more (possibly after some PyGMT enhancements are released).
Adding a new module to deepicedrain for Extract, Transform and Load (ETL) workflows! Putting slices of a 2D array into several columns inside a dataframe is now easier with the array_to_dataframe function. Inspired by dask/dask#5021. The function is generalized so that dask arrays convert to a dask DataFrame, and numpy arrays convert to a pandas DataFrame.
~6.02 metres of elevation change at an unnamed subglacial lake on the Whillans Ice Stream upstream area. Best seen at the intersection of ICESat-2 ground tracks 135 and 1272. Funnily enough, I noticed this subglacial lake a few months ago because my friend gave me some bounding box coordinates for his region of interest and I wrongly interpreted the xmin/xmax/ymin/ymax!

Anyways, there's a lot of refactorings that went into the data I/O, processing and visualization. Decided to store the dhdt data into a columnar Parquet file format that can be loaded directly into pandas (and also cudf, more on this next time ;)). For speed, the x2sys_cross function is now wrapped in dask and parallelized for every combination of pair-tracks (non-intersections silently handled). The crossover point PyGMT plots now have nicer axis labels, especially for the time x-axis. Been playing with some unsupervised clustering today to automatically extract active subglacial lake outlines, coming soon!
Make bounding box subsetting work on DataFrames too! This includes pandas, dask and cudf DataFrames. Included a parametrized test for pandas and dask, the cudf one should work too since the APIs are similar. The original xarray.DataArray subsetter code will still work.
CUDA-accelerated GIS and spatiotemporal algorithms! A repeat of 58fdcdf, but with a newer version at v0.15.0! Also patch 852a643 by bumping up cuml version and switching the scikit-learn order.
A very fast way to find points inside polygons! This is really just a convenience function that wraps around `cuspatial.point_in_polygon`, hiding all sorts of boilerplate. Specifically, this handles:

1. Converting a geopandas geodataframe into a cuspatial friendly format, see rapidsai/cuspatial#165
2. Hacky workaround the 31 polygon limit using a for-loop, based on https://github.com/rapidsai/cuspatial/blob/branch-0.15/notebooks/nyc_taxi_years_correlation.ipynb
3. Outputting actual string labels from the geodataframe, instead of non human readable index numbers

Also added tests for this in test_spatiotemporal_gpu.py, though it won't work on the CI, only locally where a GPU is available.
Building on top of eb61ff6, but for n-dimensional arrays, and writing the dataframe to Parquet too! This function might be a little too convenient (read: contains hardcoding), but it smooths out some of the rough edges in terms of PyData file format interoperability. Should contribute this somewhere upstream when I get the time.
Improve our HvPlot/Panel dashboard with some new bells and whistles! Like a proper GIS desktop tool, the xy_dhdt dashboard plot can now keep the zoom level when changing between variables (thanks to https://discourse.holoviz.org/t/keep-zoom-level-when-changing-between-variables-in-a-scatter-plot)! Supersedes e4874b0. This is a major refresh of my old IceSatExplorer code at https://github.com/weiji14/cryospheric-data-lakes/blob/master/code/scripts/h5_to_np_icesat.ipynb, which uses ICESat-1 instead of ICESat-2. The dashboard also takes a lot of cues from the example at https://examples.pyviz.org/datashader_dashboard/dashboard.html, implemented in holoviz/datashader#676.

Other significant improvements include a categorical colourmap for the 'referencegroundtrack' variable, and being able to see the height and time of an ICESat-2 measurement at a particular cycle on hover over the points! Oh, and did I mention that the rendering now happens on the GPU?!! Data transformed to and from Parquet is fast! Note that this is a work in progress, and that there are more sweeping improvements to come. I've also split out the crossover analysis code into a separate atlxi_lake.ipynb file since atlxi_dhdt.ipynb was getting too long.
Parquet plugin for intake! Also edit Github Actions workflow to test on Pull Requests targeting any branch.
Improving the dashboard while making the code more maintainable by moving the pure hvplot scatterplot stuff into the intake atlas_catalog.yaml file, and placing the dashboard/widgets under vizplots.py. This is yet another attempt at tidying up the code in the jupyter notebook, moving them into the deepicedrain package instead! Also updated the alongtrack plot code to work with the new df_dhdt columnar data structure.

Will need to put the df_dhdt_{placename}.parquet data somewhere in the cloud (when I have time) so that the dashboard app can be used by more people, and also to enable unit testing of the visualization generators (always a tricky thing to test)! The dashboard is also currently hardcoded to plot the "whillans_upstream" area, will need to see the placename can be used as an argument into the IceSat2Explorer class.
Fix Continuous Integration tests failing due to the IceSat2Explorer class not being able to load df_dhdt_whillans_upstream.parquet. Really need to put the file up somewhere, but until I find a good data repository (ideally with versioning), this hacky workaround will be a necessary evil.
Pinning the RAPIDS AI libraries from the alpha/development versions to the stable release version. Also generating a environment-linux-64.lock for full reproducibility!

Bumps [cuml](https://github.com/rapidsai/cuml) from 0.15.0a200819 to 0.15.0.
- [Release notes](https://github.com/rapidsai/cuml/releases)
- [Changelog](https://github.com/rapidsai/cuml/blob/branch-0.15/CHANGELOG.md)
- [Commits](rapidsai/cuml@v0.15.0a...v0.15.0)

Bumps [cuspatial](https://github.com/rapidsai/cuspatial) from 0.15.0a200819 to 0.15.0
- [Release notes](https://github.com/rapidsai/cuspatial/releases)
- [Changelog](https://github.com/rapidsai/cuspatial/blob/branch-0.15/CHANGELOG.md)
- [Commits](rapidsai/cuspatial@v0.15.0a...v0.15.0)
Detect active subglacial lakes in Antarctica using Density-based spatial clustering of applications with noise (DBSCAN)! The subglacial lake detector works by finding clusters of high (filling at > 1m/yr) or low (draining at < -1 m/yr) height change over time (dhdt) values, for each drainage basin (that is grounded) in Antarctica. CUDA GPUs are awesome, the point in polygon takes 15 seconds, and the lake clustering takes 12 seconds, and this is working on >13 million points! Each cluster of points is then converted to a convex hull polygon, and we store some basic attribute information with the geometry such as the basin name, maximum absolute dhdt value, and reference ground tracks. The lakes are output to a geojson file using EPSG:3031 projection.

This is a long overdue commit as the code has been working since mid-August, but I kept wanting to refactor it (still need to!). The DBSCAN clustering parameters (eps=2500 and min_samples=250) work ok for the Siple Coast and Slessor Glacier, but fails for Pine Island Glacier since there's a lot of downwasting. Algorithm definitely needs more work. The visualizations and crossover analysis code also need to be refreshed (since the schema has changed), but it's sitting locally on my computer, waiting to be tidied up a bit more.
Combining draining/filling active lake cluster labels, which allows us to reduce the number of for-loop nesting in the active subglacial lake finder code, and plot both draining/filling lakes in the same figure! Cluster labels are now negative integers for draining lakes, positive integers for filling lakes, and NaN for noise points. Lake cluster plot now uses red (draining) and blue (filling) 'polar' colormap, with unclassified noise points in black as before. Code still takes 11 seconds to run for the entire Antarctic continent which is awesome! Also made a minor change to deepicedrain/__init__.py script to disable loading IceSat2Explorer dashboard script otherwise `import deepicedrain` will load stuff into GPU memory!
#149)

Closes #149 Find active subglacial lake points using unsupervised clustering.
Bumps [numcodecs](https://github.com/zarr-developers/numcodecs) from 0.6.4 to 0.7.1. Back to using PyPI instead of the conda-forge package now that wheels are available, thus reverting 154ad8b. Also bumped up gxx_linux build version to resolve conda package conflicts.

- [Release notes](https://github.com/zarr-developers/numcodecs/releases)
- [Changelog](https://github.com/zarr-developers/numcodecs/blob/master/docs/release.rst)
- [Commits](zarr-developers/numcodecs@v0.6.4...v0.7.1)
Bumps [pygmt](https://github.com/GenericMappingTools/pygmt) from 0.1.2-36-g4939ee2a to 0.2.0.
  - [Release notes](https://github.com/GenericMappingTools/pygmt/releases)
  - [Changelog](https://github.com/GenericMappingTools/pygmt/blob/master/doc/changes.rst)
  - [Commits](GenericMappingTools/pygmt@v0.1.2-36-g4939ee2a...v0.2.0)

This includes several enhancements such as 'Sensible array outputs for pygmt info' (GenericMappingTools/pygmt#575) and 'Allow passing in pandas dataframes to x2sys_cross' (GenericMappingTools/pygmt#591) that will make our crossover analysis work and figure generation easier! Also edited Github Actions workflow to only run Docker build on Pull Requests when ready to review or when review is requested (i.e. not when PR is in draft mode).
Keep our study regions box polygons in a proper GeoJSON file, and load them into a deepicedrain.Region class using from_gdf! This is heavily inspired by the regionmask.from_geopandas method. The from_gdf method also works on non-box shaped polygons (used in atlxi_lake.ipynb), and this makes use of `pygmt.info(..., spacing=1000)` to get a nice round numbered bounding box.

Really should vendor off this deepicedrain.Region class at some point, and I think https://github.com/mathause/regionmask is the most similar, though it doesn't quite support non-EPSG:4326 projections yet. Will probably have to turn the scale/subset/datashade convenience methods into standalone functions too, but that's ok :)
Wrapper to reduce boilerplate around using pandas.wide_to_long, namely setting an index, resetting the second level 'j' index after wide_to_long, and dropping rows with NaN. Useful for pre-processing and post-processing table data before and after crossover analysis with x2sys_cross. Also not saving track data to TSV files anymore now that x2sys_cross supports pandas.DataFrame inputs.
Try to get around sourcery-ai's complaint about the point_in_polygon_gpu being too complex by removing an if-statement.
Remove the hardcoding on visualizing whillans_upstream region only! Just needed a proper __init__ and super()__init__ inside the class :D. This change also allows us to make IceSat2Explorer a first class import citizen again, as the data won't be loaded on `import deepicedrain`, and tests won't break because of a FileNotFoundError for 'df_dhdt_whillans_upstream.parquet'.
Copy link

@stickler-ci stickler-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following files do not match the black styleguide:

  • atlxi_lake.py

The icy water may be draining, but we're still keeping things DRY! Chucking all of the alongtrack code (originally from atlxi_dhdt.ipynb, sitting in atlxi_lake.ipynb secretly for a while, now refactored to near perfection) into a proper visualization function that is tested! The abstraction makes it easier to generate alongtrack plots when looping through different reference ground tracks (0001 to 1387) and pair tracks (pt1, pt2, pt3). Included a boolean oldtonew flag to allow for flipping the legend, useful for e.g. when lake is filling up over time and we want Cycle 7 to be above Cycle 6.

To be honest, test_vizplots.py could almost be treated as an integration/behavioural driven development test, but haven't got time to do that properly. Also took the opportunity to update some documentation in deepicedrain/README.md, including the icesat2dhdt catalog entry and wide_to_long function that was missed out before.
Finally, showing (nicely) how sudden active subglacial lakes can fill up (~14m over 4 months at Slessor 23)! Made a plot_crossovers function that can plot both raw and normalized height changes over time, for every crossover location from intersecting ICESat-2 tracks. Refactored a lot of the code to make use of PyGMT v0.2.0 capabilities (e.g. nicer `pygmt.info` and `pygmt.x2sys_cross`), and also some pandas groupby magic. Also improved the quality of the vizplots.py docstring by including a fancy ASCII text representation of the expected input table data and output figure!
@sourcery-ai
Copy link

sourcery-ai bot commented Sep 16, 2020

Sourcery Code Quality Report (beta)

❌  Merging this PR will decrease code quality in the affected files by 0.03 out of 10.

Quality metrics Before After Change
Complexity 0.57 0.80 0.23 🔴
Method Length 108.98 104.94 -4.04 🔵
Quality 8.47 8.44 -0.03 🔴
Other metrics Before After Change
Lines 1233 1327 94
Changed files Quality Before Quality After Quality Change
atl11_play.py 4.91 4.91 0.00
atlxi_dhdt.py 5.96 4.50 -1.46 🔴
deepicedrain/init.py 8.75 8.54 -0.21 🔴
deepicedrain/spatiotemporal.py 8.82 8.30 -0.52 🔴
deepicedrain/tests/test_deepicedrain.py 9.69 9.68 -0.01 🔴
deepicedrain/tests/test_region.py 9.04 8.88 -0.16 🔴
deepicedrain/tests/test_spatiotemporal_conversions.py 8.58 8.58 0.00

Here are some functions in these files that still need a tune-up:

File Function Complexity Length Overall Recommendation
deepicedrain/spatiotemporal.py point_in_polygon_gpu 4 170.90 6.13 Split out functionality

Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Let us know what you think of it by mentioning @sourcery-ai in a comment.

@weiji14 weiji14 marked this pull request as ready for review September 16, 2020 04:36
@weiji14 weiji14 closed this in cc3989d Sep 16, 2020
@weiji14 weiji14 merged commit cc3989d into master Sep 16, 2020
@weiji14 weiji14 deleted the crossover_tracks branch September 16, 2020 04:45
@weiji14 weiji14 added this to the v0.3.0 milestone Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature 🚀 Brand new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant