Find active subglacial lake points using unsupervised clustering #149
Conversation
Adding a new module to deepicedrain for Extract, Transform and Load (ETL) workflows! Putting slices of a 2D array into several columns inside a dataframe is now easier with the array_to_dataframe function. Inspired by dask/dask#5021. The function is generalized so that dask arrays convert to a dask DataFrame, and numpy arrays convert to a pandas DataFrame.
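A minimal sketch of the idea (the signature and column naming here are assumptions, not necessarily the exact deepicedrain implementation):

```python
import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd


def array_to_dataframe(array, colname: str = "z_{i}"):
    """Split the columns of a 2D array into the columns of a DataFrame.

    Dask arrays become lazy dask DataFrames, numpy arrays become pandas DataFrames.
    """
    columns = [colname.format(i=i) for i in range(array.shape[1])]
    if isinstance(array, da.Array):
        return dd.from_dask_array(array, columns=columns)
    return pd.DataFrame(data=array, columns=columns)


print(array_to_dataframe(np.arange(6).reshape(3, 2)))  # pandas DataFrame, columns z_0, z_1
print(array_to_dataframe(da.arange(6).reshape(3, 2)))  # lazy dask DataFrame equivalent
```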
Make bounding box subsetting work on DataFrames too! This includes pandas, dask and cudf DataFrames. Included a parametrized test for pandas and dask; the cudf one should work too since the APIs are similar. The original xarray.DataArray subsetter code will still work.
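A hedged sketch of why the same subsetting logic carries across DataFrame libraries (the `x`/`y` column names are assumptions, not the exact deepicedrain API):

```python
import pandas as pd


def subset_points(df, xmin: float, xmax: float, ymin: float, ymax: float):
    """Keep only rows whose x/y coordinates fall inside the bounding box.

    Works on pandas, dask and cudf DataFrames alike, since boolean masking
    with .loc is part of the shared DataFrame API.
    """
    inside = (df.x >= xmin) & (df.x <= xmax) & (df.y >= ymin) & (df.y <= ymax)
    return df.loc[inside]


points = pd.DataFrame({"x": [1.0, 5.0, 9.0], "y": [2.0, 6.0, 8.0]})
print(subset_points(points, xmin=0, xmax=6, ymin=0, ymax=7))  # keeps the first two rows
```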
A very fast way to find points inside polygons! This is really just a convenience function that wraps around `cuspatial.point_in_polygon`, hiding all sorts of boilerplate. Specifically, it handles:

1. Converting a geopandas GeoDataFrame into a cuspatial-friendly format, see rapidsai/cuspatial#165
2. A hacky workaround for the 31-polygon limit using a for-loop, based on https://github.com/rapidsai/cuspatial/blob/branch-0.15/notebooks/nyc_taxi_years_correlation.ipynb
3. Outputting actual string labels from the GeoDataFrame, instead of non-human-readable index numbers

Also added tests for this in test_spatiotemporal_gpu.py, though they won't run on the CI, only locally where a GPU is available.
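For flavour, here is a heavily simplified sketch of the batching trick under stated assumptions (exterior rings only, no holes; the x/y column names, the `name_col` parameter and the conversion helper are illustrative, not deepicedrain's actual API):

```python
import cudf
import cuspatial
import geopandas as gpd
import pandas as pd


def polygons_to_cuspatial(geometries):
    """Flatten simple (hole-free) polygons into cuspatial's offset/point format."""
    poly_offsets, ring_offsets, xs, ys = [], [], [], []
    for geom in geometries:
        poly_offsets.append(len(ring_offsets))  # index of this polygon's first ring
        ring_offsets.append(len(xs))  # index of this ring's first point
        x, y = geom.exterior.coords.xy
        xs.extend(x)
        ys.extend(y)
    return (cudf.Series(poly_offsets), cudf.Series(ring_offsets),
            cudf.Series(xs), cudf.Series(ys))


def label_points_in_polygons(points: cudf.DataFrame, polygons: gpd.GeoDataFrame,
                             name_col: str = "NAME") -> pd.Series:
    """Return a human readable polygon label for each x/y point (NaN if outside)."""
    frames = []
    # cuspatial.point_in_polygon (v0.15) handles at most 31 polygons per call,
    # hence the hacky for-loop over batches of 31 polygons
    for start in range(0, len(polygons), 31):
        chunk = polygons.iloc[start : start + 31]
        poly_offsets, ring_offsets, poly_x, poly_y = polygons_to_cuspatial(chunk.geometry)
        booleans = cuspatial.point_in_polygon(
            points.x, points.y, poly_offsets, ring_offsets, poly_x, poly_y
        ).to_pandas()
        booleans.columns = chunk[name_col].to_list()  # string labels, not index numbers
        frames.append(booleans)
    table = pd.concat(objs=frames, axis="columns")
    # idxmax picks the polygon containing each point; all-False rows become NaN
    return table.idxmax(axis="columns").where(table.any(axis="columns"))
```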
Building on top of eb61ff6, but for n-dimensional arrays, and writing the dataframe to Parquet too! This function might be a little too convenient (read: contains hardcoding), but it smooths out some of the rough edges in terms of PyData file format interoperability. Should contribute this somewhere upstream when I get the time.
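A rough sketch of the idea under stated assumptions (the function name, colname pattern and reshape step are illustrative; the real function has more hardcoded conveniences):

```python
import dask.array as da
import dask.dataframe as dd


def ndarray_to_parquet_sketch(array: da.Array, parquetpath: str, colname: str = "h"):
    """Flatten an n-dimensional dask array into 2D columns and save to Parquet."""
    flat = array.reshape(array.shape[0], -1)  # keep the 1st axis, merge the rest
    columns = [f"{colname}_{i}" for i in range(flat.shape[1])]
    dataframe = dd.from_dask_array(flat, columns=columns)
    dataframe.to_parquet(parquetpath)  # needs pyarrow or fastparquet installed
    return dataframe


# e.g. a (points, cycles, extras) height array chunked along the first axis only
heights = da.random.random(size=(1000, 2, 3), chunks=(100, 2, 3))
df = ndarray_to_parquet_sketch(array=heights, parquetpath="heights.parquet")
```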
Improve our HvPlot/Panel dashboard with some new bells and whistles! Like a proper GIS desktop tool, the xy_dhdt dashboard plot can now keep the zoom level when changing between variables (thanks to https://discourse.holoviz.org/t/keep-zoom-level-when-changing-between-variables-in-a-scatter-plot)! Supersedes e4874b0. This is a major refresh of my old IceSatExplorer code at https://github.com/weiji14/cryospheric-data-lakes/blob/master/code/scripts/h5_to_np_icesat.ipynb, which uses ICESat-1 instead of ICESat-2. The dashboard also takes a lot of cues from the example at https://examples.pyviz.org/datashader_dashboard/dashboard.html, implemented in holoviz/datashader#676.

Other significant improvements include a categorical colourmap for the 'referencegroundtrack' variable, and being able to see the height and time of an ICESat-2 measurement at a particular cycle when hovering over the points! Oh, and did I mention that the rendering now happens on the GPU?!! Data transformed to and from Parquet is fast! Note that this is a work in progress, and that there are more sweeping improvements to come. I've also split out the crossover analysis code into a separate atlxi_lake.ipynb file since atlxi_dhdt.ipynb was getting too long.
Parquet plugin for intake! Also edited the GitHub Actions workflow to test on Pull Requests targeting any branch.
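A tiny usage sketch, assuming the intake-parquet plugin is installed (the filename is borrowed from later in this PR as an example):

```python
import intake

# intake-parquet registers a "parquet" driver, exposed as intake.open_parquet
source = intake.open_parquet("df_dhdt_whillans_upstream.parquet")
dataframe = source.to_dask()  # lazily open the Parquet file as a dask DataFrame
print(dataframe.head())
```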
Improving the dashboard while making the code more maintainable, by moving the pure hvplot scatterplot stuff into the intake atlas_catalog.yaml file, and placing the dashboard/widgets under vizplots.py. This is yet another attempt at tidying up the code in the jupyter notebook by moving it into the deepicedrain package instead! Also updated the alongtrack plot code to work with the new df_dhdt columnar data structure. Will need to put the df_dhdt_{placename}.parquet data somewhere in the cloud (when I have time) so that the dashboard app can be used by more people, and also to enable unit testing of the visualization generators (always a tricky thing to test)! The dashboard is also currently hardcoded to plot the "whillans_upstream" area; will need to see if the placename can be passed as an argument into the IceSat2Explorer class.
Fix Continuous Integration tests failing due to the IceSat2Explorer class not being able to load df_dhdt_whillans_upstream.parquet. Really need to put the file up somewhere, but until I find a good data repository (ideally with versioning), this hacky workaround will be a necessary evil.
Pinning the RAPIDS AI libraries from the alpha/development versions to the stable release version. Also generating an environment-linux-64.lock file for full reproducibility!

Bumps [cuml](https://github.com/rapidsai/cuml) from 0.15.0a200819 to 0.15.0.
- [Release notes](https://github.com/rapidsai/cuml/releases)
- [Changelog](https://github.com/rapidsai/cuml/blob/branch-0.15/CHANGELOG.md)
- [Commits](rapidsai/cuml@v0.15.0a...v0.15.0)

Bumps [cuspatial](https://github.com/rapidsai/cuspatial) from 0.15.0a200819 to 0.15.0.
- [Release notes](https://github.com/rapidsai/cuspatial/releases)
- [Changelog](https://github.com/rapidsai/cuspatial/blob/branch-0.15/CHANGELOG.md)
- [Commits](rapidsai/cuspatial@v0.15.0a...v0.15.0)
Detect active subglacial lakes in Antarctica using Density-Based Spatial Clustering of Applications with Noise (DBSCAN)! The subglacial lake detector works by finding clusters of high (filling at > 1 m/yr) or low (draining at < -1 m/yr) height change over time (dhdt) values, for each grounded drainage basin in Antarctica. CUDA GPUs are awesome: the point-in-polygon step takes 15 seconds and the lake clustering takes 12 seconds, working on >13 million points! Each cluster of points is then converted to a convex hull polygon, and we store some basic attribute information with the geometry, such as the basin name, maximum absolute dhdt value, and reference ground tracks. The lakes are output to a geojson file in the EPSG:3031 projection.

This is a long overdue commit as the code has been working since mid-August, but I kept wanting to refactor it (still need to!). The DBSCAN clustering parameters (eps=2500 and min_samples=250) work ok for the Siple Coast and Slessor Glacier, but fail for Pine Island Glacier since there's a lot of downwasting there. The algorithm definitely needs more work. The visualizations and crossover analysis code also need to be refreshed (since the schema has changed), but that is sitting locally on my computer, waiting to be tidied up a bit more.
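A heavily condensed sketch of the per-basin clustering step, under stated assumptions (the x/y/dhdt_slope column names, the attribute handling and the function name are illustrative; eps and min_samples match the values mentioned above):

```python
import cudf
import cuml
import geopandas as gpd
from shapely.geometry import MultiPoint


def find_filling_lakes(basin_points: cudf.DataFrame,
                       eps: float = 2500, min_samples: int = 250) -> gpd.GeoDataFrame:
    """DBSCAN-cluster the filling (dhdt > 1 m/yr) points of one drainage basin."""
    filling = basin_points.loc[basin_points.dhdt_slope > 1].reset_index(drop=True)
    dbscan = cuml.DBSCAN(eps=eps, min_samples=min_samples)  # eps in EPSG:3031 metres
    labels = cudf.Series(dbscan.fit_predict(filling[["x", "y"]]))  # -1 means noise

    lakes = []
    for cluster in sorted(labels.unique().to_pandas()):
        if cluster == -1:
            continue  # skip unclustered noise points
        cluster_points = filling.loc[labels == cluster]
        geometry = MultiPoint(cluster_points[["x", "y"]].to_pandas().to_numpy()).convex_hull
        lakes.append({
            "cluster_label": int(cluster),
            "maxabsdhdt": float(cluster_points.dhdt_slope.abs().max()),
            "geometry": geometry,
        })
    return gpd.GeoDataFrame(lakes, geometry="geometry", crs="EPSG:3031")
```

The draining (dhdt < -1 m/yr) side works the same way, and the resulting GeoDataFrame can be written out with geopandas' `.to_file(..., driver="GeoJSON")`.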
Combining the draining/filling active lake cluster labels, which reduces the amount of for-loop nesting in the active subglacial lake finder code and lets us plot both draining and filling lakes in the same figure! Cluster labels are now negative integers for draining lakes, positive integers for filling lakes, and NaN for noise points. The lake cluster plot now uses a red (draining) and blue (filling) 'polar' colormap, with unclassified noise points in black as before. The code still takes 11 seconds to run for the entire Antarctic continent, which is awesome! Also made a minor change to the deepicedrain/__init__.py script to disable loading the IceSat2Explorer dashboard script, otherwise `import deepicedrain` would load stuff into GPU memory!
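A toy pandas sketch of the label-combining convention described above (the inputs are made-up DBSCAN outputs already aligned onto the full point index):

```python
import numpy as np
import pandas as pd

# Made-up DBSCAN outputs: clusters are labelled 0, 1, ... and noise is -1,
# computed separately on the draining (dhdt < -1) and filling (dhdt > +1) subsets
points_index = pd.RangeIndex(6)
draining = pd.Series([0, 1, -1], index=[0, 1, 2])  # labels for the draining points
filling = pd.Series([0, 0, -1], index=[3, 4, 5])   # labels for the filling points

# Draining clusters become negative integers (-1, -2, ...), filling clusters
# positive integers (+1, +2, ...), and DBSCAN's -1 noise label becomes NaN
combined = pd.Series(data=np.nan, index=points_index)
combined.loc[draining.index] = -(draining + 1).where(draining != -1)
combined.loc[filling.index] = (filling + 1).where(filling != -1)
print(combined.to_list())  # [-1.0, -2.0, nan, 1.0, 1.0, nan]
```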
Pick out active subglacial lakes in Antarctica from (pre-processed) ICESat-2 point clouds automatically, using unsupervised clustering techniques. Utilize RAPIDS AI GPU-accelerated libraries to do so fast!
TODO:
References: