From ccd16474aada5cdd3576c0c8128585d05e28d6f2 Mon Sep 17 00:00:00 2001 From: Wei Ji Date: Wed, 6 May 2020 21:10:39 +1200 Subject: [PATCH 1/5] :construction: Initial play with the ATL06 ICESat-2 product The moment you've all been waiting for, modern Exploratory Data Analysis on ICESat-2 ATL06 data with the PyData stack, using intake catalogs to retrieve data and hvplot for plotting! Sure, it's another standard (see https://xkcd.com/927) but that's part of science (I guess). Note that this notebook was developed several months ago, but for various reasons, the commit has only happened now in a post-covid era. The jupyter notebook starts by running through the use of intake to download and manage the ATL06 data catalogued in catalog.yaml. All 6 laser beams are read from the HDF5 files concurrently (read: no for-loops) via xarray/intake into a Dask/Xarray Dataset format, and then tidied into a Dask/Pandas DataFrame. Finally, we plot them points using HvPlot, which produces an interactive figure we can pan around. Also left in some old SciPy scripts to produce a DEM out of the points, an old attempt to use XrViz to visualize the multi-dimensional data, and some example code that uses the OpenAltimetry API. --- atl06_play.ipynb | 2299 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2299 insertions(+) create mode 100644 atl06_play.ipynb diff --git a/atl06_play.ipynb b/atl06_play.ipynb new file mode 100644 index 0000000..e590e70 --- /dev/null +++ b/atl06_play.ipynb @@ -0,0 +1,2299 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# **ATLAS/ICESat-2 Land Ice Height [ATL06](https://nsidc.org/data/atl06/) Exploratory Data Analysis**\n", + "\n", + "[Yet another](https://xkcd.com/927) take on playing with ICESat-2's Land Ice Height ATL06 data,\n", + "specifically with a focus on analyzing ice elevation changes over Antarctica.\n", + "This jupyter notebook will cover:\n", + "\n", + "- Downloading datasets from the web via [intake](https://intake.readthedocs.io)\n", + "- Performing [Exploratory Data Analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis)\n", + " using the [PyData](https://pydata.org) stack (e.g. [xarray](http://xarray.pydata.org), [dask](https://dask.org))\n", + "- Plotting figures using [Hvplot](https://hvplot.holoviz.org) and [PyGMT](https://www.pygmt.org) (TODO)\n", + "\n", + "This is in contrast with the [icepyx](https://github.com/icesat2py/icepyx) package\n", + "and 'official' 2019/2020 [ICESat-2 Hackweek tutorials](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials) (which are also awesome!)\n", + "that tend to use a slightly different approach (e.g. handcoded download scripts, [h5py](http://www.h5py.org) for data reading, etc).\n", + "The core concept here is to run things in a more intuitive and scalable (parallelizable) manner on a continent scale (rather than just a specific region)."
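To make the workflow in the commit message above concrete, here is a minimal sketch (not the notebook's exact code), assuming the `icesat2atl06` entry in catalog.yaml and a valid Earthdata login in `~/.netrc`; the variable subset and the plotting call are illustrative only:

```python
import intake
import hvplot.pandas  # noqa: F401  registers the .hvplot accessor on pandas objects

catalog = intake.open_catalog(uri="catalog.yaml")  # local intake catalog with the icesat2atl06 source
dataset = catalog.icesat2atl06.to_dask()           # lazy, dask-backed xarray.Dataset
df = dataset.h_li.to_dataframe().dropna().reset_index()  # tidy, long-form table of land ice heights
df.hvplot.points(x="longitude", y="latitude", datashade=True)  # interactive, pannable figure
```

The actual notebook does this per laser beam and reference ground track; the sketch only shows the overall shape of the intake to xarray/dask to pandas to hvplot pipeline.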
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import glob\n", + "import json\n", + "import logging\n", + "import netrc\n", + "import os\n", + "\n", + "import dask\n", + "import dask.distributed\n", + "import hvplot.dask\n", + "import hvplot.pandas\n", + "import hvplot.xarray\n", + "import intake\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "import requests\n", + "import tqdm\n", + "import xarray as xr\n", + "\n", + "# %matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "

Client

\n", + "\n", + "
\n", + "

Cluster

\n", + "
    \n", + "
  • Workers: 7
  • \n", + "
  • Cores: 7
  • \n", + "
  • Memory: 201.22 GB
  • \n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Configure intake and set number of compute cores for data download\n", + "intake.config.conf[\"cache_dir\"] = \"catdir\" # saves data to current folder\n", + "intake.config.conf[\"download_progress\"] = False # disable automatic tqdm progress bars\n", + "\n", + "logging.basicConfig(level=logging.WARNING)\n", + "\n", + "# Limit compute to 8 cores for download part using intake\n", + "# Can possibly go up to 10 because there are 10 DPs?\n", + "# See https://n5eil02u.ecs.nsidc.org/opendap/hyrax/catalog.xml\n", + "client = dask.distributed.Client(n_workers=7, threads_per_worker=1)\n", + "client" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Quick view\n", + "\n", + "Use our [intake catalog](https://intake.readthedocs.io/en/latest/catalog.html) to get some sample ATL06 data\n", + "(while making sure we have our Earthdata credentials set up properly),\n", + "and view it using [xarray](https://xarray.pydata.org) and [hvplot](https://hvplot.pyviz.org)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "catalog = intake.open_catalog(uri=\"catalog.yaml\") # open the local catalog file containing ICESAT2 stuff" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "Show/Hide data repr\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "Show/Hide attributes\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
xarray.Dataset
    • delta_time: 907915
    • referencegroundtrack: 10
    • orbitalsegment
      ()
      <U2
      '11'
      array('11', dtype='<U2')
    • revision
      ()
      <U2
      '01'
      array('01', dtype='<U2')
    • version
      ()
      <U3
      '002'
      array('002', dtype='<U3')
    • cyclenumber
      ()
      <U2
      '03'
      array('03', dtype='<U2')
    • longitude
      (delta_time)
      float64
      dask.array<chunksize=(50000,), meta=np.ndarray>
      contentType :
      physicalMeasurement
      description :
      Longitude of segment center, , WGS84, East=+
      long_name :
      Longitude
      source :
      section 3.10
      standard_name :
      longitude
      units :
      degrees_east
      valid_max :
      180.0
      valid_min :
      -180.0
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 7.26 MB 622.86 kB
      Shape (907915,) (77857,)
      Count 701 Tasks 15 Chunks
      Type float64 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 1\n", + "\n", + "
    • latitude
      (delta_time)
      float64
      dask.array<chunksize=(50000,), meta=np.ndarray>
      contentType :
      physicalMeasurement
      description :
      Latitude of segment center, WGS84, North=+,
      long_name :
      Latitude
      source :
      section 3.10
      standard_name :
      latitude
      units :
      degrees_north
      valid_max :
      90.0
      valid_min :
      -90.0
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 7.26 MB 622.86 kB
      Shape (907915,) (77857,)
      Count 701 Tasks 15 Chunks
      Type float64 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 1\n", + "\n", + "
    • delta_time
      (delta_time)
      datetime64[ns]
      2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984
      contentType :
      referenceInformation
      description :
      Number of GPS seconds since the ATLAS SDP epoch. The ATLAS Standard Data Products (SDP) epoch offset is defined within /ancillary_data/atlas_sdp_gps_epoch as the number of GPS seconds between the GPS epoch (1980-01-06T00:00:00.000000Z UTC) and the ATLAS SDP epoch. By adding the offset contained within atlas_sdp_gps_epoch to delta time parameters, the time in gps_seconds relative to the GPS epoch can be computed.
      long_name :
      Elapsed GPS seconds
      source :
      section 4.4
      standard_name :
      time
      array(['2019-06-26T00:35:41.688893288', '2019-06-26T00:35:41.936968728',\n",
      +       "       '2019-06-26T00:35:44.974930184', ..., '2019-06-26T14:49:54.854559880',\n",
      +       "       '2019-06-26T14:49:54.975701040', '2019-06-26T14:49:55.015304984'],\n",
      +       "      dtype='datetime64[ns]')
    • datetime
      (referencegroundtrack)
      datetime64[ns]
      2019-06-26T00:35:36 ... 2019-06-26T14:44:13
      array(['2019-06-26T00:35:36.000000000', '2019-06-26T02:09:54.000000000',\n",
      +       "       '2019-06-26T03:44:11.000000000', '2019-06-26T05:18:28.000000000',\n",
      +       "       '2019-06-26T06:52:45.000000000', '2019-06-26T08:27:03.000000000',\n",
      +       "       '2019-06-26T10:01:20.000000000', '2019-06-26T11:35:38.000000000',\n",
      +       "       '2019-06-26T13:09:55.000000000', '2019-06-26T14:44:13.000000000'],\n",
      +       "      dtype='datetime64[ns]')
    • referencegroundtrack
      (referencegroundtrack)
      object
      '1355' '1356' ... '1363' '1364'
      array(['1355', '1356', '1357', '1358', '1359', '1360', '1361', '1362', '1363',\n",
      +       "       '1364'], dtype=object)
    • atl06_quality_summary
      (referencegroundtrack, delta_time)
      float64
      dask.array<chunksize=(1, 50000), meta=np.ndarray>
      contentType :
      qualityInformation
      description :
      The ATL06_quality_summary parameter indicates the best-quality subset of all ATL06 data. A zero in this parameter implies that no data-quality tests have found a problem with the segment, a one implies that some potential problem has been found. Users who select only segments with zero values for this flag can be relatively certain of obtaining high-quality data, but will likely miss a significant fraction of usable data, particularly in cloudy, rough, or low-surface-reflectance conditions.
      flag_meanings :
      best_quality potential_problem
      flag_values :
      [0 1]
      long_name :
      ATL06_Quality_Summary
      source :
      section 4.3
      units :
      1
      valid_max :
      1
      valid_min :
      0
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 72.63 MB 622.86 kB
      Shape (10, 907915) (1, 77857)
      Count 656 Tasks 150 Chunks
      Type float64 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 10\n", + "\n", + "
    • h_li
      (referencegroundtrack, delta_time)
      float32
      dask.array<chunksize=(1, 50000), meta=np.ndarray>
      contentType :
      physicalMeasurement
      description :
      Standard land-ice segment height determined by land ice algorithm, corrected for first-photon bias, representing the median- based height of the selected PEs
      long_name :
      Land Ice height
      source :
      section 4.4
      units :
      meters
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 36.32 MB 311.43 kB
      Shape (10, 907915) (1, 77857)
      Count 632 Tasks 150 Chunks
      Type float32 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 10\n", + "\n", + "
    • h_li_sigma
      (referencegroundtrack, delta_time)
      float32
      dask.array<chunksize=(1, 50000), meta=np.ndarray>
      contentType :
      qualityInformation
      description :
      Propagated error due to sampling error and FPB correction from the land ice algorithm
      long_name :
      Expected RMS segment misfit
      source :
      section 4.4
      units :
      meters
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 36.32 MB 311.43 kB
      Shape (10, 907915) (1, 77857)
      Count 632 Tasks 150 Chunks
      Type float32 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 10\n", + "\n", + "
    • segment_id
      (referencegroundtrack, delta_time)
      float64
      dask.array<chunksize=(1, 50000), meta=np.ndarray>
      contentType :
      referenceInformation
      description :
      Segment number, counting from the equator. Equal to the segment_id for the second of the two 20m ATL03 segments included in the 40m ATL06 segment
      long_name :
      Reference Point, m
      source :
      section 3.1.2.1
      units :
      1
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 72.63 MB 622.86 kB
      Shape (10, 907915) (1, 77857)
      Count 632 Tasks 150 Chunks
      Type float64 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 10\n", + "\n", + "
    • sigma_geo_h
      (referencegroundtrack, delta_time)
      float32
      dask.array<chunksize=(1, 50000), meta=np.ndarray>
      contentType :
      qualityInformation
      description :
      Total vertical geolocation error due to PPD and POD, including the effects of horizontal geolocation error on the segment vertical error.
      long_name :
      Vertical Geolocation Error
      source :
      section 3.10
      units :
      meters
      \n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "\n",
      +       "
      \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      Array Chunk
      Bytes 36.32 MB 311.43 kB
      Shape (10, 907915) (1, 77857)
      Count 632 Tasks 150 Chunks
      Type float32 numpy.ndarray
      \n", + "
      \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + " 907915\n", + " 10\n", + "\n", + "
  • Description :
    The land_ice_height group contains the primary set of derived ATL06 products. This includes geolocation, height, and standard error and quality measures for each segment. This group is sparse, meaning that parameters are provided only for pairs of segments for which at least one beam has a valid surface-height measurement.
    data_rate :
    Data within this group are sparse. Data values are provided only for those ICESat-2 20m segments where at least one beam has a valid land ice height measurement.
" + ], + "text/plain": [ + "\n", + "Dimensions: (delta_time: 907915, referencegroundtrack: 10)\n", + "Coordinates:\n", + " orbitalsegment \n", + " latitude (delta_time) float64 dask.array\n", + " * delta_time (delta_time) datetime64[ns] 2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984\n", + " datetime (referencegroundtrack) datetime64[ns] 2019-06-26T00:35:36 ... 2019-06-26T14:44:13\n", + " * referencegroundtrack (referencegroundtrack) object '1355' ... '1364'\n", + "Data variables:\n", + " atl06_quality_summary (referencegroundtrack, delta_time) float64 dask.array\n", + " h_li (referencegroundtrack, delta_time) float32 dask.array\n", + " h_li_sigma (referencegroundtrack, delta_time) float32 dask.array\n", + " segment_id (referencegroundtrack, delta_time) float64 dask.array\n", + " sigma_geo_h (referencegroundtrack, delta_time) float32 dask.array\n", + "Attributes:\n", + " Description: The land_ice_height group contains the primary set of deriv...\n", + " data_rate: Data within this group are sparse. Data values are provide..." + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "try:\n", + " netrc.netrc()\n", + "except FileNotFoundError as error_msg:\n", + " print(f\"{error_msg}, please follow instructions to create one at \"\n", + " \"https://nsidc.org/support/faq/what-options-are-available-bulk-downloading-data-https-earthdata-login-enabled \"\n", + " 'basically using `echo \"machine urs.earthdata.nasa.gov login password \" >> ~/.netrc`')\n", + " raise\n", + "\n", + "dataset = catalog.icesat2atl06.to_dask().unify_chunks() # depends on .netrc file in home folder\n", + "dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#dataset.hvplot.points(\n", + "# x=\"longitude\", y=\"latitude\", datashade=True, width=800, height=500, hover=True,\n", + "# #geo=True, coastline=True, crs=cartopy.crs.PlateCarree(), #projection=cartopy.crs.Stereographic(central_latitude=-71),\n", + "#)\n", + "catalog.icesat2atl06.hvplot.quickview()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data intake\n", + "\n", + "Pulling in all of the raw ATL06 data (HDF5 format) from the NSIDC servers via an intake catalog file.\n", + "Note that this will involve 100s if not 1000s of GBs of data, so make sure there's enough storage!!" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# Download all ICESAT2 ATLAS hdf files from start to end date\n", + "dates1 = pd.date_range(start=\"2018.10.14\", end=\"2019.06.26\") # 1st batch\n", + "dates2 = pd.date_range(start=\"2019.07.26\", end=\"2019.11.15\") # 2nd batch\n", + "dates = dates1.append(dates2)[:2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 0%| | 0/2 [00:00 0:\n", + " da = dask.delayed(six_laser_beams)(\n", + " crossing_dates=crossing_dates_dict[referencegroundtrack]\n", + " )\n", + " # da = six_laser_beams(crossing_dates=crossing_dates_dict[referencegroundtrack])\n", + " dataset_dict[referencegroundtrack] = da" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "df = dataset_dict[\"0349\"].compute() # loads into a dask dataframe (lazy)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Dask DataFrame Structure:\n", + " delta_time laser latitude longitude atl06_quality_summary h_li h_li_sigma segment_id sigma_geo_h\n", + "npartitions=115 \n", + "0 datetime64[ns] object float64 float64 float64 float32 float32 float64 float32\n", + "569904 ... ... ... ... ... ... ... ... ...\n", + "... ... ... ... ... ... ... ... ... ...\n", + "14522322 ... ... ... ... ... ... ... ... ...\n", + "15181415 ... ... ... ... ... ... ... ... ...\n", + "Dask Name: concat-indexed, 16082 tasks" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dataset_dict = dask.compute(dataset_dict)[0] # compute every referencegroundtrack, slow... though somewhat parallelized" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "bdf = dask.dataframe.concat(dfs=list(dataset_dict.values()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "da.sel(crossingdates=\"2018.10.21\").h_li.unify_chunks().drop(labels=[\"longitude\", \"datetime\", \"cyclenumber\"]).hvplot(\n", + " kind=\"scatter\", x=\"latitude\", by=\"crossingdates\", datashade=True, dynspread=True,\n", + " width=800, height=500, dynamic=True, flip_xaxis=True, hover=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "# https://xarray.pydata.org/en/stable/combining.html#concatenate\n", + "# For all 6 lasers one one date ~~along one reference ground track~~,\n", + "# concatenate all points ~~from one dates~~ into one xr.Dataset\n", + "lasers = [\"gt1l\", \"gt1r\", \"gt2l\", \"gt2r\", \"gt3l\", \"gt3r\"]\n", + "da = xr.concat(\n", + " objs=(\n", + " catalog.icesat2atl06(laser=laser)\n", + " .to_dask()\n", + " #.sel(referencegroundtrack=referencegroundtrack)\n", + " for laser in lasers\n", + " ),\n", + " dim=pd.Index(data=lasers, name=\"laser\")\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Plot them points!" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# convert dask.dataframe to pd.DataFrame\n", + "df = df.compute()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "df = df.dropna(subset=[\"h_li\"]).query(expr=\"atl06_quality_summary == 0\").reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
indexdelta_timelaserlatitudelongitudeatl06_quality_summaryh_lih_li_sigmasegment_idsigma_geo_h
6402018-10-21 12:20:38.549732352gt3l-78.991839-147.1855830.0572.6864010.0156051443620.00.301177
8522018-10-21 12:20:38.552551576gt3l-78.992014-147.1857650.0572.7093510.0191221443621.00.300144
10642018-10-21 12:20:38.555369756gt3l-78.992189-147.1859470.0572.7644650.0229261443622.00.300081
12762018-10-21 12:20:38.558187524gt3l-78.992364-147.1861300.0572.8220830.0141141443623.00.303498
14882018-10-21 12:20:38.561005216gt3l-78.992538-147.1863130.0572.8342290.0188181443624.00.300117
.................................
1801924112593192019-10-19 18:59:56.522928840gt1r-79.163368-146.9523200.0544.2109990.0111271444515.00.312816
1801930112593552019-10-19 18:59:56.525754424gt1r-79.163543-146.9525070.0544.1467290.0114161444516.00.338958
1801936112593912019-10-19 18:59:56.528577288gt1r-79.163717-146.9526940.0544.0767820.0100851444517.00.322600
1801942112594272019-10-19 18:59:56.531397512gt1r-79.163892-146.9528810.0543.9666750.0097021444518.00.322036
1801948112594632019-10-19 18:59:56.534215528gt1r-79.164067-146.9530680.0543.8785400.0102631444519.00.314843
\n", + "

21482 rows × 10 columns

\n", + "
" + ], + "text/plain": [ + " index delta_time laser latitude longitude \\\n", + "6 40 2018-10-21 12:20:38.549732352 gt3l -78.991839 -147.185583 \n", + "8 52 2018-10-21 12:20:38.552551576 gt3l -78.992014 -147.185765 \n", + "10 64 2018-10-21 12:20:38.555369756 gt3l -78.992189 -147.185947 \n", + "12 76 2018-10-21 12:20:38.558187524 gt3l -78.992364 -147.186130 \n", + "14 88 2018-10-21 12:20:38.561005216 gt3l -78.992538 -147.186313 \n", + "... ... ... ... ... ... \n", + "1801924 11259319 2019-10-19 18:59:56.522928840 gt1r -79.163368 -146.952320 \n", + "1801930 11259355 2019-10-19 18:59:56.525754424 gt1r -79.163543 -146.952507 \n", + "1801936 11259391 2019-10-19 18:59:56.528577288 gt1r -79.163717 -146.952694 \n", + "1801942 11259427 2019-10-19 18:59:56.531397512 gt1r -79.163892 -146.952881 \n", + "1801948 11259463 2019-10-19 18:59:56.534215528 gt1r -79.164067 -146.953068 \n", + "\n", + " atl06_quality_summary h_li h_li_sigma segment_id \\\n", + "6 0.0 572.686401 0.015605 1443620.0 \n", + "8 0.0 572.709351 0.019122 1443621.0 \n", + "10 0.0 572.764465 0.022926 1443622.0 \n", + "12 0.0 572.822083 0.014114 1443623.0 \n", + "14 0.0 572.834229 0.018818 1443624.0 \n", + "... ... ... ... ... \n", + "1801924 0.0 544.210999 0.011127 1444515.0 \n", + "1801930 0.0 544.146729 0.011416 1444516.0 \n", + "1801936 0.0 544.076782 0.010085 1444517.0 \n", + "1801942 0.0 543.966675 0.009702 1444518.0 \n", + "1801948 0.0 543.878540 0.010263 1444519.0 \n", + "\n", + " sigma_geo_h \n", + "6 0.301177 \n", + "8 0.300144 \n", + "10 0.300081 \n", + "12 0.303498 \n", + "14 0.300117 \n", + "... ... \n", + "1801924 0.312816 \n", + "1801930 0.338958 \n", + "1801936 0.322600 \n", + "1801942 0.322036 \n", + "1801948 0.314843 \n", + "\n", + "[21482 rows x 10 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfs = df.query(expr=\"0 <= segment_id - 1443620 < 900\")\n", + "dfs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dfs.hvplot.scatter(\n", + " x=\"longitude\", y=\"latitude\", by=\"laser\", hover_cols=[\"delta_time\", \"segment_id\"],\n", + " #datashade=True, dynspread=True,\n", + " #width=800, height=500, colorbar=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "import pyproj" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "transformer = pyproj.Transformer.from_crs(crs_from=pyproj.CRS.from_epsg(4326), crs_to=pyproj.CRS.from_epsg(3031), always_xy=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dfs[\"x\"], dfs[\"y\"] = transformer.transform(xx=dfs.longitude.values, yy=dfs.latitude.values)\n" + ] + } + ], + "source": [ + "dfs[\"x\"], dfs[\"y\"] = transformer.transform(xx=dfs.longitude.values, yy=dfs.latitude.values)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
indexdelta_timelaserlatitudelongitudeatl06_quality_summaryh_lih_li_sigmasegment_idsigma_geo_hxy
6402018-10-21 12:20:38.549732352gt3l-78.991839-147.1855830.0572.6864010.0156051443620.00.301177-650091.025258-1.008187e+06
8522018-10-21 12:20:38.552551576gt3l-78.992014-147.1857650.0572.7093510.0191221443621.00.300144-650077.439154-1.008173e+06
10642018-10-21 12:20:38.555369756gt3l-78.992189-147.1859470.0572.7644650.0229261443622.00.300081-650063.849228-1.008159e+06
12762018-10-21 12:20:38.558187524gt3l-78.992364-147.1861300.0572.8220830.0141141443623.00.303498-650050.252182-1.008144e+06
14882018-10-21 12:20:38.561005216gt3l-78.992538-147.1863130.0572.8342290.0188181443624.00.300117-650036.644541-1.008130e+06
.......................................
1801924112593192019-10-19 18:59:56.522928840gt1r-79.163368-146.9523200.0544.2109990.0111271444515.00.312816-643937.547615-9.897727e+05
1801930112593552019-10-19 18:59:56.525754424gt1r-79.163543-146.9525070.0544.1467290.0114161444516.00.338958-643923.872690-9.897587e+05
1801936112593912019-10-19 18:59:56.528577288gt1r-79.163717-146.9526940.0544.0767820.0100851444517.00.322600-643910.196765-9.897448e+05
1801942112594272019-10-19 18:59:56.531397512gt1r-79.163892-146.9528810.0543.9666750.0097021444518.00.322036-643896.524591-9.897308e+05
1801948112594632019-10-19 18:59:56.534215528gt1r-79.164067-146.9530680.0543.8785400.0102631444519.00.314843-643882.859621-9.897169e+05
\n", + "

21482 rows × 12 columns

\n", + "
" + ], + "text/plain": [ + " index delta_time laser latitude longitude \\\n", + "6 40 2018-10-21 12:20:38.549732352 gt3l -78.991839 -147.185583 \n", + "8 52 2018-10-21 12:20:38.552551576 gt3l -78.992014 -147.185765 \n", + "10 64 2018-10-21 12:20:38.555369756 gt3l -78.992189 -147.185947 \n", + "12 76 2018-10-21 12:20:38.558187524 gt3l -78.992364 -147.186130 \n", + "14 88 2018-10-21 12:20:38.561005216 gt3l -78.992538 -147.186313 \n", + "... ... ... ... ... ... \n", + "1801924 11259319 2019-10-19 18:59:56.522928840 gt1r -79.163368 -146.952320 \n", + "1801930 11259355 2019-10-19 18:59:56.525754424 gt1r -79.163543 -146.952507 \n", + "1801936 11259391 2019-10-19 18:59:56.528577288 gt1r -79.163717 -146.952694 \n", + "1801942 11259427 2019-10-19 18:59:56.531397512 gt1r -79.163892 -146.952881 \n", + "1801948 11259463 2019-10-19 18:59:56.534215528 gt1r -79.164067 -146.953068 \n", + "\n", + " atl06_quality_summary h_li h_li_sigma segment_id \\\n", + "6 0.0 572.686401 0.015605 1443620.0 \n", + "8 0.0 572.709351 0.019122 1443621.0 \n", + "10 0.0 572.764465 0.022926 1443622.0 \n", + "12 0.0 572.822083 0.014114 1443623.0 \n", + "14 0.0 572.834229 0.018818 1443624.0 \n", + "... ... ... ... ... \n", + "1801924 0.0 544.210999 0.011127 1444515.0 \n", + "1801930 0.0 544.146729 0.011416 1444516.0 \n", + "1801936 0.0 544.076782 0.010085 1444517.0 \n", + "1801942 0.0 543.966675 0.009702 1444518.0 \n", + "1801948 0.0 543.878540 0.010263 1444519.0 \n", + "\n", + " sigma_geo_h x y \n", + "6 0.301177 -650091.025258 -1.008187e+06 \n", + "8 0.300144 -650077.439154 -1.008173e+06 \n", + "10 0.300081 -650063.849228 -1.008159e+06 \n", + "12 0.303498 -650050.252182 -1.008144e+06 \n", + "14 0.300117 -650036.644541 -1.008130e+06 \n", + "... ... ... ... \n", + "1801924 0.312816 -643937.547615 -9.897727e+05 \n", + "1801930 0.338958 -643923.872690 -9.897587e+05 \n", + "1801936 0.322600 -643910.196765 -9.897448e+05 \n", + "1801942 0.322036 -643896.524591 -9.897308e+05 \n", + "1801948 0.314843 -643882.859621 -9.897169e+05 \n", + "\n", + "[21482 rows x 12 columns]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dfs.hvplot.scatter(\n", + " x=\"x\", y=\"y\", by=\"laser\", hover_cols=[\"delta_time\", \"segment_id\", \"h_li\"],\n", + " #datashade=True, dynspread=True,\n", + " #width=800, height=500, colorbar=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dfs.hvplot.scatter(x=\"x\", y=\"h_li\", by=\"laser\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dfs.to_pickle(path=\"icesat2_sample.pkl\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Old making a DEM grid surface from points" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import scipy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# https://github.com/ICESAT-2HackWeek/gridding/blob/master/notebook/utils.py#L23\n", + "def make_grid(xmin, xmax, ymin, ymax, dx, dy):\n", + " \"\"\"Construct output grid-coordinates.\"\"\"\n", + " \n", + " # Setup grid dimensions\n", + " Nn = 
int((np.abs(ymax - ymin)) / dy) + 1\n", + " Ne = int((np.abs(xmax - xmin)) / dx) + 1\n", + " \n", + " # Initiate x/y vectors for grid\n", + " x_i = np.linspace(xmin, xmax, num=Ne)\n", + " y_i = np.linspace(ymin, ymax, num=Nn)\n", + " \n", + " return np.meshgrid(x_i, y_i)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "xi, yi = make_grid(xmin=dfs.x.min(), xmax=dfs.x.max(), ymin=dfs.y.max(), ymax=dfs.y.min(), dx=10, dy=10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ar = scipy.interpolate.griddata(points=(dfs.x, dfs.y), values=dfs.h_li, xi=(xi, yi))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(ar, extent=(dfs.x.min(), dfs.x.max(), dfs.y.min(), dfs.y.max()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "import plotly.express as px" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "px.scatter_3d(data_frame=dfs, x=\"longitude\", y=\"latitude\", z=\"h_li\", color=\"laser\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Play using XrViz\n", + "\n", + "Install the PyViz JupyterLab extension first using the [extension manager](https://jupyterlab.readthedocs.io/en/stable/user/extensions.html#using-the-extension-manager) or via the command below:\n", + "\n", + "```bash\n", + "jupyter labextension install @pyviz/jupyterlab_pyviz@v0.8.0 --no-build\n", + "jupyter labextension list # check to see that extension is installed\n", + "jupyter lab build --debug # build extension ??? with debug messages printed\n", + "```\n", + "\n", + "Note: Had to add `network-timeout 600000` to `.yarnrc` file to resolve university network issues." 
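Going back to the DEM-gridding cells above, here is a hedged sketch of the interpolation step that reuses `make_grid` and the `dfs` table; the explicit `from scipy.interpolate import griddata` import and the `method="linear"` argument are additions here, since the submodule may not be available after a bare `import scipy`:

```python
import matplotlib.pyplot as plt
from scipy.interpolate import griddata  # explicit submodule import

# 10 m grid over the sample area; y limits are swapped as in the cell above,
# presumably so the array rows run top to bottom for plt.imshow
xi, yi = make_grid(
    xmin=dfs.x.min(), xmax=dfs.x.max(), ymin=dfs.y.max(), ymax=dfs.y.min(), dx=10, dy=10
)
ar = griddata(points=(dfs.x, dfs.y), values=dfs.h_li, xi=(xi, yi), method="linear")
plt.imshow(ar, extent=(dfs.x.min(), dfs.x.max(), dfs.y.min(), dfs.y.max()))
```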
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import xrviz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "xrviz.example()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# https://xrviz.readthedocs.io/en/latest/set_initial_parameters.html\n", + "initial_params={\n", + " # Select variable to plot\n", + " \"Variables\": \"h_li\",\n", + " # Set coordinates\n", + " \"Set Coords\": [\"longitude\", \"latitude\"],\n", + " # Axes\n", + " \"x\": \"longitude\",\n", + " \"y\": \"latitude\",\n", + " #\"sigma\": \"animate\",\n", + " # Projection\n", + " #\"is_geo\": True,\n", + " #\"basemap\": True,\n", + " #\"crs\": \"PlateCarree\"\n", + "}\n", + "dashboard = xrviz.dashboard.Dashboard(data=dataset) #, initial_params=initial_params)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dashboard.panel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dashboard.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## OpenAltimetry" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\"minx=-154.56678505984297&miny=-88.82881451427136&maxx=-125.17872921546498&maxy=-81.34051361301398&date=2019-05-02&trackId=516\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Paste the OpenAltimetry selection parameters here\n", + "OA_REFERENCE_URL = 'minx=-177.64275595145213&miny=-88.12014866942751&maxx=-128.25920892322736&maxy=-85.52394234080862&date=2019-05-02&trackId=515'\n", + "# We populate a list with the photon data using the OpenAltimetry API, no HDF! \n", + "OA_URL = 'https://openaltimetry.org/data/icesat2/getPhotonData?client=jupyter&' + OA_REFERENCE_URL\n", + "OA_PHOTONS = ['Noise', 'Low', 'Medium', 'High']\n", + "# OA_PLOTTED_BEAMS = [1,2,3,4,5,6] you can select up to 6 beams for each ground track.\n", + "# Some beams may not be usable due cloud covering or QC issues.\n", + "OA_BEAMS = [3,4]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "minx, miny, maxx, maxy = [-156, -88, -127, -84]\n", + "date = \"2019-05-02\" # UTC date?\n", + "track = 515 # \n", + "beam = 1 # 1 to 6\n", + "params = {\"client\": \"jupyter\", \"minx\": minx, \"miny\": miny, \"maxx\": maxx, \"maxy\": maxy, \"date\": date, \"trackId\": str(track), \"beam\": str(beam)}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r = requests.get(url=\"https://openaltimetry.org/data/icesat2/getPhotonData\", params=params)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# OpenAltimetry Data cleansing\n", + "df = pd.io.json.json_normalize(data=r.json()[\"series\"], meta=\"name\", record_path=\"data\")\n", + "df.name = df.name.str.split().str.get(0) # Get e.g. 
just \"Low\" instead of \"Low [12345]\"\n", + "df.query(expr=\"name in ('Low', 'Medium', 'High')\", inplace=True) # filter out Noise and Buffer points\n", + "\n", + "df.rename(columns={0: \"latitude\", 1: \"elevation\", 2: \"longitude\"}, inplace=True)\n", + "df = df.reindex(columns=[\"longitude\", \"latitude\", \"elevation\", \"name\"]) # reorder columns\n", + "df.reset_index(inplace=True)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.hvplot.scatter(x=\"latitude\", y=\"elevation\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "deepicedrain", + "language": "python", + "name": "deepicedrain" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.2" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 3f8c4654b7ef85069b4c7a888f2e4e349b4a596c Mon Sep 17 00:00:00 2001 From: Wei Ji Date: Thu, 7 May 2020 22:03:44 +1200 Subject: [PATCH 2/5] :sparkles: Generic ICESat2 ATLAS Downloader for Antarctica About a quarter of the way through downloading ~2TB? of ICESat2 ATL06 version 3 data all over Antarctica, but let's introduce icesat2atlasdownloader first shall we? This baby allows us to download any ICESat-2/ATLAS product, for any given date, hardcoded to Orbital Segments 10, 11, 12 (i.e. Antarctica), and oh yeah, it does so by 'caching' the remote data locally using intake/fsspec. Tie that up with a highly parallelized dask task scheduler, complete with tqdm progress bars, and I'll just need to sit back and wait until everything is downloaded next morning. Again, this code was worked on pre-covid19, but there were issues with the intake cache mechanism back then. You won't know it, but changing from using intake-specific cache (that is deprecated, messy, and puts a unconfigurable 'hash' in the filepath, though with nice dask parallelization abilities) to fsspec-specific 'simplecache' (more configurable, no hash in filepath, though it requires writing own parallelization code) is a delight! It enables us to download a list of orbital segments (10, 11, 12) instead of just 11 before. Download is parallelized using dask futures, with progress tracked using tqdm (or in the dask dashboard). The main difference between icesat2atlasdownloader and icesat2atl06 is that the former doesn't read into the laser group but the latter does (and is prone to pandas IndexErrors from duplicated index dates). With version 3 of ATL06, the max date has gone from 2019.11.15 to 2020.03.06, or about 1 cycle more. Changes documented at https://nsidc.org/data/atl06/versions/3. They seem to have removed some noisy points it seems, will need to do some Exploratory Data Analysis after downloads are done. --- atl06_play.ipynb | 104 +++++++++++++++++++++++++++++------------------ catalog.yaml | 61 +++++++++++++++++++++++---- 2 files changed, 117 insertions(+), 48 deletions(-) diff --git a/atl06_play.ipynb b/atl06_play.ipynb index e590e70..c2b812c 100644 --- a/atl06_play.ipynb +++ b/atl06_play.ipynb @@ -62,15 +62,15 @@ "\n", "

Client

\n", "\n", "\n", "\n", "

Cluster

\n", "
    \n", - "
  • Workers: 7
  • \n", - "
  • Cores: 7
  • \n", + "
  • Workers: 10
  • \n", + "
  • Cores: 10
  • \n", "
  • Memory: 201.22 GB
  • \n", "
\n", "\n", @@ -78,7 +78,7 @@ "" ], "text/plain": [ - "" + "" ] }, "execution_count": 2, @@ -96,7 +96,7 @@ "# Limit compute to 8 cores for download part using intake\n", "# Can possibly go up to 10 because there are 10 DPs?\n", "# See https://n5eil02u.ecs.nsidc.org/opendap/hyrax/catalog.xml\n", - "client = dask.distributed.Client(n_workers=7, threads_per_worker=1)\n", + "client = dask.distributed.Client(n_workers=10, threads_per_worker=1)\n", "client" ] }, @@ -462,7 +462,7 @@ " stroke: currentColor;\n", " fill: currentColor;\n", "}\n", - "
xarray.Dataset
    • delta_time: 907915
    • referencegroundtrack: 10
    • orbitalsegment
      ()
      <U2
      '11'
      array('11', dtype='<U2')
    • revision
      ()
      <U2
      '01'
      array('01', dtype='<U2')
    • version
      ()
      <U3
      '002'
      array('002', dtype='<U3')
    • cyclenumber
      ()
      <U2
      '03'
      array('03', dtype='<U2')
    • longitude
      (delta_time)
      float64
      dask.array<chunksize=(50000,), meta=np.ndarray>
      contentType :
      physicalMeasurement
      description :
      Longitude of segment center, , WGS84, East=+
      long_name :
      Longitude
      source :
      section 3.10
      standard_name :
      longitude
      units :
      degrees_east
      valid_max :
      180.0
      valid_min :
      -180.0
      \n",
      +       "
      xarray.Dataset
        • delta_time: 907899
        • referencegroundtrack: 10
        • cyclenumber
          ()
          <U2
          '03'
          array('03', dtype='<U2')
        • longitude
          (delta_time)
          float64
          dask.array<chunksize=(50000,), meta=np.ndarray>
          contentType :
          physicalMeasurement
          description :
          Longitude of segment center, , WGS84, East=+
          long_name :
          Longitude
          source :
          section 3.10
          standard_name :
          longitude
          units :
          degrees_east
          valid_max :
          180.0
          valid_min :
          -180.0
      \n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
      \n", "\n", @@ -471,7 +471,7 @@ " \n", " \n", " \n", - " \n", + " \n", " \n", " \n", " \n", @@ -506,12 +506,12 @@ " \n", "\n", " \n", - " 907915\n", + " 907899\n", " 1\n", "\n", "\n", "\n", - "
      Bytes 7.26 MB 622.86 kB
      Shape (907915,) (77857,)
      Shape (907899,) (77857,)
      Count 701 Tasks 15 Chunks
      Type float64 numpy.ndarray
    • latitude
      (delta_time)
      float64
      dask.array<chunksize=(50000,), meta=np.ndarray>
      contentType :
      physicalMeasurement
      description :
      Latitude of segment center, WGS84, North=+,
      long_name :
      Latitude
      source :
      section 3.10
      standard_name :
      latitude
      units :
      degrees_north
      valid_max :
      90.0
      valid_min :
      -90.0
      \n",
      +       "
    • revision
      ()
      <U2
      '01'
      array('01', dtype='<U2')
    • version
      ()
      <U3
      '003'
      array('003', dtype='<U3')
    • orbitalsegment
      ()
      <U2
      '11'
      array('11', dtype='<U2')
    • latitude
      (delta_time)
      float64
      dask.array<chunksize=(50000,), meta=np.ndarray>
      contentType :
      physicalMeasurement
      description :
      Latitude of segment center, WGS84, North=+,
      long_name :
      Latitude
      source :
      section 3.10
      standard_name :
      latitude
      units :
      degrees_north
      valid_max :
      90.0
      valid_min :
      -90.0
      \n",
              "\n",
              "
      \n", "\n", @@ -520,7 +520,7 @@ " \n", " \n", " \n", - " \n", + " \n", " \n", " \n", " \n", @@ -555,21 +555,21 @@ " \n", "\n", " \n", - " 907915\n", + " 907899\n", " 1\n", "\n", "\n", "\n", - "
      Bytes 7.26 MB 622.86 kB
      Shape (907915,) (77857,)
      Shape (907899,) (77857,)
      Count 701 Tasks 15 Chunks
      Type float64 numpy.ndarray
    • delta_time
      (delta_time)
      datetime64[ns]
      2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984
      contentType :
      referenceInformation
      description :
      Number of GPS seconds since the ATLAS SDP epoch. The ATLAS Standard Data Products (SDP) epoch offset is defined within /ancillary_data/atlas_sdp_gps_epoch as the number of GPS seconds between the GPS epoch (1980-01-06T00:00:00.000000Z UTC) and the ATLAS SDP epoch. By adding the offset contained within atlas_sdp_gps_epoch to delta time parameters, the time in gps_seconds relative to the GPS epoch can be computed.
      long_name :
      Elapsed GPS seconds
      source :
      section 4.4
      standard_name :
      time
array values: 2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984

[remainder of the xarray.Dataset HTML repr, condensed]
Dimensions:                (delta_time: 907899, referencegroundtrack: 10)
Coordinates:
  * delta_time             (delta_time) datetime64[ns] 2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984
    datetime               (referencegroundtrack) datetime64[ns] 2019-06-26T00:35:36 ... 2019-06-26T14:44:13
  * referencegroundtrack   (referencegroundtrack) object '1355' '1356' ... '1363' '1364'
Data variables (each a dask.array of shape (10, 907899), 150 chunks):
    atl06_quality_summary  (referencegroundtrack, delta_time) float64  quality flag; 0 = best_quality, 1 = potential_problem
    h_li                   (referencegroundtrack, delta_time) float32  standard land-ice segment height, in meters
    h_li_sigma             (referencegroundtrack, delta_time) float32  expected RMS segment misfit, in meters
    segment_id             (referencegroundtrack, delta_time) float64  segment number, counting from the equator
    sigma_geo_h            (referencegroundtrack, delta_time) float32  total vertical geolocation error due to PPD and POD, in meters
Attributes:
    Description:  The land_ice_height group contains the primary set of derived ATL06 products. This includes geolocation, height, and standard error and quality measures for each segment. This group is sparse, meaning that parameters are provided only for pairs of segments for which at least one beam has a valid surface-height measurement.
    data_rate:    Data within this group are sparse. Data values are provided only for those ICESat-2 20m segments where at least one beam has a valid land ice height measurement.
(The diff's substantive changes to this repr are the delta_time length, 907915 -> 907899, and the sigma_geo_h source attribute, "section 3.10" -> "ATBD Section 3.10".)
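The delta_time description above spells out the GPS-epoch bookkeeping. As a minimal sketch of that conversion (xarray's decoding already gives the datetime64[ns] values shown in the repr), assume the raw delta_time seconds and the /ancillary_data/atlas_sdp_gps_epoch offset have been read from an ATL06 HDF5 file; the numeric values below are illustrative placeholders, not from a real granule.

```python
# Sketch only: turn raw ATL06 delta_time (GPS seconds since the ATLAS SDP epoch)
# into timestamps, following the delta_time attribute description above.
import numpy as np
import pandas as pd

# Offset in seconds, normally read from /ancillary_data/atlas_sdp_gps_epoch (illustrative value)
atlas_sdp_gps_epoch = 1.198800018e9
# Raw delta_time values in seconds (illustrative placeholders)
delta_time = np.array([46_744_541.69, 46_744_541.94])

# Seconds relative to the GPS epoch (1980-01-06T00:00:00Z), per the attribute description
gps_seconds = atlas_sdp_gps_epoch + delta_time

# Convert to timestamps; this simple version ignores GPS-UTC leap seconds
timestamps = pd.Timestamp("1980-01-06") + pd.to_timedelta(gps_seconds, unit="s")
print(timestamps)
```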
      • " ], "text/plain": [ "\n", - "Dimensions: (delta_time: 907915, referencegroundtrack: 10)\n", + "Dimensions: (delta_time: 907899, referencegroundtrack: 10)\n", "Coordinates:\n", - " orbitalsegment \n", + " revision \n", " * delta_time (delta_time) datetime64[ns] 2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984\n", " datetime (referencegroundtrack) datetime64[ns] 2019-06-26T00:35:36 ... 2019-06-26T14:44:13\n", @@ -928,14 +928,28 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Download all ICESAT2 ATLAS hdf files from start to end date\n", "dates1 = pd.date_range(start=\"2018.10.14\", end=\"2019.06.26\") # 1st batch\n", - "dates2 = pd.date_range(start=\"2019.07.26\", end=\"2019.11.15\") # 2nd batch\n", - "dates = dates1.append(dates2)[:2]" + "dates2 = pd.date_range(start=\"2019.07.26\", end=\"2020.03.06\") # 2nd batch\n", + "dates = dates1.append(dates2)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Submit download jobs to Client\n", + "futures = []\n", + "for date in dates:\n", + " source = catalog.icesat2atlasdownloader(date=date)\n", + " future = client.submit(func=source.discover) # triggers download of the file(s), or loads from cache\n", + " futures.append(future)" ] }, { @@ -947,10 +961,20 @@ "name": "stderr", "output_type": "stream", "text": [ - " 0%| | 0/2 [00:00 Date: Fri, 8 May 2020 14:01:29 +1200 Subject: [PATCH 3/5] :bug: Handle missing ATL06.003 data on 2019.12.09 Not sure why that 2019.12.09 date is missing in ATL06 version 3, it was in version 2! Doing some error management, and ensure that we check all downloads are completed. --- atl06_play.ipynb | 66 +++++++++++++++++++++++++++++++++++++++++++----- catalog.yaml | 2 +- 2 files changed, 61 insertions(+), 7 deletions(-) diff --git a/atl06_play.ipynb b/atl06_play.ipynb index c2b812c..d6f7443 100644 --- a/atl06_play.ipynb +++ b/atl06_play.ipynb @@ -933,9 +933,10 @@ "outputs": [], "source": [ "# Download all ICESAT2 ATLAS hdf files from start to end date\n", - "dates1 = pd.date_range(start=\"2018.10.14\", end=\"2019.06.26\") # 1st batch\n", - "dates2 = pd.date_range(start=\"2019.07.26\", end=\"2020.03.06\") # 2nd batch\n", - "dates = dates1.append(dates2)" + "dates1 = pd.date_range(start=\"2018.10.14\", end=\"2018.12.08\") # 1st batch\n", + "dates2 = pd.date_range(start=\"2018.12.10\", end=\"2019.06.26\") # 2nd batch\n", + "dates3 = pd.date_range(start=\"2019.07.26\", end=\"2020.03.06\") # 3rd batch\n", + "dates = dates1.append(other=dates2).append(other=dates3)" ] }, { @@ -948,20 +949,40 @@ "futures = []\n", "for date in dates:\n", " source = catalog.icesat2atlasdownloader(date=date)\n", - " future = client.submit(func=source.discover) # triggers download of the file(s), or loads from cache\n", + " future = client.submit(\n", + " func=source.discover,\n", + " key=f\"download-{date}\",\n", + " ) # triggers download of the file(s), or loads from cache\n", " futures.append(future)" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - " 25%|██▌ | 121/481 [4:33:33<20:49:30, 208.25s/it]" + " 88%|████████▊ | 421/481 [1:46:28<15:10, 15.17s/it] \n" + ] + }, + { + "ename": "OSError", + "evalue": "no files to open", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + 
"\u001b[0;31mOSError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mresponses\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mf\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtqdm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtqdm\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdask\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdistributed\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_completed\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfutures\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfutures\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtotal\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfutures\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mresponses\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m~/miniconda3/envs/deepicedrain/lib/python3.8/site-packages/distributed/client.py\u001b[0m in \u001b[0;36mresult\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 214\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"error\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[0mtyp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 216\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwith_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 217\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"cancelled\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 218\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m~/miniconda3/envs/deepicedrain/lib/python3.8/site-packages/intake/source/base.py\u001b[0m in \u001b[0;36mdiscover\u001b[0;34m()\u001b[0m\n\u001b[1;32m 167\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdiscover\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 168\u001b[0m \u001b[0;34m\"\"\"Open resource and populate the source attributes.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 169\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_load_metadata\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 170\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 171\u001b[0m return dict(datashape=self.datashape,\n", + 
"\u001b[0;32m~/miniconda3/envs/deepicedrain/lib/python3.8/site-packages/intake/source/base.py\u001b[0m in \u001b[0;36m_load_metadata\u001b[0;34m()\u001b[0m\n\u001b[1;32m 115\u001b[0m \u001b[0;34m\"\"\"load metadata only if needed\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 116\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_schema\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 117\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_schema\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_schema\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 118\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdatashape\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_schema\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdatashape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 119\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_schema\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/leongwei1/miniconda3/envs/deepicedrain/src/intake-xarray/intake_xarray/base.py\u001b[0m in \u001b[0;36m_get_schema\u001b[0;34m()\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_ds\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 18\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_open_dataset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 19\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 20\u001b[0m metadata = {\n", + "\u001b[0;32m/home/leongwei1/miniconda3/envs/deepicedrain/src/intake-xarray/intake_xarray/netcdf.py\u001b[0m in \u001b[0;36m_open_dataset\u001b[0;34m()\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0murl\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfsspec\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen_local\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstorage_options\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 80\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 81\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_ds\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_open_dataset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mchunks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mchunks\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 82\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 83\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_add_path_to_ds\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m~/miniconda3/envs/deepicedrain/lib/python3.8/site-packages/xarray/backends/api.py\u001b[0m in \u001b[0;36mopen_mfdataset\u001b[0;34m()\u001b[0m\n\u001b[1;32m 876\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 877\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mpaths\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 878\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mOSError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"no files to open\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 879\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 880\u001b[0m \u001b[0;31m# If combine='by_coords' then this is unnecessary, but quick.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mOSError\u001b[0m: no files to open" ] } ], @@ -972,6 +993,39 @@ " responses.append(f.result())" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# In case of error, check which downloads are unfinished\n", + "# Manually delete those folders and retry\n", + "unfinished = []\n", + "for foo in futures:\n", + " if foo.status != \"finished\":\n", + " print(foo)\n", + " unfinished.append(foo)\n", + " # foo.retry()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " assert len(unfinished) == 0\n", + "except AssertionError:\n", + " for task in unfinished:\n", + " print(task)\n", + " raise ValueError(\n", + " f\"{len(unfinished)} download tasks are unfinished,\"\n", + " \" please delete those folders and retry again!\"\n", + " )" + ] + }, { "cell_type": "raw", "metadata": {}, diff --git a/catalog.yaml b/catalog.yaml index 5baa275..e3d4b6e 100644 --- a/catalog.yaml +++ b/catalog.yaml @@ -65,7 +65,7 @@ sources: type: datetime default: 2019.06.26 min: 2018.10.14 - max: 2020.03.06 # note gap from 2019.06.27 to 2019.07.25 (inclusive) + max: 2020.03.06 # note missing 2018.12.09, and gap from 2019.06.27 to 2019.07.25 (inclusive) orbitalsegment: description: Orbital Segment type: str From 2ffdf3695379876dcb05510b541dbe4d3a4612e5 Mon Sep 17 00:00:00 2001 From: Wei Ji Date: Fri, 8 May 2020 14:57:41 +1200 Subject: [PATCH 4/5] :art: Pair notebook with .py script and lint code with black Pair up the jupyter notebook with a .py script, and lint it with black. Nicer to look at and easier to diff! 
--- atl06_play.ipynb | 208 +++++++++++------ atl06_play.py | 576 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 719 insertions(+), 65 deletions(-) create mode 100644 atl06_play.py diff --git a/atl06_play.ipynb b/atl06_play.ipynb index d6f7443..d73d68a 100644 --- a/atl06_play.ipynb +++ b/atl06_play.ipynb @@ -117,7 +117,9 @@ "metadata": {}, "outputs": [], "source": [ - "catalog = intake.open_catalog(uri=\"catalog.yaml\") # open the local catalog file containing ICESAT2 stuff" + "catalog = intake.open_catalog(\n", + " uri=\"catalog.yaml\"\n", + ") # open the local catalog file containing ICESAT2 stuff" ] }, { @@ -894,12 +896,16 @@ "try:\n", " netrc.netrc()\n", "except FileNotFoundError as error_msg:\n", - " print(f\"{error_msg}, please follow instructions to create one at \"\n", - " \"https://nsidc.org/support/faq/what-options-are-available-bulk-downloading-data-https-earthdata-login-enabled \"\n", - " 'basically using `echo \"machine urs.earthdata.nasa.gov login password \" >> ~/.netrc`')\n", + " print(\n", + " f\"{error_msg}, please follow instructions to create one at \"\n", + " \"https://nsidc.org/support/faq/what-options-are-available-bulk-downloading-data-https-earthdata-login-enabled \"\n", + " 'basically using `echo \"machine urs.earthdata.nasa.gov login password \" >> ~/.netrc`'\n", + " )\n", " raise\n", "\n", - "dataset = catalog.icesat2atl06.to_dask().unify_chunks() # depends on .netrc file in home folder\n", + "dataset = (\n", + " catalog.icesat2atl06.to_dask().unify_chunks()\n", + ") # depends on .netrc file in home folder\n", "dataset" ] }, @@ -909,10 +915,10 @@ "metadata": {}, "outputs": [], "source": [ - "#dataset.hvplot.points(\n", + "# dataset.hvplot.points(\n", "# x=\"longitude\", y=\"latitude\", datashade=True, width=800, height=500, hover=True,\n", "# #geo=True, coastline=True, crs=cartopy.crs.PlateCarree(), #projection=cartopy.crs.Stereographic(central_latitude=-71),\n", - "#)\n", + "# )\n", "catalog.icesat2atl06.hvplot.quickview()" ] }, @@ -950,8 +956,7 @@ "for date in dates:\n", " source = catalog.icesat2atlasdownloader(date=date)\n", " future = client.submit(\n", - " func=source.discover,\n", - " key=f\"download-{date}\",\n", + " func=source.discover, key=f\"download-{date}\",\n", " ) # triggers download of the file(s), or loads from cache\n", " futures.append(future)" ] @@ -989,7 +994,9 @@ "source": [ "# Check download progress here, https://stackoverflow.com/a/37901797/6611055\n", "responses = []\n", - "for f in tqdm.tqdm(iterable=dask.distributed.as_completed(futures=futures), total=len(futures)):\n", + "for f in tqdm.tqdm(\n", + " iterable=dask.distributed.as_completed(futures=futures), total=len(futures)\n", + "):\n", " responses.append(f.result())" ] }, @@ -1080,28 +1087,34 @@ { "cell_type": "code", "execution_count": 7, - "metadata": {}, + "metadata": { + "lines_to_next_cell": 1 + }, "outputs": [], "source": [ - "dataset = catalog.icesat2atl06.to_dask() # unfortunately, we have to load this in dask to get the path...\n", + "dataset = (\n", + " catalog.icesat2atl06.to_dask()\n", + ") # unfortunately, we have to load this in dask to get the path...\n", "root_directory = os.path.dirname(os.path.dirname(dataset.encoding[\"source\"]))" ] }, { "cell_type": "code", "execution_count": 8, - "metadata": {}, + "metadata": { + "lines_to_next_cell": 2 + }, "outputs": [], "source": [ "def get_crossing_dates(\n", " catalog_entry: intake.catalog.local.LocalCatalogEntry,\n", " root_directory: str,\n", - " referencegroundtrack: str=\"????\",\n", + " 
referencegroundtrack: str = \"????\",\n", " datetime=\"*\",\n", " cyclenumber=\"??\",\n", " orbitalsegment=\"??\",\n", " version=\"002\",\n", - " revision=\"01\"\n", + " revision=\"01\",\n", "):\n", " \"\"\"\n", " Given a 4-digit reference groundtrack (e.g. 1234),\n", @@ -1109,23 +1122,28 @@ " key is the date in \"YYYY.MM.DD\" format when an ICESAT2 crossing was made and the\n", " value is the filepath to the HDF5 data file.\n", " \"\"\"\n", - " \n", + "\n", " # Get a glob string that looks like \"ATL06_??????????????_XXXX????_002_01.h5\"\n", " globpath = catalog_entry.path_as_pattern\n", " if datetime == \"*\":\n", " globpath = globpath.replace(\"{datetime:%Y%m%d%H%M%S}\", \"??????????????\")\n", " globpath = globpath.format(\n", - " referencegroundtrack=referencegroundtrack, cyclenumber=cyclenumber, orbitalsegment=orbitalsegment,\n", - " version=version, revision=revision\n", + " referencegroundtrack=referencegroundtrack,\n", + " cyclenumber=cyclenumber,\n", + " orbitalsegment=orbitalsegment,\n", + " version=version,\n", + " revision=revision,\n", " )\n", - " \n", + "\n", " # Get list of filepaths (dates are contained in the filepath)\n", " globedpaths = glob.glob(os.path.join(root_directory, \"??????????\", globpath))\n", - " \n", + "\n", " # Pick out just the dates in \"YYYY.MM.DD\" format from the globedpaths\n", " # crossingdates = [os.path.basename(os.path.dirname(p=p)) for p in globedpaths]\n", - " crossingdates = {os.path.basename(os.path.dirname(p=p)): p for p in sorted(globedpaths)}\n", - " \n", + " crossingdates = {\n", + " os.path.basename(os.path.dirname(p=p)): p for p in sorted(globedpaths)\n", + " }\n", + "\n", " return crossingdates" ] }, @@ -1136,10 +1154,12 @@ "outputs": [], "source": [ "crossing_dates_dict = {}\n", - "for rgt in range(0,1388): # ReferenceGroundTrack goes from 0001 to 1387\n", + "for rgt in range(0, 1388): # ReferenceGroundTrack goes from 0001 to 1387\n", " referencegroundtrack = f\"{rgt}\".zfill(4)\n", " crossing_dates = dask.delayed(get_crossing_dates)(\n", - " catalog_entry=catalog.icesat2atl06, root_directory=root_directory, referencegroundtrack=referencegroundtrack\n", + " catalog_entry=catalog.icesat2atl06,\n", + " root_directory=root_directory,\n", + " referencegroundtrack=referencegroundtrack,\n", " )\n", " crossing_dates_dict[referencegroundtrack] = crossing_dates\n", "crossing_dates_dict = dask.compute(crossing_dates_dict)[0]" @@ -1201,7 +1221,7 @@ " concatenate all points from all crossing dates into one xr.Dataset\n", " \"\"\"\n", " lasers = [\"gt1l\", \"gt1r\", \"gt2l\", \"gt2r\", \"gt3l\", \"gt3r\"]\n", - " \n", + "\n", " objs = [\n", " xr.open_mfdataset(\n", " paths=crossing_dates.values(),\n", @@ -1213,13 +1233,17 @@ " ).assign_coords(coords={\"laser\": laser})\n", " for laser in lasers\n", " ]\n", - " \n", + "\n", " try:\n", - " da = xr.concat(objs=objs, dim=\"laser\") # dim=pd.Index(data=lasers, name=\"laser\")\n", + " da = xr.concat(\n", + " objs=objs, dim=\"laser\"\n", + " ) # dim=pd.Index(data=lasers, name=\"laser\")\n", " df = da.unify_chunks().to_dask_dataframe()\n", " except ValueError:\n", " # ValueError: cannot reindex or align along dimension 'delta_time' because the index has duplicate values\n", - " df = dask.dataframe.concat([obj.unify_chunks().to_dask_dataframe() for obj in objs])\n", + " df = dask.dataframe.concat(\n", + " [obj.unify_chunks().to_dask_dataframe() for obj in objs]\n", + " )\n", "\n", " return df" ] @@ -1231,8 +1255,10 @@ "outputs": [], "source": [ "dataset_dict = {}\n", - "#for referencegroundtrack in 
list(crossing_dates_dict)[349:350]: # ReferenceGroundTrack goes from 0001 to 1387\n", - "for referencegroundtrack in list(crossing_dates_dict)[340:350]: # ReferenceGroundTrack goes from 0001 to 1387\n", + "# for referencegroundtrack in list(crossing_dates_dict)[349:350]: # ReferenceGroundTrack goes from 0001 to 1387\n", + "for referencegroundtrack in list(crossing_dates_dict)[\n", + " 340:350\n", + "]: # ReferenceGroundTrack goes from 0001 to 1387\n", " # print(referencegroundtrack)\n", " if len(crossing_dates_dict[referencegroundtrack]) > 0:\n", " da = dask.delayed(six_laser_beams)(\n", @@ -1401,7 +1427,9 @@ "metadata": {}, "outputs": [], "source": [ - "dataset_dict = dask.compute(dataset_dict)[0] # compute every referencegroundtrack, slow... though somewhat parallelized" + "dataset_dict = dask.compute(dataset_dict)[\n", + " 0\n", + "] # compute every referencegroundtrack, slow... though somewhat parallelized" ] }, { @@ -1426,9 +1454,19 @@ "metadata": {}, "outputs": [], "source": [ - "da.sel(crossingdates=\"2018.10.21\").h_li.unify_chunks().drop(labels=[\"longitude\", \"datetime\", \"cyclenumber\"]).hvplot(\n", - " kind=\"scatter\", x=\"latitude\", by=\"crossingdates\", datashade=True, dynspread=True,\n", - " width=800, height=500, dynamic=True, flip_xaxis=True, hover=True\n", + "da.sel(crossingdates=\"2018.10.21\").h_li.unify_chunks().drop(\n", + " labels=[\"longitude\", \"datetime\", \"cyclenumber\"]\n", + ").hvplot(\n", + " kind=\"scatter\",\n", + " x=\"latitude\",\n", + " by=\"crossingdates\",\n", + " datashade=True,\n", + " dynspread=True,\n", + " width=800,\n", + " height=500,\n", + " dynamic=True,\n", + " flip_xaxis=True,\n", + " hover=True,\n", ")" ] }, @@ -1738,9 +1776,12 @@ "outputs": [], "source": [ "dfs.hvplot.scatter(\n", - " x=\"longitude\", y=\"latitude\", by=\"laser\", hover_cols=[\"delta_time\", \"segment_id\"],\n", - " #datashade=True, dynspread=True,\n", - " #width=800, height=500, colorbar=True\n", + " x=\"longitude\",\n", + " y=\"latitude\",\n", + " by=\"laser\",\n", + " hover_cols=[\"delta_time\", \"segment_id\"],\n", + " # datashade=True, dynspread=True,\n", + " # width=800, height=500, colorbar=True\n", ")" ] }, @@ -1759,7 +1800,11 @@ "metadata": {}, "outputs": [], "source": [ - "transformer = pyproj.Transformer.from_crs(crs_from=pyproj.CRS.from_epsg(4326), crs_to=pyproj.CRS.from_epsg(3031), always_xy=True)" + "transformer = pyproj.Transformer.from_crs(\n", + " crs_from=pyproj.CRS.from_epsg(4326),\n", + " crs_to=pyproj.CRS.from_epsg(3031),\n", + " always_xy=True,\n", + ")" ] }, { @@ -1781,7 +1826,9 @@ } ], "source": [ - "dfs[\"x\"], dfs[\"y\"] = transformer.transform(xx=dfs.longitude.values, yy=dfs.latitude.values)" + "dfs[\"x\"], dfs[\"y\"] = transformer.transform(\n", + " xx=dfs.longitude.values, yy=dfs.latitude.values\n", + ")" ] }, { @@ -2054,9 +2101,12 @@ "outputs": [], "source": [ "dfs.hvplot.scatter(\n", - " x=\"x\", y=\"y\", by=\"laser\", hover_cols=[\"delta_time\", \"segment_id\", \"h_li\"],\n", - " #datashade=True, dynspread=True,\n", - " #width=800, height=500, colorbar=True\n", + " x=\"x\",\n", + " y=\"y\",\n", + " by=\"laser\",\n", + " hover_cols=[\"delta_time\", \"segment_id\", \"h_li\"],\n", + " # datashade=True, dynspread=True,\n", + " # width=800, height=500, colorbar=True\n", ")" ] }, @@ -2110,15 +2160,15 @@ "# https://github.com/ICESAT-2HackWeek/gridding/blob/master/notebook/utils.py#L23\n", "def make_grid(xmin, xmax, ymin, ymax, dx, dy):\n", " \"\"\"Construct output grid-coordinates.\"\"\"\n", - " \n", + "\n", " # Setup grid dimensions\n", " Nn 
= int((np.abs(ymax - ymin)) / dy) + 1\n", " Ne = int((np.abs(xmax - xmin)) / dx) + 1\n", - " \n", + "\n", " # Initiate x/y vectors for grid\n", " x_i = np.linspace(xmin, xmax, num=Ne)\n", " y_i = np.linspace(ymin, ymax, num=Nn)\n", - " \n", + "\n", " return np.meshgrid(x_i, y_i)" ] }, @@ -2128,7 +2178,9 @@ "metadata": {}, "outputs": [], "source": [ - "xi, yi = make_grid(xmin=dfs.x.min(), xmax=dfs.x.max(), ymin=dfs.y.max(), ymax=dfs.y.min(), dx=10, dy=10)" + "xi, yi = make_grid(\n", + " xmin=dfs.x.min(), xmax=dfs.x.max(), ymin=dfs.y.max(), ymax=dfs.y.min(), dx=10, dy=10\n", + ")" ] }, { @@ -2223,7 +2275,7 @@ "outputs": [], "source": [ "# https://xrviz.readthedocs.io/en/latest/set_initial_parameters.html\n", - "initial_params={\n", + "initial_params = {\n", " # Select variable to plot\n", " \"Variables\": \"h_li\",\n", " # Set coordinates\n", @@ -2231,13 +2283,13 @@ " # Axes\n", " \"x\": \"longitude\",\n", " \"y\": \"latitude\",\n", - " #\"sigma\": \"animate\",\n", + " # \"sigma\": \"animate\",\n", " # Projection\n", - " #\"is_geo\": True,\n", - " #\"basemap\": True,\n", - " #\"crs\": \"PlateCarree\"\n", + " # \"is_geo\": True,\n", + " # \"basemap\": True,\n", + " # \"crs\": \"PlateCarree\"\n", "}\n", - "dashboard = xrviz.dashboard.Dashboard(data=dataset) #, initial_params=initial_params)" + "dashboard = xrviz.dashboard.Dashboard(data=dataset) # , initial_params=initial_params)" ] }, { @@ -2288,13 +2340,16 @@ "outputs": [], "source": [ "# Paste the OpenAltimetry selection parameters here\n", - "OA_REFERENCE_URL = 'minx=-177.64275595145213&miny=-88.12014866942751&maxx=-128.25920892322736&maxy=-85.52394234080862&date=2019-05-02&trackId=515'\n", - "# We populate a list with the photon data using the OpenAltimetry API, no HDF! \n", - "OA_URL = 'https://openaltimetry.org/data/icesat2/getPhotonData?client=jupyter&' + OA_REFERENCE_URL\n", - "OA_PHOTONS = ['Noise', 'Low', 'Medium', 'High']\n", + "OA_REFERENCE_URL = \"minx=-177.64275595145213&miny=-88.12014866942751&maxx=-128.25920892322736&maxy=-85.52394234080862&date=2019-05-02&trackId=515\"\n", + "# We populate a list with the photon data using the OpenAltimetry API, no HDF!\n", + "OA_URL = (\n", + " \"https://openaltimetry.org/data/icesat2/getPhotonData?client=jupyter&\"\n", + " + OA_REFERENCE_URL\n", + ")\n", + "OA_PHOTONS = [\"Noise\", \"Low\", \"Medium\", \"High\"]\n", "# OA_PLOTTED_BEAMS = [1,2,3,4,5,6] you can select up to 6 beams for each ground track.\n", "# Some beams may not be usable due cloud covering or QC issues.\n", - "OA_BEAMS = [3,4]" + "OA_BEAMS = [3, 4]" ] }, { @@ -2304,10 +2359,19 @@ "outputs": [], "source": [ "minx, miny, maxx, maxy = [-156, -88, -127, -84]\n", - "date = \"2019-05-02\" # UTC date?\n", - "track = 515 # \n", - "beam = 1 # 1 to 6\n", - "params = {\"client\": \"jupyter\", \"minx\": minx, \"miny\": miny, \"maxx\": maxx, \"maxy\": maxy, \"date\": date, \"trackId\": str(track), \"beam\": str(beam)}" + "date = \"2019-05-02\" # UTC date?\n", + "track = 515 #\n", + "beam = 1 # 1 to 6\n", + "params = {\n", + " \"client\": \"jupyter\",\n", + " \"minx\": minx,\n", + " \"miny\": miny,\n", + " \"maxx\": maxx,\n", + " \"maxy\": maxy,\n", + " \"date\": date,\n", + " \"trackId\": str(track),\n", + " \"beam\": str(beam),\n", + "}" ] }, { @@ -2316,7 +2380,9 @@ "metadata": {}, "outputs": [], "source": [ - "r = requests.get(url=\"https://openaltimetry.org/data/icesat2/getPhotonData\", params=params)" + "r = requests.get(\n", + " url=\"https://openaltimetry.org/data/icesat2/getPhotonData\", params=params\n", + ")" ] }, { @@ -2327,11 
+2393,15 @@ "source": [ "# OpenAltimetry Data cleansing\n", "df = pd.io.json.json_normalize(data=r.json()[\"series\"], meta=\"name\", record_path=\"data\")\n", - "df.name = df.name.str.split().str.get(0) # Get e.g. just \"Low\" instead of \"Low [12345]\"\n", - "df.query(expr=\"name in ('Low', 'Medium', 'High')\", inplace=True) # filter out Noise and Buffer points\n", + "df.name = df.name.str.split().str.get(0) # Get e.g. just \"Low\" instead of \"Low [12345]\"\n", + "df.query(\n", + " expr=\"name in ('Low', 'Medium', 'High')\", inplace=True\n", + ") # filter out Noise and Buffer points\n", "\n", "df.rename(columns={0: \"latitude\", 1: \"elevation\", 2: \"longitude\"}, inplace=True)\n", - "df = df.reindex(columns=[\"longitude\", \"latitude\", \"elevation\", \"name\"]) # reorder columns\n", + "df = df.reindex(\n", + " columns=[\"longitude\", \"latitude\", \"elevation\", \"name\"]\n", + ") # reorder columns\n", "df.reset_index(inplace=True)\n", "df" ] @@ -2354,6 +2424,14 @@ } ], "metadata": { + "jupytext": { + "text_representation": { + "extension": ".py", + "format_name": "hydrogen", + "format_version": "1.3", + "jupytext_version": "1.4.2" + } + }, "kernelspec": { "display_name": "deepicedrain", "language": "python", diff --git a/atl06_play.py b/atl06_play.py new file mode 100644 index 0000000..4758f9b --- /dev/null +++ b/atl06_play.py @@ -0,0 +1,576 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: hydrogen +# format_version: '1.3' +# jupytext_version: 1.4.2 +# kernelspec: +# display_name: deepicedrain +# language: python +# name: deepicedrain +# --- + +# %% [markdown] +# # **ATLAS/ICESat-2 Land Ice Height [ATL06](https://nsidc.org/data/atl06/) Exploratory Data Analysis** +# +# [Yet another](https://xkcd.com/927) take on playing with ICESat-2's Land Ice Height ATL06 data, +# specfically with a focus on analyzing ice elevation changes over Antarctica. +# Specifically, this jupyter notebook will cover: +# +# - Downloading datasets from the web via [intake](https://intake.readthedocs.io) +# - Performing [Exploratory Data Analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis) +# using the [PyData](https://pydata.org) stack (e.g. [xarray](http://xarray.pydata.org), [dask](https://dask.org)) +# - Plotting figures using [Hvplot](https://hvplot.holoviz.org) and [PyGMT](https://www.pygmt.org) (TODO) +# +# This is in contrast with the [icepyx](https://github.com/icesat2py/icepyx) package +# and 'official' 2019/2020 [ICESat-2 Hackweek tutorials](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials) (which are also awesome!) +# that tends to use a slightly different approach (e.g. handcoded download scripts, [h5py](http://www.h5py.org) for data reading, etc). +# The core concept here is to run things in a more intuitive and scalable (parallelizable) manner on a continent scale (rather than just a specific region). 
+ +# %% +import glob +import json +import logging +import netrc +import os + +import dask +import dask.distributed +import hvplot.dask +import hvplot.pandas +import hvplot.xarray +import intake +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +import requests +import tqdm +import xarray as xr + +# %matplotlib inline + +# %% +# Configure intake and set number of compute cores for data download +intake.config.conf["cache_dir"] = "catdir" # saves data to current folder +intake.config.conf["download_progress"] = False # disable automatic tqdm progress bars + +logging.basicConfig(level=logging.WARNING) + +# Limit compute to 8 cores for download part using intake +# Can possibly go up to 10 because there are 10 DPs? +# See https://n5eil02u.ecs.nsidc.org/opendap/hyrax/catalog.xml +client = dask.distributed.Client(n_workers=10, threads_per_worker=1) +client + +# %% [markdown] +# ## Quick view +# +# Use our [intake catalog](https://intake.readthedocs.io/en/latest/catalog.html) to get some sample ATL06 data +# (while making sure we have our Earthdata credentials set up properly), +# and view it using [xarray](https://xarray.pydata.org) and [hvplot](https://hvplot.pyviz.org). + +# %% +catalog = intake.open_catalog( + uri="catalog.yaml" +) # open the local catalog file containing ICESAT2 stuff + +# %% +try: + netrc.netrc() +except FileNotFoundError as error_msg: + print( + f"{error_msg}, please follow instructions to create one at " + "https://nsidc.org/support/faq/what-options-are-available-bulk-downloading-data-https-earthdata-login-enabled " + 'basically using `echo "machine urs.earthdata.nasa.gov login password " >> ~/.netrc`' + ) + raise + +dataset = ( + catalog.icesat2atl06.to_dask().unify_chunks() +) # depends on .netrc file in home folder +dataset + +# %% +# dataset.hvplot.points( +# x="longitude", y="latitude", datashade=True, width=800, height=500, hover=True, +# #geo=True, coastline=True, crs=cartopy.crs.PlateCarree(), #projection=cartopy.crs.Stereographic(central_latitude=-71), +# ) +catalog.icesat2atl06.hvplot.quickview() + +# %% [markdown] +# ## Data intake +# +# Pulling in all of the raw ATL06 data (HDF5 format) from the NSIDC servers via an intake catalog file. +# Note that this will involve 100s if not 1000s of GBs of data, so make sure there's enough storage!! 
+ +# %% +# Download all ICESAT2 ATLAS hdf files from start to end date +dates1 = pd.date_range(start="2018.10.14", end="2018.12.08") # 1st batch +dates2 = pd.date_range(start="2018.12.10", end="2019.06.26") # 2nd batch +dates3 = pd.date_range(start="2019.07.26", end="2020.03.06") # 3rd batch +dates = dates1.append(other=dates2).append(other=dates3) + +# %% +# Submit download jobs to Client +futures = [] +for date in dates: + source = catalog.icesat2atlasdownloader(date=date) + future = client.submit( + func=source.discover, key=f"download-{date}", + ) # triggers download of the file(s), or loads from cache + futures.append(future) + +# %% +# Check download progress here, https://stackoverflow.com/a/37901797/6611055 +responses = [] +for f in tqdm.tqdm( + iterable=dask.distributed.as_completed(futures=futures), total=len(futures) +): + responses.append(f.result()) + +# %% +# In case of error, check which downloads are unfinished +# Manually delete those folders and retry +unfinished = [] +for foo in futures: + if foo.status != "finished": + print(foo) + unfinished.append(foo) + # foo.retry() + +# %% +try: + assert len(unfinished) == 0 +except AssertionError: + for task in unfinished: + print(task) + raise ValueError( + f"{len(unfinished)} download tasks are unfinished," + " please delete those folders and retry again!" + ) + +# %% [raw] +# with tqdm.tqdm(total=len(dates)) as pbar: +# for date in dates: +# source = catalog.icesat2atlasdownloader(date=date) +# source_urlpath = source.urlpath +# try: +# pbar.set_postfix_str(f"Obtaining files from {source_urlpath}") +# source.discover() # triggers download of the file(s), or loads from cache +# except (requests.HTTPError, OSError, KeyError, TypeError) as error: +# # clear cache and try again +# print(f"Errored: {error}, trying again") +# source.cache[0].clear_cache(urlpath=source_urlpath) +# source.discover() +# except (ValueError, pd.core.index.InvalidIndexError) as error: +# print(f"Errored: {error}, ignoring") +# pass +# pbar.update(n=1) +# #finally: +# # source.close() +# # del source + +# %% [raw] +# catalog.icesat2atl06(date="2019.06.24", laser="gt1l").discover() # ValueError?? +# catalog.icesat2atl06(date="2019.02.28", laser="gt2l").discover() # InvalidIndexError +# catalog.icesat2atl06(date="2019.11.13", laser="gt2l").discover() # ValueError + +# %% + +# %% [markdown] +# ## Exploratory data analysis on local files +# +# Now that we've downloaded a good chunk of data and cached them locally, +# we can have some fun with visualizing the point clouds! + +# %% +dataset = ( + catalog.icesat2atl06.to_dask() +) # unfortunately, we have to load this in dask to get the path... +root_directory = os.path.dirname(os.path.dirname(dataset.encoding["source"])) + +# %% +def get_crossing_dates( + catalog_entry: intake.catalog.local.LocalCatalogEntry, + root_directory: str, + referencegroundtrack: str = "????", + datetime="*", + cyclenumber="??", + orbitalsegment="??", + version="003", + revision="01", +): + """ + Given a 4-digit reference groundtrack (e.g. 1234), + we output a dictionary where the + key is the date in "YYYY.MM.DD" format when an ICESAT2 crossing was made and the + value is the filepath to the HDF5 data file. 
+ """ + + # Get a glob string that looks like "ATL06_??????????????_XXXX????_002_01.h5" + globpath = catalog_entry.path_as_pattern + if datetime == "*": + globpath = globpath.replace("{datetime:%Y%m%d%H%M%S}", "??????????????") + globpath = globpath.format( + referencegroundtrack=referencegroundtrack, + cyclenumber=cyclenumber, + orbitalsegment=orbitalsegment, + version=version, + revision=revision, + ) + + # Get list of filepaths (dates are contained in the filepath) + globedpaths = glob.glob(os.path.join(root_directory, "??????????", globpath)) + + # Pick out just the dates in "YYYY.MM.DD" format from the globedpaths + # crossingdates = [os.path.basename(os.path.dirname(p=p)) for p in globedpaths] + crossingdates = { + os.path.basename(os.path.dirname(p=p)): p for p in sorted(globedpaths) + } + + return crossingdates + + +# %% +crossing_dates_dict = {} +for rgt in range(0, 1388): # ReferenceGroundTrack goes from 0001 to 1387 + referencegroundtrack = f"{rgt}".zfill(4) + crossing_dates = dask.delayed(get_crossing_dates)( + catalog_entry=catalog.icesat2atl06, + root_directory=root_directory, + referencegroundtrack=referencegroundtrack, + ) + crossing_dates_dict[referencegroundtrack] = crossing_dates +crossing_dates_dict = dask.compute(crossing_dates_dict)[0] + +# %% +crossing_dates_dict["0349"].keys() + + +# %% [markdown] +# ![ICESat-2 Laser Beam Pattern](https://ars.els-cdn.com/content/image/1-s2.0-S0034425719303712-gr1.jpg) + +# %% [raw] +# # For one laser along one reference ground track, +# # concatenate all points from all dates into one xr.Dataset +# da = xr.concat( +# objs=( +# catalog.icesat2atl06(date=date, laser="gt1r") +# .to_dask() +# .sel(referencegroundtrack=referencegroundtrack) +# for date in crossing_dates +# ), +# dim=pd.Index(data=crossing_dates, name="crossingdates"), +# ) + +# %% +def six_laser_beams(crossing_dates: list): + """ + For all 6 lasers along one reference ground track, + concatenate all points from all crossing dates into one xr.Dataset + """ + lasers = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"] + + objs = [ + xr.open_mfdataset( + paths=crossing_dates.values(), + combine="nested", + engine="h5netcdf", + concat_dim="delta_time", + group=f"{laser}/land_ice_segments", + parallel=True, + ).assign_coords(coords={"laser": laser}) + for laser in lasers + ] + + try: + da = xr.concat( + objs=objs, dim="laser" + ) # dim=pd.Index(data=lasers, name="laser") + df = da.unify_chunks().to_dask_dataframe() + except ValueError: + # ValueError: cannot reindex or align along dimension 'delta_time' because the index has duplicate values + df = dask.dataframe.concat( + [obj.unify_chunks().to_dask_dataframe() for obj in objs] + ) + + return df + + +# %% +dataset_dict = {} +# for referencegroundtrack in list(crossing_dates_dict)[349:350]: # ReferenceGroundTrack goes from 0001 to 1387 +for referencegroundtrack in list(crossing_dates_dict)[ + 340:350 +]: # ReferenceGroundTrack goes from 0001 to 1387 + # print(referencegroundtrack) + if len(crossing_dates_dict[referencegroundtrack]) > 0: + da = dask.delayed(six_laser_beams)( + crossing_dates=crossing_dates_dict[referencegroundtrack] + ) + # da = six_laser_beams(crossing_dates=crossing_dates_dict[referencegroundtrack]) + dataset_dict[referencegroundtrack] = da + +# %% +df = dataset_dict["0349"].compute() # loads into a dask dataframe (lazy) + +# %% +df + +# %% + +# %% +dataset_dict = dask.compute(dataset_dict)[ + 0 +] # compute every referencegroundtrack, slow... 
though somewhat parallelized + +# %% +bdf = dask.dataframe.concat(dfs=list(dataset_dict.values())) + +# %% + +# %% +da.sel(crossingdates="2018.10.21").h_li.unify_chunks().drop( + labels=["longitude", "datetime", "cyclenumber"] +).hvplot( + kind="scatter", + x="latitude", + by="crossingdates", + datashade=True, + dynspread=True, + width=800, + height=500, + dynamic=True, + flip_xaxis=True, + hover=True, +) + +# %% + +# %% [raw] +# # https://xarray.pydata.org/en/stable/combining.html#concatenate +# # For all 6 lasers one one date ~~along one reference ground track~~, +# # concatenate all points ~~from one dates~~ into one xr.Dataset +# lasers = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"] +# da = xr.concat( +# objs=( +# catalog.icesat2atl06(laser=laser) +# .to_dask() +# #.sel(referencegroundtrack=referencegroundtrack) +# for laser in lasers +# ), +# dim=pd.Index(data=lasers, name="laser") +# ) + +# %% + +# %% [markdown] +# ## Plot them points! + +# %% +# convert dask.dataframe to pd.DataFrame +df = df.compute() + +# %% +df = df.dropna(subset=["h_li"]).query(expr="atl06_quality_summary == 0").reset_index() + +# %% +dfs = df.query(expr="0 <= segment_id - 1443620 < 900") +dfs + +# %% +dfs.hvplot.scatter( + x="longitude", + y="latitude", + by="laser", + hover_cols=["delta_time", "segment_id"], + # datashade=True, dynspread=True, + # width=800, height=500, colorbar=True +) + +# %% +import pyproj + +# %% +transformer = pyproj.Transformer.from_crs( + crs_from=pyproj.CRS.from_epsg(4326), + crs_to=pyproj.CRS.from_epsg(3031), + always_xy=True, +) + +# %% +dfs["x"], dfs["y"] = transformer.transform( + xx=dfs.longitude.values, yy=dfs.latitude.values +) + +# %% +dfs + +# %% +dfs.hvplot.scatter( + x="x", + y="y", + by="laser", + hover_cols=["delta_time", "segment_id", "h_li"], + # datashade=True, dynspread=True, + # width=800, height=500, colorbar=True +) + +# %% +dfs.hvplot.scatter(x="x", y="h_li", by="laser") + +# %% +dfs.to_pickle(path="icesat2_sample.pkl") + +# %% + +# %% [markdown] +# ## Old making a DEM grid surface from points + +# %% +import scipy + + +# %% +# https://github.com/ICESAT-2HackWeek/gridding/blob/master/notebook/utils.py#L23 +def make_grid(xmin, xmax, ymin, ymax, dx, dy): + """Construct output grid-coordinates.""" + + # Setup grid dimensions + Nn = int((np.abs(ymax - ymin)) / dy) + 1 + Ne = int((np.abs(xmax - xmin)) / dx) + 1 + + # Initiate x/y vectors for grid + x_i = np.linspace(xmin, xmax, num=Ne) + y_i = np.linspace(ymin, ymax, num=Nn) + + return np.meshgrid(x_i, y_i) + + +# %% +xi, yi = make_grid( + xmin=dfs.x.min(), xmax=dfs.x.max(), ymin=dfs.y.max(), ymax=dfs.y.min(), dx=10, dy=10 +) + +# %% +ar = scipy.interpolate.griddata(points=(dfs.x, dfs.y), values=dfs.h_li, xi=(xi, yi)) + +# %% +plt.imshow(ar, extent=(dfs.x.min(), dfs.x.max(), dfs.y.min(), dfs.y.max())) + +# %% + +# %% +import plotly.express as px + +# %% +px.scatter_3d(data_frame=dfs, x="longitude", y="latitude", z="h_li", color="laser") + +# %% + +# %% [markdown] +# ### Play using XrViz +# +# Install the PyViz JupyterLab extension first using the [extension manager](https://jupyterlab.readthedocs.io/en/stable/user/extensions.html#using-the-extension-manager) or via the command below: +# +# ```bash +# jupyter labextension install @pyviz/jupyterlab_pyviz@v0.8.0 --no-build +# jupyter labextension list # check to see that extension is installed +# jupyter lab build --debug # build extension ??? 
with debug messages printed +# ``` +# +# Note: Had to add `network-timeout 600000` to `.yarnrc` file to resolve university network issues. + +# %% +import xrviz + +# %% +xrviz.example() + +# %% +# https://xrviz.readthedocs.io/en/latest/set_initial_parameters.html +initial_params = { + # Select variable to plot + "Variables": "h_li", + # Set coordinates + "Set Coords": ["longitude", "latitude"], + # Axes + "x": "longitude", + "y": "latitude", + # "sigma": "animate", + # Projection + # "is_geo": True, + # "basemap": True, + # "crs": "PlateCarree" +} +dashboard = xrviz.dashboard.Dashboard(data=dataset) # , initial_params=initial_params) + +# %% +dashboard.panel + +# %% +dashboard.show() + +# %% + +# %% [markdown] +# ## OpenAltimetry + +# %% +"minx=-154.56678505984297&miny=-88.82881451427136&maxx=-125.17872921546498&maxy=-81.34051361301398&date=2019-05-02&trackId=516" + +# %% +# Paste the OpenAltimetry selection parameters here +OA_REFERENCE_URL = "minx=-177.64275595145213&miny=-88.12014866942751&maxx=-128.25920892322736&maxy=-85.52394234080862&date=2019-05-02&trackId=515" +# We populate a list with the photon data using the OpenAltimetry API, no HDF! +OA_URL = ( + "https://openaltimetry.org/data/icesat2/getPhotonData?client=jupyter&" + + OA_REFERENCE_URL +) +OA_PHOTONS = ["Noise", "Low", "Medium", "High"] +# OA_PLOTTED_BEAMS = [1,2,3,4,5,6] you can select up to 6 beams for each ground track. +# Some beams may not be usable due cloud covering or QC issues. +OA_BEAMS = [3, 4] + +# %% +minx, miny, maxx, maxy = [-156, -88, -127, -84] +date = "2019-05-02" # UTC date? +track = 515 # +beam = 1 # 1 to 6 +params = { + "client": "jupyter", + "minx": minx, + "miny": miny, + "maxx": maxx, + "maxy": maxy, + "date": date, + "trackId": str(track), + "beam": str(beam), +} + +# %% +r = requests.get( + url="https://openaltimetry.org/data/icesat2/getPhotonData", params=params +) + +# %% +# OpenAltimetry Data cleansing +df = pd.io.json.json_normalize(data=r.json()["series"], meta="name", record_path="data") +df.name = df.name.str.split().str.get(0) # Get e.g. just "Low" instead of "Low [12345]" +df.query( + expr="name in ('Low', 'Medium', 'High')", inplace=True +) # filter out Noise and Buffer points + +df.rename(columns={0: "latitude", 1: "elevation", 2: "longitude"}, inplace=True) +df = df.reindex( + columns=["longitude", "latitude", "elevation", "name"] +) # reorder columns +df.reset_index(inplace=True) +df + +# %% +df.hvplot.scatter(x="latitude", y="elevation") + +# %% From 3a4b5ace0b10d823c7d8df224ae55a01b756ac97 Mon Sep 17 00:00:00 2001 From: Wei Ji Date: Wed, 20 May 2020 15:18:31 +1200 Subject: [PATCH 5/5] :zap: Streamline ATL06 data loading and analysis Tidy up lots of things left over from early experimentation. Now loading xarray.Dataset way faster by combining using "by_coords" instead of "nested". This change has been implemented in both the catalog.yaml file and six_laser_beams function (the latter which needs a refactor). The catalog.yaml now features a way to load only 1 reference ground track (instead of multiple), if only you know the date too! Made it easier to understand what some of the variables/functions are by using type hints. The quickview plot now plots the coastline of Antarctica too! 
--- atl06_play.ipynb | 1367 ++++++++++++++++++++-------------------------- atl06_play.py | 203 +++---- catalog.yaml | 21 +- 3 files changed, 693 insertions(+), 898 deletions(-) diff --git a/atl06_play.ipynb b/atl06_play.ipynb index d73d68a..207abc9 100644 --- a/atl06_play.ipynb +++ b/atl06_play.ipynb @@ -33,6 +33,7 @@ "import netrc\n", "import os\n", "\n", + "import cartopy\n", "import dask\n", "import dask.distributed\n", "import hvplot.dask\n", @@ -42,6 +43,7 @@ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", + "import pyproj\n", "import requests\n", "import tqdm\n", "import xarray as xr\n", @@ -62,7 +64,7 @@ "
[Elided: HTML repr output of the dask.distributed Client.]
        " ], "text/plain": [ - "" + "" ] }, "execution_count": 2, @@ -88,12 +90,11 @@ ], "source": [ "# Configure intake and set number of compute cores for data download\n", - "intake.config.conf[\"cache_dir\"] = \"catdir\" # saves data to current folder\n", "intake.config.conf[\"download_progress\"] = False # disable automatic tqdm progress bars\n", "\n", "logging.basicConfig(level=logging.WARNING)\n", "\n", - "# Limit compute to 8 cores for download part using intake\n", + "# Limit compute to 10 cores for download part using intake\n", "# Can possibly go up to 10 because there are 10 DPs?\n", "# See https://n5eil02u.ecs.nsidc.org/opendap/hyrax/catalog.xml\n", "client = dask.distributed.Client(n_workers=10, threads_per_worker=1)\n", @@ -117,9 +118,8 @@ "metadata": {}, "outputs": [], "source": [ - "catalog = intake.open_catalog(\n", - " uri=\"catalog.yaml\"\n", - ") # open the local catalog file containing ICESAT2 stuff" + "# open the local catalog file containing ICESat-2 stuff\n", + "catalog = intake.open_catalog(uri=\"catalog.yaml\")" ] }, { @@ -464,7 +464,10 @@ " stroke: currentColor;\n", " fill: currentColor;\n", "}\n", - "
[Elided: diff of the notebook's xarray.Dataset HTML repr output. The old repr showed a 2-D Dataset with dimensions (delta_time: 907899, referencegroundtrack: 10); the new by_coords-combined repr shows a 1-D Dataset with dimensions (delta_time: 1683571), coordinates longitude, latitude, delta_time, datetime, referencegroundtrack, cyclenumber, orbitalsegment, revision and version, and data variables atl06_quality_summary, h_li, h_li_sigma, segment_id and sigma_geo_h. The equivalent plain-text repr follows below.]
      • " ], "text/plain": [ "\n", - "Dimensions: (delta_time: 907899, referencegroundtrack: 10)\n", + "Dimensions: (delta_time: 1683571)\n", "Coordinates:\n", - " cyclenumber \n", " revision \n", - " * delta_time (delta_time) datetime64[ns] 2019-06-26T00:35:41.688893288 ... 2019-06-26T14:49:55.015304984\n", - " datetime (referencegroundtrack) datetime64[ns] 2019-06-26T00:35:36 ... 2019-06-26T14:44:13\n", - " * referencegroundtrack (referencegroundtrack) object '1355' ... '1364'\n", + " longitude (delta_time) float64 dask.array\n", + " datetime (delta_time) datetime64[ns] 2020-03-06T00:25:18 .....\n", + " referencegroundtrack (delta_time) \n", - " h_li (referencegroundtrack, delta_time) float32 dask.array\n", - " h_li_sigma (referencegroundtrack, delta_time) float32 dask.array\n", - " segment_id (referencegroundtrack, delta_time) float64 dask.array\n", - " sigma_geo_h (referencegroundtrack, delta_time) float32 dask.array\n", + " atl06_quality_summary (delta_time) int8 dask.array\n", + " h_li (delta_time) float32 dask.array\n", + " h_li_sigma (delta_time) float32 dask.array\n", + " segment_id (delta_time) float64 dask.array\n", + " sigma_geo_h (delta_time) float32 dask.array\n", "Attributes:\n", " Description: The land_ice_height group contains the primary set of deriv...\n", " data_rate: Data within this group are sparse. Data values are provide..." @@ -903,9 +1051,8 @@ " )\n", " raise\n", "\n", - "dataset = (\n", - " catalog.icesat2atl06.to_dask().unify_chunks()\n", - ") # depends on .netrc file in home folder\n", + "# depends on .netrc file in home folder\n", + "dataset = catalog.icesat2atl06.to_dask().unify_chunks()\n", "dataset" ] }, @@ -916,8 +1063,18 @@ "outputs": [], "source": [ "# dataset.hvplot.points(\n", - "# x=\"longitude\", y=\"latitude\", datashade=True, width=800, height=500, hover=True,\n", - "# #geo=True, coastline=True, crs=cartopy.crs.PlateCarree(), #projection=cartopy.crs.Stereographic(central_latitude=-71),\n", + "# x=\"longitude\",\n", + "# y=\"latitude\",\n", + "# c=\"h_li\",\n", + "# cmap=\"Blues\",\n", + "# rasterize=True,\n", + "# hover=True,\n", + "# width=800,\n", + "# height=500,\n", + "# geo=True,\n", + "# coastline=True,\n", + "# crs=cartopy.crs.PlateCarree(),\n", + "# projection=cartopy.crs.Stereographic(central_latitude=-71),\n", "# )\n", "catalog.icesat2atl06.hvplot.quickview()" ] @@ -963,34 +1120,9 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - " 88%|████████▊ | 421/481 [1:46:28<15:10, 15.17s/it] \n" - ] - }, - { - "ename": "OSError", - "evalue": "no files to open", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mOSError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mresponses\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mf\u001b[0m \u001b[0;32min\u001b[0m 
\u001b[0mpaths\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 878\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mOSError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"no files to open\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 879\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 880\u001b[0m \u001b[0;31m# If combine='by_coords' then this is unnecessary, but quick.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mOSError\u001b[0m: no files to open" - ] - } - ], + "outputs": [], "source": [ "# Check download progress here, https://stackoverflow.com/a/37901797/6611055\n", "responses = []\n", @@ -1033,40 +1165,6 @@ " )" ] }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "with tqdm.tqdm(total=len(dates)) as pbar:\n", - " for date in dates:\n", - " source = catalog.icesat2atlasdownloader(date=date)\n", - " source_urlpath = source.urlpath\n", - " try:\n", - " pbar.set_postfix_str(f\"Obtaining files from {source_urlpath}\")\n", - " source.discover() # triggers download of the file(s), or loads from cache\n", - " except (requests.HTTPError, OSError, KeyError, TypeError) as error:\n", - " # clear cache and try again\n", - " print(f\"Errored: {error}, trying again\")\n", - " source.cache[0].clear_cache(urlpath=source_urlpath)\n", - " source.discover()\n", - " except (ValueError, pd.core.index.InvalidIndexError) as error:\n", - " print(f\"Errored: {error}, ignoring\")\n", - " pass\n", - " pbar.update(n=1)\n", - " #finally:\n", - " # source.close()\n", - " # del source" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "catalog.icesat2atl06(date=\"2019.06.24\", laser=\"gt1l\").discover() # ValueError??\n", - "catalog.icesat2atl06(date=\"2019.02.28\", laser=\"gt2l\").discover() # InvalidIndexError\n", - "catalog.icesat2atl06(date=\"2019.11.13\", laser=\"gt2l\").discover() # ValueError" - ] - }, { "cell_type": "code", "execution_count": null, @@ -1086,21 +1184,20 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "metadata": { "lines_to_next_cell": 1 }, "outputs": [], "source": [ - "dataset = (\n", - " catalog.icesat2atl06.to_dask()\n", - ") # unfortunately, we have to load this in dask to get the path...\n", - "root_directory = os.path.dirname(os.path.dirname(dataset.encoding[\"source\"]))" + "root_directory = os.path.dirname(\n", + " catalog.icesat2atl06.storage_options[\"simplecache\"][\"cache_storage\"]\n", + ")" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 7, "metadata": { "lines_to_next_cell": 2 }, @@ -1110,12 +1207,12 @@ " catalog_entry: intake.catalog.local.LocalCatalogEntry,\n", " root_directory: str,\n", " referencegroundtrack: str = \"????\",\n", - " datetime=\"*\",\n", - " cyclenumber=\"??\",\n", - " orbitalsegment=\"??\",\n", - " version=\"002\",\n", - " revision=\"01\",\n", - "):\n", + " datetimestr: str = \"*\",\n", + " cyclenumber: str = \"??\",\n", + " orbitalsegment: str = \"??\",\n", + " version: str = \"003\",\n", + " revision: str = \"01\",\n", + ") -> dict:\n", " \"\"\"\n", " Given a 4-digit reference groundtrack (e.g. 
1234),\n", " we output a dictionary where the\n", @@ -1124,10 +1221,10 @@ " \"\"\"\n", "\n", " # Get a glob string that looks like \"ATL06_??????????????_XXXX????_002_01.h5\"\n", - " globpath = catalog_entry.path_as_pattern\n", - " if datetime == \"*\":\n", - " globpath = globpath.replace(\"{datetime:%Y%m%d%H%M%S}\", \"??????????????\")\n", - " globpath = globpath.format(\n", + " globpath: str = catalog_entry.path_as_pattern\n", + " if datetimestr == \"*\":\n", + " globpath: str = globpath.replace(\"{datetime:%Y%m%d%H%M%S}\", \"??????????????\")\n", + " globpath: str = globpath.format(\n", " referencegroundtrack=referencegroundtrack,\n", " cyclenumber=cyclenumber,\n", " orbitalsegment=orbitalsegment,\n", @@ -1136,11 +1233,11 @@ " )\n", "\n", " # Get list of filepaths (dates are contained in the filepath)\n", - " globedpaths = glob.glob(os.path.join(root_directory, \"??????????\", globpath))\n", + " globedpaths: list = glob.glob(os.path.join(root_directory, \"??????????\", globpath))\n", "\n", " # Pick out just the dates in \"YYYY.MM.DD\" format from the globedpaths\n", " # crossingdates = [os.path.basename(os.path.dirname(p=p)) for p in globedpaths]\n", - " crossingdates = {\n", + " crossingdates: dict = {\n", " os.path.basename(os.path.dirname(p=p)): p for p in sorted(globedpaths)\n", " }\n", "\n", @@ -1149,14 +1246,14 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "crossing_dates_dict = {}\n", - "for rgt in range(0, 1388): # ReferenceGroundTrack goes from 0001 to 1387\n", - " referencegroundtrack = f\"{rgt}\".zfill(4)\n", - " crossing_dates = dask.delayed(get_crossing_dates)(\n", + "for rgt in range(1, 1388): # ReferenceGroundTrack goes from 0001 to 1387\n", + " referencegroundtrack: str = f\"{rgt}\".zfill(4)\n", + " crossing_dates: dict = dask.delayed(get_crossing_dates)(\n", " catalog_entry=catalog.icesat2atl06,\n", " root_directory=root_directory,\n", " referencegroundtrack=referencegroundtrack,\n", @@ -1167,16 +1264,16 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "dict_keys(['2018.10.21', '2019.01.20', '2019.04.21', '2019.10.19'])" + "dict_keys(['2018.10.21', '2019.01.20', '2019.04.21', '2019.10.19', '2020.01.18'])" ] }, - "execution_count": 10, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -1192,42 +1289,27 @@ "![ICESat-2 Laser Beam Pattern](https://ars.els-cdn.com/content/image/1-s2.0-S0034425719303712-gr1.jpg)" ] }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "# For one laser along one reference ground track,\n", - "# concatenate all points from all dates into one xr.Dataset\n", - "da = xr.concat(\n", - " objs=(\n", - " catalog.icesat2atl06(date=date, laser=\"gt1r\")\n", - " .to_dask()\n", - " .sel(referencegroundtrack=referencegroundtrack)\n", - " for date in crossing_dates\n", - " ),\n", - " dim=pd.Index(data=crossing_dates, name=\"crossingdates\"),\n", - ")" - ] - }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ - "def six_laser_beams(crossing_dates: list):\n", + "def six_laser_beams(filepaths: list) -> dask.dataframe.DataFrame:\n", " \"\"\"\n", " For all 6 lasers along one reference ground track,\n", - " concatenate all points from all crossing dates into one xr.Dataset\n", + " concatenate all points from all crossing dates into one Dask DataFrame\n", + "\n", + " E.g. 
if there are 5 crossing dates and 6 lasers,\n", + " there would be data from 5 x 6 = 30 files being concatenated together.\n", " \"\"\"\n", - " lasers = [\"gt1l\", \"gt1r\", \"gt2l\", \"gt2r\", \"gt3l\", \"gt3r\"]\n", + " lasers: list = [\"gt1l\", \"gt1r\", \"gt2l\", \"gt2r\", \"gt3l\", \"gt3r\"]\n", "\n", - " objs = [\n", + " objs: list = [\n", " xr.open_mfdataset(\n", - " paths=crossing_dates.values(),\n", - " combine=\"nested\",\n", + " paths=filepaths,\n", + " combine=\"by_coords\",\n", " engine=\"h5netcdf\",\n", - " concat_dim=\"delta_time\",\n", " group=f\"{laser}/land_ice_segments\",\n", " parallel=True,\n", " ).assign_coords(coords={\"laser\": laser})\n", @@ -1235,13 +1317,12 @@ " ]\n", "\n", " try:\n", - " da = xr.concat(\n", - " objs=objs, dim=\"laser\"\n", - " ) # dim=pd.Index(data=lasers, name=\"laser\")\n", - " df = da.unify_chunks().to_dask_dataframe()\n", + " da: xr.Dataset = xr.concat(objs=objs, dim=\"laser\")\n", + " df: dask.dataframe.DataFrame = da.unify_chunks().to_dask_dataframe()\n", " except ValueError:\n", - " # ValueError: cannot reindex or align along dimension 'delta_time' because the index has duplicate values\n", - " df = dask.dataframe.concat(\n", + " # ValueError: cannot reindex or align along dimension 'delta_time'\n", + " # because the index has duplicate values\n", + " df: dask.dataframe.DataFrame = dask.dataframe.concat(\n", " [obj.unify_chunks().to_dask_dataframe() for obj in objs]\n", " )\n", "\n", @@ -1250,27 +1331,25 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "dataset_dict = {}\n", - "# for referencegroundtrack in list(crossing_dates_dict)[349:350]: # ReferenceGroundTrack goes from 0001 to 1387\n", - "for referencegroundtrack in list(crossing_dates_dict)[\n", - " 340:350\n", - "]: # ReferenceGroundTrack goes from 0001 to 1387\n", + "# ReferenceGroundTrack goes from 0001 to 1387\n", + "for referencegroundtrack in list(crossing_dates_dict)[348:349]:\n", " # print(referencegroundtrack)\n", - " if len(crossing_dates_dict[referencegroundtrack]) > 0:\n", - " da = dask.delayed(six_laser_beams)(\n", - " crossing_dates=crossing_dates_dict[referencegroundtrack]\n", + " filepaths = list(crossing_dates_dict[referencegroundtrack].values())\n", + " if len(filepaths) > 0:\n", + " dataset_dict[referencegroundtrack] = dask.delayed(obj=six_laser_beams)(\n", + " filepaths=filepaths\n", " )\n", - " # da = six_laser_beams(crossing_dates=crossing_dates_dict[referencegroundtrack])\n", - " dataset_dict[referencegroundtrack] = da" + " # df = six_laser_beams(filepaths=filepaths)" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ @@ -1279,7 +1358,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -1315,7 +1394,7 @@ "
[Elided: diff of the Dask DataFrame HTML repr output; the old structure had npartitions=115 (concat-indexed, 16082 tasks), the new one npartitions=154 (concat-indexed, 21362 tasks). The plain-text structure follows below.]
        " ], "text/plain": [ "Dask DataFrame Structure:\n", " delta_time laser latitude longitude atl06_quality_summary h_li h_li_sigma segment_id sigma_geo_h\n", - "npartitions=115 \n", + "npartitions=154 \n", "0 datetime64[ns] object float64 float64 float64 float32 float32 float64 float32\n", - "569904 ... ... ... ... ... ... ... ... ...\n", + "340326 ... ... ... ... ... ... ... ... ...\n", "... ... ... ... ... ... ... ... ... ...\n", - "14522322 ... ... ... ... ... ... ... ... ...\n", - "15181415 ... ... ... ... ... ... ... ... ...\n", - "Dask Name: concat-indexed, 16082 tasks" + "8445120 ... ... ... ... ... ... ... ... ...\n", + "8745557 ... ... ... ... ... ... ... ... ...\n", + "Dask Name: concat-indexed, 21362 tasks" ] }, - "execution_count": 14, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -1423,57 +1502,30 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dataset_dict = dask.compute(dataset_dict)[\n", - " 0\n", - "] # compute every referencegroundtrack, slow... though somewhat parallelized" - ] - }, - { - "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": {}, "outputs": [], "source": [ - "bdf = dask.dataframe.concat(dfs=list(dataset_dict.values()))" + "# compute every referencegroundtrack, slow... though somewhat parallelized\n", + "# dataset_dict = dask.compute(dataset_dict)[0]" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ - "da.sel(crossingdates=\"2018.10.21\").h_li.unify_chunks().drop(\n", - " labels=[\"longitude\", \"datetime\", \"cyclenumber\"]\n", - ").hvplot(\n", - " kind=\"scatter\",\n", - " x=\"latitude\",\n", - " by=\"crossingdates\",\n", - " datashade=True,\n", - " dynspread=True,\n", - " width=800,\n", - " height=500,\n", - " dynamic=True,\n", - " flip_xaxis=True,\n", - " hover=True,\n", - ")" + "# big dataframe containing data across all 1387 reference ground tracks!\n", + "# bdf = dask.dataframe.concat(dfs=list(dataset_dict.values()))" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "lines_to_next_cell": 0 + }, "outputs": [], "source": [] }, @@ -1487,9 +1539,8 @@ "lasers = [\"gt1l\", \"gt1r\", \"gt2l\", \"gt2r\", \"gt3l\", \"gt3r\"]\n", "da = xr.concat(\n", " objs=(\n", - " catalog.icesat2atl06(laser=laser)\n", + " catalog.icesat2atl06(laser=laser, referencegroundtrack=referencegroundtrack)\n", " .to_dask()\n", - " #.sel(referencegroundtrack=referencegroundtrack)\n", " for laser in lasers\n", " ),\n", " dim=pd.Index(data=lasers, name=\"laser\")\n", @@ -1507,31 +1558,32 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Plot them points!" + "## Plot ATL06 points!" 
] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ - "# convert dask.dataframe to pd.DataFrame\n", - "df = df.compute()" + "# Convert dask.DataFrame to pd.DataFrame\n", + "df: pd.DataFrame = df.compute()" ] }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 17, "metadata": {}, "outputs": [], "source": [ + "# Drop points with poor quality\n", "df = df.dropna(subset=[\"h_li\"]).query(expr=\"atl06_quality_summary == 0\").reset_index()" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 18, "metadata": {}, "outputs": [ { @@ -1569,204 +1621,106 @@ " \n", "
[Elided: diff of the pandas DataFrame HTML table output; the old table held the 21482 rows × 10 columns returned by the segment_id query, the new one previews the 1,000-row random sample. The plain-text preview follows below.]
      " ], "text/plain": [ - " index delta_time laser latitude longitude \\\n", - "6 40 2018-10-21 12:20:38.549732352 gt3l -78.991839 -147.185583 \n", - "8 52 2018-10-21 12:20:38.552551576 gt3l -78.992014 -147.185765 \n", - "10 64 2018-10-21 12:20:38.555369756 gt3l -78.992189 -147.185947 \n", - "12 76 2018-10-21 12:20:38.558187524 gt3l -78.992364 -147.186130 \n", - "14 88 2018-10-21 12:20:38.561005216 gt3l -78.992538 -147.186313 \n", - "... ... ... ... ... ... \n", - "1801924 11259319 2019-10-19 18:59:56.522928840 gt1r -79.163368 -146.952320 \n", - "1801930 11259355 2019-10-19 18:59:56.525754424 gt1r -79.163543 -146.952507 \n", - "1801936 11259391 2019-10-19 18:59:56.528577288 gt1r -79.163717 -146.952694 \n", - "1801942 11259427 2019-10-19 18:59:56.531397512 gt1r -79.163892 -146.952881 \n", - "1801948 11259463 2019-10-19 18:59:56.534215528 gt1r -79.164067 -146.953068 \n", - "\n", - " atl06_quality_summary h_li h_li_sigma segment_id \\\n", - "6 0.0 572.686401 0.015605 1443620.0 \n", - "8 0.0 572.709351 0.019122 1443621.0 \n", - "10 0.0 572.764465 0.022926 1443622.0 \n", - "12 0.0 572.822083 0.014114 1443623.0 \n", - "14 0.0 572.834229 0.018818 1443624.0 \n", - "... ... ... ... ... \n", - "1801924 0.0 544.210999 0.011127 1444515.0 \n", - "1801930 0.0 544.146729 0.011416 1444516.0 \n", - "1801936 0.0 544.076782 0.010085 1444517.0 \n", - "1801942 0.0 543.966675 0.009702 1444518.0 \n", - "1801948 0.0 543.878540 0.010263 1444519.0 \n", + " index delta_time laser latitude longitude \\\n", + "542853 3268611 2019-01-20 08:07:46.068493512 gt2r -73.016195 48.062489 \n", + "324340 1946356 2018-10-21 12:28:54.317352440 gt3l -69.448531 46.769237 \n", + "118272 709635 2018-10-21 12:27:17.481998464 gt2r -75.546173 49.547622 \n", + "868474 5237837 2019-04-21 03:47:11.232018304 gt3r -74.748139 49.080337 \n", + "1297353 8080232 2020-01-18 14:47:39.217810632 gt2l -70.284789 46.908161 \n", + "\n", + " atl06_quality_summary h_li h_li_sigma segment_id \\\n", + "542853 0.0 3152.434326 0.028530 1599374.0 \n", + "324340 0.0 2206.069824 0.046908 1619499.0 \n", + "118272 0.0 3388.348633 0.010508 1585012.0 \n", + "868474 0.0 3388.145020 0.027348 1589537.0 \n", + "1297353 0.0 2500.093018 0.030089 1614817.0 \n", "\n", " sigma_geo_h \n", - "6 0.301177 \n", - "8 0.300144 \n", - "10 0.300081 \n", - "12 0.303498 \n", - "14 0.300117 \n", - "... ... 
\n", - "1801924 0.312816 \n", - "1801930 0.338958 \n", - "1801936 0.322600 \n", - "1801942 0.322036 \n", - "1801948 0.314843 \n", - "\n", - "[21482 rows x 10 columns]" + "542853 0.312982 \n", + "324340 0.480249 \n", + "118272 0.307785 \n", + "868474 0.304449 \n", + "1297353 0.359534 " ] }, - "execution_count": 17, + "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "dfs = df.query(expr=\"0 <= segment_id - 1443620 < 900\")\n", - "dfs" + "# Get a small random sample of our data\n", + "dfs = df.sample(n=1_000, random_state=42)\n", + "dfs.head()" ] }, { @@ -1786,12 +1740,10 @@ ] }, { - "cell_type": "code", - "execution_count": 19, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "import pyproj" + "### Transform from EPSG:4326 (lat/lon) to EPSG:3031 (Antarctic Polar Stereographic)" ] }, { @@ -1811,20 +1763,7 @@ "cell_type": "code", "execution_count": 21, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - ":1: SettingWithCopyWarning: \n", - "A value is trying to be set on a copy of a slice from a DataFrame.\n", - "Try using .loc[row_indexer,col_indexer] = value instead\n", - "\n", - "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", - " dfs[\"x\"], dfs[\"y\"] = transformer.transform(xx=dfs.longitude.values, yy=dfs.latitude.values)\n" - ] - } - ], + "outputs": [], "source": [ "dfs[\"x\"], dfs[\"y\"] = transformer.transform(\n", " xx=dfs.longitude.values, yy=dfs.latitude.values\n", @@ -1873,216 +1812,105 @@ " \n", "
[Elided: diff of the pandas DataFrame HTML table output after reprojection; the same points as above with the added x and y columns (old: 21482 rows × 12 columns, new: head of the 1,000-row sample). The plain-text preview follows below.]
    " ], "text/plain": [ - " index delta_time laser latitude longitude \\\n", - "6 40 2018-10-21 12:20:38.549732352 gt3l -78.991839 -147.185583 \n", - "8 52 2018-10-21 12:20:38.552551576 gt3l -78.992014 -147.185765 \n", - "10 64 2018-10-21 12:20:38.555369756 gt3l -78.992189 -147.185947 \n", - "12 76 2018-10-21 12:20:38.558187524 gt3l -78.992364 -147.186130 \n", - "14 88 2018-10-21 12:20:38.561005216 gt3l -78.992538 -147.186313 \n", - "... ... ... ... ... ... \n", - "1801924 11259319 2019-10-19 18:59:56.522928840 gt1r -79.163368 -146.952320 \n", - "1801930 11259355 2019-10-19 18:59:56.525754424 gt1r -79.163543 -146.952507 \n", - "1801936 11259391 2019-10-19 18:59:56.528577288 gt1r -79.163717 -146.952694 \n", - "1801942 11259427 2019-10-19 18:59:56.531397512 gt1r -79.163892 -146.952881 \n", - "1801948 11259463 2019-10-19 18:59:56.534215528 gt1r -79.164067 -146.953068 \n", - "\n", - " atl06_quality_summary h_li h_li_sigma segment_id \\\n", - "6 0.0 572.686401 0.015605 1443620.0 \n", - "8 0.0 572.709351 0.019122 1443621.0 \n", - "10 0.0 572.764465 0.022926 1443622.0 \n", - "12 0.0 572.822083 0.014114 1443623.0 \n", - "14 0.0 572.834229 0.018818 1443624.0 \n", - "... ... ... ... ... \n", - "1801924 0.0 544.210999 0.011127 1444515.0 \n", - "1801930 0.0 544.146729 0.011416 1444516.0 \n", - "1801936 0.0 544.076782 0.010085 1444517.0 \n", - "1801942 0.0 543.966675 0.009702 1444518.0 \n", - "1801948 0.0 543.878540 0.010263 1444519.0 \n", - "\n", - " sigma_geo_h x y \n", - "6 0.301177 -650091.025258 -1.008187e+06 \n", - "8 0.300144 -650077.439154 -1.008173e+06 \n", - "10 0.300081 -650063.849228 -1.008159e+06 \n", - "12 0.303498 -650050.252182 -1.008144e+06 \n", - "14 0.300117 -650036.644541 -1.008130e+06 \n", - "... ... ... ... \n", - "1801924 0.312816 -643937.547615 -9.897727e+05 \n", - "1801930 0.338958 -643923.872690 -9.897587e+05 \n", - "1801936 0.322600 -643910.196765 -9.897448e+05 \n", - "1801942 0.322036 -643896.524591 -9.897308e+05 \n", - "1801948 0.314843 -643882.859621 -9.897169e+05 \n", - "\n", - "[21482 rows x 12 columns]" + " index delta_time laser latitude longitude \\\n", + "542853 3268611 2019-01-20 08:07:46.068493512 gt2r -73.016195 48.062489 \n", + "324340 1946356 2018-10-21 12:28:54.317352440 gt3l -69.448531 46.769237 \n", + "118272 709635 2018-10-21 12:27:17.481998464 gt2r -75.546173 49.547622 \n", + "868474 5237837 2019-04-21 03:47:11.232018304 gt3r -74.748139 49.080337 \n", + "1297353 8080232 2020-01-18 14:47:39.217810632 gt2l -70.284789 46.908161 \n", + "\n", + " atl06_quality_summary h_li h_li_sigma segment_id \\\n", + "542853 0.0 3152.434326 0.028530 1599374.0 \n", + "324340 0.0 2206.069824 0.046908 1619499.0 \n", + "118272 0.0 3388.348633 0.010508 1585012.0 \n", + "868474 0.0 3388.145020 0.027348 1589537.0 \n", + "1297353 0.0 2500.093018 0.030089 1614817.0 \n", + "\n", + " sigma_geo_h x y \n", + "542853 0.312982 1.382429e+06 1.242017e+06 \n", + "324340 0.480249 1.643908e+06 1.545394e+06 \n", + "118272 0.307785 1.201144e+06 1.024148e+06 \n", + "868474 0.304449 1.259340e+06 1.091631e+06 \n", + "1297353 0.359534 1.579289e+06 1.477451e+06 " ] }, "execution_count": 22, @@ -2091,7 +1919,7 @@ } ], "source": [ - "dfs" + "dfs.head()" ] }, { @@ -2116,12 +1944,13 @@ "metadata": {}, "outputs": [], "source": [ + "# Plot cross section view\n", "dfs.hvplot.scatter(x=\"x\", y=\"h_li\", by=\"laser\")" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 25, "metadata": {}, "outputs": [], "source": [ @@ -2139,7 +1968,14 @@ "cell_type": "markdown", "metadata": {}, 
"source": [ - "## Old making a DEM grid surface from points" + "## Experimental Work-in-Progress stuff below" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Old way of making a DEM grid surface from points" ] }, { @@ -2210,7 +2046,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -2425,12 +2261,7 @@ ], "metadata": { "jupytext": { - "text_representation": { - "extension": ".py", - "format_name": "hydrogen", - "format_version": "1.3", - "jupytext_version": "1.4.2" - } + "formats": "ipynb,py:hydrogen" }, "kernelspec": { "display_name": "deepicedrain", diff --git a/atl06_play.py b/atl06_play.py index 4758f9b..14562e2 100644 --- a/atl06_play.py +++ b/atl06_play.py @@ -1,6 +1,7 @@ # --- # jupyter: # jupytext: +# formats: ipynb,py:hydrogen # text_representation: # extension: .py # format_name: hydrogen @@ -36,6 +37,7 @@ import netrc import os +import cartopy import dask import dask.distributed import hvplot.dask @@ -45,6 +47,7 @@ import matplotlib.pyplot as plt import numpy as np import pandas as pd +import pyproj import requests import tqdm import xarray as xr @@ -53,12 +56,11 @@ # %% # Configure intake and set number of compute cores for data download -intake.config.conf["cache_dir"] = "catdir" # saves data to current folder intake.config.conf["download_progress"] = False # disable automatic tqdm progress bars logging.basicConfig(level=logging.WARNING) -# Limit compute to 8 cores for download part using intake +# Limit compute to 10 cores for download part using intake # Can possibly go up to 10 because there are 10 DPs? # See https://n5eil02u.ecs.nsidc.org/opendap/hyrax/catalog.xml client = dask.distributed.Client(n_workers=10, threads_per_worker=1) @@ -72,9 +74,8 @@ # and view it using [xarray](https://xarray.pydata.org) and [hvplot](https://hvplot.pyviz.org). # %% -catalog = intake.open_catalog( - uri="catalog.yaml" -) # open the local catalog file containing ICESAT2 stuff +# open the local catalog file containing ICESat-2 stuff +catalog = intake.open_catalog(uri="catalog.yaml") # %% try: @@ -87,15 +88,24 @@ ) raise -dataset = ( - catalog.icesat2atl06.to_dask().unify_chunks() -) # depends on .netrc file in home folder +# depends on .netrc file in home folder +dataset = catalog.icesat2atl06.to_dask().unify_chunks() dataset # %% # dataset.hvplot.points( -# x="longitude", y="latitude", datashade=True, width=800, height=500, hover=True, -# #geo=True, coastline=True, crs=cartopy.crs.PlateCarree(), #projection=cartopy.crs.Stereographic(central_latitude=-71), +# x="longitude", +# y="latitude", +# c="h_li", +# cmap="Blues", +# rasterize=True, +# hover=True, +# width=800, +# height=500, +# geo=True, +# coastline=True, +# crs=cartopy.crs.PlateCarree(), +# projection=cartopy.crs.Stereographic(central_latitude=-71), # ) catalog.icesat2atl06.hvplot.quickview() @@ -151,32 +161,6 @@ " please delete those folders and retry again!" 
) -# %% [raw] -# with tqdm.tqdm(total=len(dates)) as pbar: -# for date in dates: -# source = catalog.icesat2atlasdownloader(date=date) -# source_urlpath = source.urlpath -# try: -# pbar.set_postfix_str(f"Obtaining files from {source_urlpath}") -# source.discover() # triggers download of the file(s), or loads from cache -# except (requests.HTTPError, OSError, KeyError, TypeError) as error: -# # clear cache and try again -# print(f"Errored: {error}, trying again") -# source.cache[0].clear_cache(urlpath=source_urlpath) -# source.discover() -# except (ValueError, pd.core.index.InvalidIndexError) as error: -# print(f"Errored: {error}, ignoring") -# pass -# pbar.update(n=1) -# #finally: -# # source.close() -# # del source - -# %% [raw] -# catalog.icesat2atl06(date="2019.06.24", laser="gt1l").discover() # ValueError?? -# catalog.icesat2atl06(date="2019.02.28", laser="gt2l").discover() # InvalidIndexError -# catalog.icesat2atl06(date="2019.11.13", laser="gt2l").discover() # ValueError - # %% # %% [markdown] @@ -186,22 +170,21 @@ # we can have some fun with visualizing the point clouds! # %% -dataset = ( - catalog.icesat2atl06.to_dask() -) # unfortunately, we have to load this in dask to get the path... -root_directory = os.path.dirname(os.path.dirname(dataset.encoding["source"])) +root_directory = os.path.dirname( + catalog.icesat2atl06.storage_options["simplecache"]["cache_storage"] +) # %% def get_crossing_dates( catalog_entry: intake.catalog.local.LocalCatalogEntry, root_directory: str, referencegroundtrack: str = "????", - datetime="*", - cyclenumber="??", - orbitalsegment="??", - version="003", - revision="01", -): + datetimestr: str = "*", + cyclenumber: str = "??", + orbitalsegment: str = "??", + version: str = "003", + revision: str = "01", +) -> dict: """ Given a 4-digit reference groundtrack (e.g. 
1234), we output a dictionary where the @@ -210,10 +193,10 @@ def get_crossing_dates( """ # Get a glob string that looks like "ATL06_??????????????_XXXX????_002_01.h5" - globpath = catalog_entry.path_as_pattern - if datetime == "*": - globpath = globpath.replace("{datetime:%Y%m%d%H%M%S}", "??????????????") - globpath = globpath.format( + globpath: str = catalog_entry.path_as_pattern + if datetimestr == "*": + globpath: str = globpath.replace("{datetime:%Y%m%d%H%M%S}", "??????????????") + globpath: str = globpath.format( referencegroundtrack=referencegroundtrack, cyclenumber=cyclenumber, orbitalsegment=orbitalsegment, @@ -222,11 +205,11 @@ def get_crossing_dates( ) # Get list of filepaths (dates are contained in the filepath) - globedpaths = glob.glob(os.path.join(root_directory, "??????????", globpath)) + globedpaths: list = glob.glob(os.path.join(root_directory, "??????????", globpath)) # Pick out just the dates in "YYYY.MM.DD" format from the globedpaths # crossingdates = [os.path.basename(os.path.dirname(p=p)) for p in globedpaths] - crossingdates = { + crossingdates: dict = { os.path.basename(os.path.dirname(p=p)): p for p in sorted(globedpaths) } @@ -235,9 +218,9 @@ def get_crossing_dates( # %% crossing_dates_dict = {} -for rgt in range(0, 1388): # ReferenceGroundTrack goes from 0001 to 1387 - referencegroundtrack = f"{rgt}".zfill(4) - crossing_dates = dask.delayed(get_crossing_dates)( +for rgt in range(1, 1388): # ReferenceGroundTrack goes from 0001 to 1387 + referencegroundtrack: str = f"{rgt}".zfill(4) + crossing_dates: dict = dask.delayed(get_crossing_dates)( catalog_entry=catalog.icesat2atl06, root_directory=root_directory, referencegroundtrack=referencegroundtrack, @@ -252,33 +235,22 @@ def get_crossing_dates( # %% [markdown] # ![ICESat-2 Laser Beam Pattern](https://ars.els-cdn.com/content/image/1-s2.0-S0034425719303712-gr1.jpg) -# %% [raw] -# # For one laser along one reference ground track, -# # concatenate all points from all dates into one xr.Dataset -# da = xr.concat( -# objs=( -# catalog.icesat2atl06(date=date, laser="gt1r") -# .to_dask() -# .sel(referencegroundtrack=referencegroundtrack) -# for date in crossing_dates -# ), -# dim=pd.Index(data=crossing_dates, name="crossingdates"), -# ) - # %% -def six_laser_beams(crossing_dates: list): +def six_laser_beams(filepaths: list) -> dask.dataframe.DataFrame: """ For all 6 lasers along one reference ground track, - concatenate all points from all crossing dates into one xr.Dataset + concatenate all points from all crossing dates into one Dask DataFrame + + E.g. if there are 5 crossing dates and 6 lasers, + there would be data from 5 x 6 = 30 files being concatenated together. 
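+
+    Illustrative usage (a sketch; assumes the crossing_dates_dict built above,
+    and "0349" is just an example reference ground track):
+
+    >>> filepaths = list(crossing_dates_dict["0349"].values())
+    >>> df = six_laser_beams(filepaths=filepaths)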
""" - lasers = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"] + lasers: list = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"] - objs = [ + objs: list = [ xr.open_mfdataset( - paths=crossing_dates.values(), - combine="nested", + paths=filepaths, + combine="by_coords", engine="h5netcdf", - concat_dim="delta_time", group=f"{laser}/land_ice_segments", parallel=True, ).assign_coords(coords={"laser": laser}) @@ -286,13 +258,12 @@ def six_laser_beams(crossing_dates: list): ] try: - da = xr.concat( - objs=objs, dim="laser" - ) # dim=pd.Index(data=lasers, name="laser") - df = da.unify_chunks().to_dask_dataframe() + da: xr.Dataset = xr.concat(objs=objs, dim="laser") + df: dask.dataframe.DataFrame = da.unify_chunks().to_dask_dataframe() except ValueError: - # ValueError: cannot reindex or align along dimension 'delta_time' because the index has duplicate values - df = dask.dataframe.concat( + # ValueError: cannot reindex or align along dimension 'delta_time' + # because the index has duplicate values + df: dask.dataframe.DataFrame = dask.dataframe.concat( [obj.unify_chunks().to_dask_dataframe() for obj in objs] ) @@ -301,17 +272,15 @@ def six_laser_beams(crossing_dates: list): # %% dataset_dict = {} -# for referencegroundtrack in list(crossing_dates_dict)[349:350]: # ReferenceGroundTrack goes from 0001 to 1387 -for referencegroundtrack in list(crossing_dates_dict)[ - 340:350 -]: # ReferenceGroundTrack goes from 0001 to 1387 +# ReferenceGroundTrack goes from 0001 to 1387 +for referencegroundtrack in list(crossing_dates_dict)[348:349]: # print(referencegroundtrack) - if len(crossing_dates_dict[referencegroundtrack]) > 0: - da = dask.delayed(six_laser_beams)( - crossing_dates=crossing_dates_dict[referencegroundtrack] + filepaths = list(crossing_dates_dict[referencegroundtrack].values()) + if len(filepaths) > 0: + dataset_dict[referencegroundtrack] = dask.delayed(obj=six_laser_beams)( + filepaths=filepaths ) - # da = six_laser_beams(crossing_dates=crossing_dates_dict[referencegroundtrack]) - dataset_dict[referencegroundtrack] = da + # df = six_laser_beams(filepaths=filepaths) # %% df = dataset_dict["0349"].compute() # loads into a dask dataframe (lazy) @@ -322,33 +291,14 @@ def six_laser_beams(crossing_dates: list): # %% # %% -dataset_dict = dask.compute(dataset_dict)[ - 0 -] # compute every referencegroundtrack, slow... though somewhat parallelized - -# %% -bdf = dask.dataframe.concat(dfs=list(dataset_dict.values())) +# compute every referencegroundtrack, slow... though somewhat parallelized +# dataset_dict = dask.compute(dataset_dict)[0] # %% +# big dataframe containing data across all 1387 reference ground tracks! 
+# bdf = dask.dataframe.concat(dfs=list(dataset_dict.values())) # %% -da.sel(crossingdates="2018.10.21").h_li.unify_chunks().drop( - labels=["longitude", "datetime", "cyclenumber"] -).hvplot( - kind="scatter", - x="latitude", - by="crossingdates", - datashade=True, - dynspread=True, - width=800, - height=500, - dynamic=True, - flip_xaxis=True, - hover=True, -) - -# %% - # %% [raw] # # https://xarray.pydata.org/en/stable/combining.html#concatenate # # For all 6 lasers one one date ~~along one reference ground track~~, @@ -356,9 +306,8 @@ def six_laser_beams(crossing_dates: list): # lasers = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"] # da = xr.concat( # objs=( -# catalog.icesat2atl06(laser=laser) +# catalog.icesat2atl06(laser=laser, referencegroundtrack=referencegroundtrack) # .to_dask() -# #.sel(referencegroundtrack=referencegroundtrack) # for laser in lasers # ), # dim=pd.Index(data=lasers, name="laser") @@ -367,18 +316,20 @@ def six_laser_beams(crossing_dates: list): # %% # %% [markdown] -# ## Plot them points! +# ## Plot ATL06 points! # %% -# convert dask.dataframe to pd.DataFrame -df = df.compute() +# Convert dask.DataFrame to pd.DataFrame +df: pd.DataFrame = df.compute() # %% +# Drop points with poor quality df = df.dropna(subset=["h_li"]).query(expr="atl06_quality_summary == 0").reset_index() # %% -dfs = df.query(expr="0 <= segment_id - 1443620 < 900") -dfs +# Get a small random sample of our data +dfs = df.sample(n=1_000, random_state=42) +dfs.head() # %% dfs.hvplot.scatter( @@ -390,8 +341,8 @@ def six_laser_beams(crossing_dates: list): # width=800, height=500, colorbar=True ) -# %% -import pyproj +# %% [markdown] +# ### Transform from EPSG:4326 (lat/lon) to EPSG:3031 (Antarctic Polar Stereographic) # %% transformer = pyproj.Transformer.from_crs( @@ -406,7 +357,7 @@ def six_laser_beams(crossing_dates: list): ) # %% -dfs +dfs.head() # %% dfs.hvplot.scatter( @@ -419,6 +370,7 @@ def six_laser_beams(crossing_dates: list): ) # %% +# Plot cross section view dfs.hvplot.scatter(x="x", y="h_li", by="laser") # %% @@ -427,7 +379,10 @@ def six_laser_beams(crossing_dates: list): # %% # %% [markdown] -# ## Old making a DEM grid surface from points +# ## Experimental Work-in-Progress stuff below + +# %% [markdown] +# ### Old way of making a DEM grid surface from points # %% import scipy diff --git a/catalog.yaml b/catalog.yaml index e3d4b6e..4c7efa9 100644 --- a/catalog.yaml +++ b/catalog.yaml @@ -47,12 +47,12 @@ sources: delta_time: 50000 path_as_pattern: ATL06_{datetime:%Y%m%d%H%M%S}_{referencegroundtrack:4}{cyclenumber:2}{orbitalsegment:2}_{version:3}_{revision:2}.h5 # urlpath: https://n5eil02u.ecs.nsidc.org/opendap/hyrax/ATLAS/ATL06.003/{{date.strftime("%Y.%m.%d")}}/ATL06_*_*{{orbitalsegment}}_003_01.h5 - urlpath: simplecache::https://n5eil01u.ecs.nsidc.org/ATLAS/ATL06.00{{version}}/{{date.strftime("%Y.%m.%d")}}/ATL06_*_*{{orbitalsegment}}_00{{version}}_01.h5 + urlpath: simplecache::https://n5eil01u.ecs.nsidc.org/ATLAS/ATL06.00{{version}}/{{date.strftime("%Y.%m.%d")}}/ATL06_*_{{referencegroundtrack}}*{{orbitalsegment}}_00{{version}}_01.h5 xarray_kwargs: - combine: nested - concat_dim: referencegroundtrack # from 0000 to 1387 + combine: by_coords engine: h5netcdf group: /{{laser}}/land_ice_segments + mask_and_scale: true parallel: true storage_options: simplecache: @@ -63,9 +63,13 @@ sources: date: description: Year, month, and day of data acquisition type: datetime - default: 2019.06.26 + default: 2020.03.06 min: 2018.10.14 max: 2020.03.06 # note missing 2018.12.09, and gap from 
2019.06.27 to 2019.07.25 (inclusive) + referencegroundtrack: + description: ICESat-2 Reference Ground Track number + type: str + default: "" # Default: "" (all), min: "0000", max: "1387" orbitalsegment: description: Orbital Segment type: str @@ -86,9 +90,14 @@ sources: metadata: plots: quickview: - kind: scatter + kind: points x: longitude y: latitude - datashade: True + c: h_li + cmap: Blues + rasterize: True + hover: True width: 800 height: 500 + geo: True + coastline: True
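A minimal usage sketch for the reworked catalog entry above; the date, reference ground
track, and laser values are illustrative only:

    import intake

    # Open the local catalog file that defines the icesat2atl06 source
    catalog = intake.open_catalog(uri="catalog.yaml")

    # Parameterise the source: the default referencegroundtrack "" matches every
    # track in the urlpath glob, while a 4-digit string such as "0349" narrows
    # the download to a single track (values here are illustrative)
    source = catalog.icesat2atl06(
        date="2020.03.06", referencegroundtrack="0349", laser="gt2l"
    )

    # Lazily open the matching ATL06 HDF5 granule(s) as an xarray Dataset,
    # downloading through the simplecache:: wrapper on first access
    dataset = source.to_dask().unify_chunks()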