Commit

test
ygliuvt committed Oct 10, 2023
1 parent 512d999 commit e543f96
Showing 1 changed file with 1 addition and 275 deletions.
276 changes: 1 addition & 275 deletions examples/intro_tutorial.ipynb
@@ -354,280 +354,6 @@
"\n",
"The `result_urls()` method calls `wait_for_processing()` and returns a list of the processed data URLs once processing is complete. You can optionally display a progress bar, as shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b26d572c",
"metadata": {},
"outputs": [],
"source": [
"urls = harmony_client.result_urls(job_id, show_progress=True)\n",
"urls"
]
},
{
"cell_type": "markdown",
"id": "55aac772",
"metadata": {},
"source": [
"#### Download files"
]
},
{
"cell_type": "markdown",
"id": "1a46cae7",
"metadata": {},
"source": [
"The next code block uses `download_all()` to download every output file. The downloads themselves run in the background without blocking, but the call blocks subsequent code until the job finishes processing. You can optionally specify a target directory and whether to overwrite existing files, as shown below.\n",
"\n",
"`download_all()` returns future objects, one per output file. Calling `result()` on a future blocks until that particular file finishes downloading, while the other futures continue downloading in the background, in parallel. Each completed future returns the path of the downloaded file as a string, which can then be passed to other libraries that read the data files and perform further operations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f97b857",
"metadata": {},
"outputs": [],
"source": [
"results = harmony_client.download_all(job_id, directory='/tmp', overwrite=True)\n",
"file_names = [f.result() for f in results]\n",
"file_names"
]
},
{
"cell_type": "markdown",
"id": "d32f878f",
"metadata": {},
"source": [
"`download()` downloads only the specified URL, which is useful when you want more control over individual files."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ac86478",
"metadata": {},
"outputs": [],
"source": [
"file_name = harmony_client.download(next(iter(urls)), overwrite=True).result()\n",
"file_name"
]
},
{
"cell_type": "markdown",
"id": "0fc6cfc0",
"metadata": {},
"source": [
"### Visualize Downloaded Outputs\n",
"\n",
"The output image files can be visualized using the `Rasterio` library."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0c8eaad6",
"metadata": {},
"outputs": [],
"source": [
"for f in file_names:\n",
"    rasterio.plot.show(rasterio.open(f))"
]
},
{
"cell_type": "markdown",
"id": "edfea75c",
"metadata": {},
"source": [
"### Explore output STAC catalog and retrieve results from s3\n",
"\n",
"Each Harmony request result includes a [STAC](https://stacspec.org/) catalog. The STAC items include not only the s3 locations for each output file, but also the spatial and temporal metadata describing each subsetted output. `stac_catalog_url()` returns the URL of the catalog and can optionally show progress, if desired."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef42bb5b",
"metadata": {},
"outputs": [],
"source": [
"stac_catalog_url = harmony_client.stac_catalog_url(job_id)\n",
"stac_catalog_url"
]
},
{
"cell_type": "markdown",
"id": "3027a2d9",
"metadata": {},
"source": [
"#### Using PySTAC:"
]
},
{
"cell_type": "markdown",
"id": "2d3d9fff",
"metadata": {},
"source": [
"Following the directions for PySTAC (https://pystac.readthedocs.io/en/latest/quickstart.html), we can view the timestamp and s3 locations of each STAC item:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1342ccd",
"metadata": {},
"outputs": [],
"source": [
"from pystac import Catalog\n",
"\n",
"cat = Catalog.from_file(stac_catalog_url)\n",
"\n",
"print(cat.title)\n",
"s3_links = []\n",
"for item in cat.get_all_items():\n",
"    hrefs = [asset.href for asset in item.assets.values()]\n",
"    print(item.datetime, hrefs)\n",
"    s3_links.append(hrefs)"
]
},
{
"cell_type": "markdown",
"id": "f4d9c649",
"metadata": {},
"source": [
"#### Using intake-stac:\n",
"\n",
"View each item value returned:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef1eeb66",
"metadata": {},
"outputs": [],
"source": [
"import intake\n",
"cat = intake.open_stac_catalog(stac_catalog_url)\n",
"display(list(cat))"
]
},
{
"cell_type": "markdown",
"id": "583eca0f",
"metadata": {},
"source": [
"And the metadata contents of each item:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "914e0315",
"metadata": {},
"outputs": [],
"source": [
"entries = []\n",
"# `item_id` avoids shadowing the built-in `id`\n",
"for item_id, entry in cat.search('type').items():\n",
"    display(entry)\n",
"    entries.append(entry)"
]
},
{
"cell_type": "markdown",
"id": "7edd3d65",
"metadata": {},
"source": [
"### Cloud in-place access \n",
"\n",
"**Note that the remainder of this tutorial will only succeed when running this notebook within the AWS us-west-2 region.** \n",
"\n",
"Harmony data outputs can be accessed within the cloud using the s3 URLs and AWS credentials provided in the Harmony job response. Below are examples using `boto3` and `intake-stac` to access the data in the cloud."
]
},
{
"cell_type": "markdown",
"id": "89054073",
"metadata": {},
"source": [
"#### AWS credential retrieval\n",
"\n",
"Use `aws_credentials()` to retrieve the credentials needed to access the Harmony s3 staging bucket and its contents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b278020a",
"metadata": {},
"outputs": [],
"source": [
"# NOTE: if you specified destination_url you'll have to retrieve your credentials in another manner\n",
"# https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html\n",
"creds = harmony_client.aws_credentials()\n",
"creds"
]
},
{
"cell_type": "markdown",
"id": "aa00b059",
"metadata": {},
"source": [
"#### Using boto3"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "45f798d9",
"metadata": {},
"outputs": [],
"source": [
"#\n",
"# NOTE: Execution of this cell will only succeed within the AWS us-west-2 region. \n",
"#\n",
"\n",
"import boto3\n",
"\n",
"s3 = boto3.client('s3', **creds)\n",
"for links in s3_links:\n",
"    uri = links[0]\n",
"    bucket = uri.split('/')[2]\n",
"    obj = '/'.join(uri.split('/')[3:])\n",
"    fn = uri.split('/')[-1]\n",
"    with open(fn, 'wb') as f:\n",
"        s3.download_fileobj(bucket, obj, f)"
]
},
{
"cell_type": "markdown",
"id": "9b86e8e1",
"metadata": {},
"source": [
"#### Using intake-stac\n",
"\n",
"Once again, you can use `intake-stac` to directly access each output from Harmony in AWS. Viewing the file structure and plotting the image can be done in a few simple lines when working with the data in-region:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92e969d1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#\n",
"# NOTE: Execution of this cell will only succeed within the AWS us-west-2 region. \n",
"#\n",
"\n",
"for name, entry in zip(list(cat), entries):\n",
"    da = cat[name][entry.describe()['name']].to_dask()\n",
"    display(da)\n",
"    da.plot()"
]
}
],
"metadata": {
@@ -646,7 +372,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.10.9"
},
"vscode": {
"interpreter": {
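Note on the removed boto3 cell: it derives the bucket, object key, and local file name from each s3 URL by splitting on `/`. A minimal sketch of that same step using the standard library is given here, assuming the links have the form `s3://bucket/key`. The helper name `split_s3_uri` and the sample URI are hypothetical and do not come from the notebook or from harmony-py.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key, file_name) for a URI of the form s3://bucket/key."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3 URI: {uri!r}")
    # The path keeps its leading slash; an S3 object key does not have one.
    key = parsed.path.lstrip("/")
    return parsed.netloc, key, PurePosixPath(key).name


# Hypothetical URI, for illustration only.
bucket, key, file_name = split_s3_uri("s3://example-bucket/public/job-123/output.tif")
print(bucket, key, file_name)
```

The three values map onto the `bucket`, `obj`, and `fn` variables in the removed cell. Unlike the split-based version, this one raises an error for a link that is not an s3 URL.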
