Convert and publish GPM IMERG dataset to COG #73
Comments
@abarciauskas-bgse Out of curiosity what are the plans for COG layout for the IMERG variables? Will you create multi-band COGs with variables or a host of single band COGs with variable naming conventions?
If there is a consideration for generating COGs for large numbers of netCDF files it might be worthwhile to consult with the user community as we’ll be diverging from the commonly accepted CF Conventions https://cfconventions.org/ which most scientific producers and consumers try to adhere to. For a reference example of working with the IMERG data here is the recipe we developed for
Another consideration is the update strategy. We are still considering our incremental append strategy for |
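For anyone skimming, the two layouts asked about above can be sketched roughly as follows. This is only an illustration, not a decision made in this thread: the variable list, the `_<variable>.tif` naming pattern, and both helper functions are assumptions made up for the example.

```python
# Illustrative variable names only; the actual set to publish is not decided here.
VARIABLES = ["precipitationCal", "precipitationUncal", "randomError"]


def single_band_names(granule_stem, variables):
    """Option A: one single-band COG per variable, variable encoded in the file name.

    The "<stem>_<variable>.tif" pattern is a hypothetical convention.
    """
    return {var: f"{granule_stem}_{var}.tif" for var in variables}


def multi_band_layout(granule_stem, variables):
    """Option B: one multi-band COG per granule.

    Returns the file name and a mapping from 1-based band index to variable,
    which would have to be recorded somewhere (e.g. band descriptions).
    """
    bands = {index: var for index, var in enumerate(variables, start=1)}
    return f"{granule_stem}.tif", bands


if __name__ == "__main__":
    stem = "3B-HHR-E.MS.MRG.3IMERG.20220616-S000000-E002959.0000.V06C"
    print(single_band_names(stem, VARIABLES))
    print(multi_band_layout(stem, VARIABLES))
```

Either way, the mapping from variable to file or band is something consumers would need documented, since it replaces the self-describing structure of the source netCDF.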
@sharkinsspatial these are all good questions.
In general, I want to centralize questions and answers about generating cloud-optimized (analysis-ready?) data. So far @wildintellect has helped start these documents: I would be interested to know what you @sharkinsspatial think about the layout and content so far in those documents. I know there are a lot of resources on COG and Zarr out there, but I think the intention with these documents is to be able to point our stakeholders somewhere when they are looking for guidance in creating COGs or Zarr. |
@abarciauskas-bgse Current codebase is here as we sketch this out: https://github.com/developmentseed/raster-uploader/
Current API Location: raster-uploader-prod-1759918000.us-east-1.elb.amazonaws.com |
@ingalls got this working today (I believe, still looking at the result and making sure it looks correct) so will generate a few samples tomorrow to send to the ADC team |
@ingalls can you share the IMERG COG output you generated with raster-uploader, along with the source NetCDF and the config you used to generate it? I want to compare it with the one I produced and previously shared with the ADC team. |
@abarciauskas-bgse The general directory can be found here:
The input file exists here:
And the
I just grabbed a random IMERG dataset to use for testing. Would be happy to get some time on the calendar and run through your process vs mine with the same input file. Alternatively, happy to do it async if you can provide the input file that you used, to make sure we have parity. |
I'm probably going to try this myself, but did you generate this before or after you added the flipping option? When I compare it with the sample I created, it makes me think one is flipped and one is not, but it could depend on the source. Comparing the one I generated: with the one linked above: (locally using rio viz) For reference, judging from running gdalinfo on the netCDF file, I think the file you generated was using https://gpm1.gesdisc.eosdis.nasa.gov/data/GPM_L3/GPM_3IMERGHHE.06/2022/167/3B-HHR-E.MS.MRG.3IMERG.20220616-S000000-E002959.0000.V06C.HDF5 |
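One quick way to tell whether two outputs differ in row order is to compare the geotransforms that gdalinfo reports. The sketch below assumes a GDAL-style six-element geotransform (origin x, pixel width, row rotation, origin y, column rotation, pixel height); the helper names are hypothetical and this is not how raster-uploader implements its flipping option.

```python
def row_order(geotransform):
    """Classify row order from a GDAL-style 6-element geotransform."""
    x0, dx, rx, y0, ry, dy = geotransform
    if rx != 0 or ry != 0:
        return "rotated"
    # A negative pixel height means row 0 is the northernmost row.
    return "north-up" if dy < 0 else "south-up"


def flip_rows(grid, geotransform):
    """Reverse the row order of a 2D grid (list of rows) and adjust the
    geotransform so each pixel keeps the same ground location."""
    x0, dx, rx, y0, ry, dy = geotransform
    n_rows = len(grid)
    flipped = grid[::-1]
    # The new origin is the far edge of the old last row; the step sign inverts.
    new_geotransform = (x0, dx, rx, y0 + n_rows * dy, ry, -dy)
    return flipped, new_geotransform


if __name__ == "__main__":
    # Tiny made-up grid whose first row is the southernmost one.
    grid = [[1, 2], [3, 4]]
    gt = (0, 1, 0, -1, 0, 1)
    print(row_order(gt))
    print(flip_rows(grid, gt))
```

If the two samples report pixel heights of opposite sign but display identically, only the storage order differs; if the sign is the same and the images are mirrored, one of them was flipped without the georeferencing being updated.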
Just also noting some of the conversation from email and slack:
|
There's another HTTPS way to access IMERG that does not use EDL, which we used in the Pangeo-Forge recipe (which @sharkinsspatial and I wrote). I believe this access method might allow for fsspec (or s3fs) access to the files without pre-download. |
Here are the bulk access instructions: https://gpm.nasa.gov/sites/default/files/2021-01/arthurhouhttps_retrieval.pdf |
I picked this up again and started deploying and testing it, and everything is going smoothly. Kudos to @slesaad and @xhagrg for the veda-data-pipelines refactor. Work is in https://github.com/NASA-IMPACT/veda-data-pipelines/tree/ab/deploy-for-imerg Work to go:
|
I uploaded around 50 COG samples to @abarciauskas-bgse can we send this to Owen and George for review? |
Thanks @smohiudd - sorry if this wasn't clear, but we should put them in s3://veda-data-store-staging before sending them for review, so the reviewers can confirm they can access the files from an "official" "staging" bucket (though the data should eventually be in s3://veda-data-store). |
The GPM IMERG data is also available as Zarr - does not help us for viz, but is relevant to include in our catalog anyways. |
Stale |
Epic
None, but to support the ArcGIS Enterprise in the Cloud Effort
Description
Convert the half-hour product to COG for use by ADC initiative
Background
Brian Tisdale who is leading the ArcGIS Enterprise in the Cloud effort reached out on slack:
I sent Brian an email message:
If I understand correctly, to support the ADC (or is it a different acronym now "ArcGIS Enterprise in the Cloud"?) we want to:
GPM IMERG is a high-value first example of executing the above steps, but many other datasets will follow a similar model.
Acceptance Criteria: