how to add a new gridded dataset to ogh #32

Open
ChristinaB opened this issue Sep 21, 2018 · 6 comments

ChristinaB (Contributor) commented Sep 21, 2018

1. Fork Observatory.
2. Log in to the CUAHSI JupyterHub server to run from HydroShare.
3. Open a terminal and run

   git clone http://yourfork

   into a sensible folder outside of the HS folder structure (e.g. make a folder called Github).
4. Copy ogh.py and ogh_meta to your HS working directory.
5. Test the existing Notebook and functions for the case study location.
6. Rename the Notebook and change it to import and use the local versions of ogh and ogh_meta (see the sketch after this list).
7. Create functions for metadata, get, and compile (A, B, C below).
8. Test and debug.
9. Download your data!! Yeahhh. Explore your data with other OGH functions.
10. Open a pull request against https://github.com/Freshwater-Initiative/Observatory
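
A minimal sketch of step 6, assuming the local copies are named ogh.py and ogh_meta.json and sit next to the Notebook (adjust the filenames to whatever you actually copied):

# minimal sketch for step 6; the filenames and the JSON structure are assumptions
import os, sys, json

sys.path.insert(0, os.getcwd())   # prefer the local ogh.py over any installed package
import ogh                        # the local copy in the HS working directory

with open('ogh_meta.json') as f:  # assumed filename for the local metadata catalog
    meta = json.load(f)
print(sorted(meta.keys()))        # list the described datasets, assuming a dict keyed by dataset label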

Three main code additions:
A. Edit ogh_meta (click here to view code) for your new dataset
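
As a rough illustration (copy the real schema from an existing entry in ogh_meta; every field name and value below is just a placeholder), a new dataset entry might look something like:

# hypothetical ogh_meta entry; all keys and values are illustrative placeholders,
# copy the actual schema from an existing dataset entry in ogh_meta
new_entry = {
    'dailymet_mynewdataset': {
        'spatial_resolution': '1/16-degree',
        'temporal_resolution': 'daily',
        'start_date': '1915-01-01',
        'end_date': '2011-12-31',
        'variable_list': ['PRECIP', 'TMAX', 'TMIN', 'WINDSPD'],
        'delimiter': '\t'}}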
B. Create a new ogh get function for your new dataset. For example, create your own version of this:

def getDailyMET_livneh2013(homedir, mappingfile,
                           subdir='livneh2013/Daily_MET_1915_2011/raw',
                           catalog_label='dailymet_livneh2013'):
    """
    Get the Livneh et al., 2013 Daily Meteorology files of interest using the reference mapping file

    homedir: (dir) the home directory to be used for establishing subdirectories
    mappingfile: (dir) the file path to the mappingfile, which contains the LAT, LONG_, and ELEV coordinates of interest
    subdir: (dir) the subdirectory to be established under homedir
    catalog_label: (str) the preferred name for the series of catalogged filepaths
    """
    # check and generate DailyMET livneh 2013 data directory
    filedir = os.path.join(homedir, subdir)
    ensure_dir(filedir)

    # generate table of lats and long coordinates
    maptable = pd.read_csv(mappingfile)

    # compile the longitude and latitude points
    locations = compile_dailyMET_Livneh2013_locations(maptable)

    # Download the files
    ftp_download_p(locations)

    # update the mappingfile with the file catalog
    addCatalogToMap(outfilepath=mappingfile, maptable=maptable, folderpath=filedir, catalog_label=catalog_label)

    # return to the home directory
    os.chdir(homedir)
    return(filedir)
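
For reference, a call from a notebook looks roughly like this (the paths are placeholders; the function is normally reached through the ogh module):

# illustrative call only; homedir and mappingfile paths are placeholders
import ogh

filedir = ogh.getDailyMET_livneh2013(
    homedir='/home/jovyan/work/notebooks/data',
    mappingfile='/home/jovyan/work/notebooks/data/sauk_mappingfile.csv')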

C. Create a new compile function for your dataset. For example:

def compile_bc_Livneh2013_locations(maptable):
    """
    Compile a list of file URLs for bias corrected Livneh et al. 2013 (CIG)

    maptable: (dataframe) a dataframe that contains the FID, LAT, LONG_, and ELEV for each interpolated data file
    """
    locations = []
    for ind, row in maptable.iterrows():
        basename = '_'.join(['data', str(row['LAT']), str(row['LONG_'])])
        url = ['http://cses.washington.edu/rocinante/Livneh/bcLivneh_WWA_2013/forcings_ascii/', basename]
        locations.append(''.join(url))
    return(locations)
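
The compile function only builds the list of URLs from the mappingfile rows; the matching get function does the downloading. A quick sanity check for a new compile function (assuming it lives in ogh.py like the one above; the mappingfile path is a placeholder):

# sanity-check sketch; the mappingfile path is a placeholder
import pandas as pd
import ogh

maptable = pd.read_csv('sauk_mappingfile.csv')
urls = ogh.compile_bc_Livneh2013_locations(maptable)
print(urls[:3])   # inspect the first few URLs before downloading anything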

supp_table1.pdf

jphuong (Contributor) commented Sep 21, 2018

@ChristinaB

I like the moxie in the instructions. However, the key step is missing: Step 0, File and Metadata Management. You're adding a new dataset, and its providers may have their own ways of doing things, including the time period covered by each file, the organization of the files, the gridding schema, and the variables represented. Working backwards from the functions is the hard way forward.

ChristinaB (Contributor, Author) commented
Maybe I should have made a Gist. But the steps above do list the "Three main code additions," starting with "A. Edit ogh_meta (click here to view code) for your new dataset"; I didn't really get into the details in the GitHub issue because I told him verbally. Also, I noticed that the supp table does not include the functions embedded in the get functions. Would that make it too long? We may want to update it to include everything...

keckje commented Sep 24, 2018

Jim,

Do you have any time today or this evening so I could try modifying oxl (ogh_xarray_landlab.py) to include an oxl.get_x_hourlywrf_pnnl2018 function?

I cloned your fork of the Observatory and ran Observatory_usecase_7_xmapLandlab, and I am looking at the oxl set of functions. The usecase_7 notebook has a number of errors when I run it from HydroShare, like:

[screenshot of the error output]

Do I need to use the ogh module you updated at geohack to run the notebook?

The Pacific Northwest National Laboratory data is saved at:
http://cses.washington.edu/rocinante/WRF/PNNL_NARR_6km/

Thanks for your help

jphuong (Contributor) commented Sep 24, 2018

@keckje @ChristinaB

Sorry, I'm at a workshop today, and I won't be able to get around to this error until Wednesday.

In the usecase7 notebook, the intention is to use the OGH v0.1.11 conda library (the current most stable version) while using functionality from the oxl module, which will later become the OGH.xarray_landlab module. If you've placed an ogh.py in the same folder as the notebook, just rename it to something else, as I have with ogh_old.py.

To make sure you can run ogh v.0.1.11, run the following code in bash on HydroShare Jupyterhub to get through the necessary installations. In the long run, HydroShare-Jupyterhub needs to keep up with these versioning issues in their Docker image.

conda install -c conda-forge ogh fiona ncurses libgdal gdal pygraphviz --yes
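
After the install, a quick check from Python shows which version is active (this assumes the ogh package exposes __version__; adjust if it does not):

# quick post-install check; assumes ogh exposes __version__
import ogh
print(ogh.__version__)   # expecting 0.1.11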

jphuong (Contributor) commented Sep 24, 2018

@keckje @ChristinaB

I've just pushed my changes to my Fork. It should work better if you've run the conda install command.

keckje commented Sep 25, 2018

Hi Jim, I added two functions (copied and modified two of your functions) so that we can download the PNNL data. Should I push the changes to your fork of Observatory?

def compile_x_wrfpnnl2018_raw_locations(time_increments):
    """
    Compile a list of file URLs for PNNL 2018 raw WRF data

    time_increments: (list) a list of dates that identify each netcdf file
    """
    locations = []
    domain = 'http://cses.washington.edu'
    subdomain = '/rocinante/WRF/PNNL_NARR_6km'

    for ind, ymd in enumerate(time_increments):
        basename = '/' + ymd[0:4] + '/data.' + ymd[0:4] + '-' + ymd[4:6] + '-' + ymd[6:8] + '.nc'
        url = '{0}{1}{2}'.format(domain, subdomain, basename)
        locations.append(url)
    return(locations)


def get_x_hourlywrf_PNNL2018(homedir,
                             spatialbounds,
                             subdir='PNNL2018/Hourly_WRF_1981_2015/noBC',
                             nworkers=4,
                             start_date='1981-01-01',
                             end_date='2015-12-31',
                             rename_timelatlong_names={'LAT': 'LAT', 'LON': 'LON'},
                             file_prefix='sp_',
                             replace_file=True):
    """
    Get hourly WRF data from a 2018 PNNL WRF run using xarray on netcdf files
    """
    # check and generate data directory
    filedir = os.path.join(homedir, subdir)
    ogh.ensure_dir(filedir)

    # generate a daily datestamp for each file between start_date and end_date
    dates = [x.strftime('%Y%m%d') for x in pd.date_range(start=start_date, end=end_date, freq='D')]

    # initialize parallel workers
    da.set_options(pool=ThreadPool(nworkers))
    ProgressBar().register()

    # generate the list of files to download
    filelist = compile_x_wrfpnnl2018_raw_locations(dates)

    # download files of interest
    NetCDFs = []
    for url in filelist:
        NetCDFs.append(da.delayed(wget_x_download_spSubset)(fileurl=url,
                                                            spatialbounds=spatialbounds,
                                                            file_prefix=file_prefix,
                                                            rename_latlong_names=rename_timelatlong_names,
                                                            replace_file=replace_file))

    # run operations
    outputfiles = da.compute(NetCDFs)[0]

    # reset working directory
    os.chdir(homedir)
    return(outputfiles)
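
A hedged usage sketch (the spatialbounds structure, the coordinate values, and homedir below are my assumptions, not taken from the notebook):

# illustrative call only; the spatialbounds format and all paths/values are assumptions
spatialbounds = {'minx': -122.5, 'maxx': -121.0, 'miny': 47.0, 'maxy': 48.5}
outputfiles = get_x_hourlywrf_PNNL2018(homedir='/home/jovyan/work/notebooks/data',
                                       spatialbounds=spatialbounds,
                                       start_date='2005-01-01',
                                       end_date='2005-01-31')
print(len(outputfiles), 'spatially subsetted netcdf files downloaded')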
