Simplify data drivers #720
Posting #432 here for reference discussions.
Look at this:

gtsm_codec_reanalysis_{freq}_v1:
  crs: 4326
  data_type: GeoDataset
  driver: netcdf
  kwargs:
    chunks:
      stations: 10
      time: -1
  meta:
    category: ocean
    paper_doi: 10.3389/fmars.2020.00263
    paper_ref: Muis at al (2020)
    source_license: https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf
    source_url: https://doi.org/10.24381/cds.8c59054f
    source_version: v1
  path: p:/11205028-c3s_435/01_data/01_Timeseries/timeseries2/{variable}/reanalysis_{variable}_{freq}_{year}_{month:02d}_v1.nc
  placeholders:
    freq: [10min, hourly, dailymax]
  rename:
    station_x_coordinate: lon
    station_y_coordinate: lat
    stations: index

There is a placeholder in the title entry, which can easily be expanded using the [...]
Let me try to answer:

cmip6_{model}_historical_{member}_{timestep}:
  crs: 4326
  data_type: RasterDataset
  driver: zarr
  filesystem: gcs
  kwargs:
    drop_variables: [time_bnds, lat_bnds, lon_bnds, bnds]
    decode_times: true
    preprocess: harmonise_dims
    consolidated: true
  meta:
    category: climate
    paper_doi: 10.1175/BAMS-D-11-00094.1
    paper_ref: Taylor et al. 2012
    source_license: CC BY 4.0
    source_url: https://console.cloud.google.com/marketplace/details/noaa-public/cmip6?_ga=2.136097265.-1784288694.1541379221&pli=1
    source_version: 1.3.1
  placeholders:
    model: [IPSL/IPSL-CM6A-LR, SNU/SAM0-UNICON, NCAR/CESM2, NCAR/CESM2-WACCM, INM/INM-CM4-8, INM/INM-CM5-0, NOAA-GFDL/GFDL-ESM4, NCC/NorESM2-LM, NIMS-KMA/KACE-1-0-G,
      CAS/FGOALS-f3-L, CSIRO-ARCCSS/ACCESS-CM2, NCC/NorESM2-MM, CSIRO/ACCESS-ESM1-5, NCAR/CESM2-WACCM-FV2, NCAR/CESM2-FV2, CMCC/CMCC-CM2-SR5, AS-RCEC/TaiESM1,
      NCC/NorCPM1, IPSL/IPSL-CM5A2-INCA, CMCC/CMCC-CM2-HR4, CMCC/CMCC-ESM2, IPSL/IPSL-CM6A-LR-INCA, E3SM-Project/E3SM-1-0]
    member: [r1i1p1f1]
    timestep: [day, Amon]
  path: gs://cmip6/CMIP6/CMIP/{model}/historical/{member}/{timestep}/{variable}/*/*
  rename:
    pr: precip
    tas: temp
    rsds: kin
    psl: press_msl
  unit_add:
    temp: -273.15
  unit_mult:
    precip: 86400
    press_msl: 0.01

So [...]. The rest are "known" keywords in the path that hydromt can use to directly slice data when reading a data source. For example, in some get_data methods you can pass [...]. But then like [...]. Maybe one final example to try and understand the difference between [...]:

# Placeholders have to be replaced in the data source name to get the data, and keywords can be passed in the get_data methods' arguments
data_catalog.get_geodataset("gtsm_codec_reanalysis_hourly_v1", variables=["precip"], time_tuple=("2010-01-01", "2010-03-31"))
# Get the 10min version of the dataset instead, for all times and variables
data_catalog.get_geodataset("gtsm_codec_reanalysis_10min_v1")
In addition to @hboisgon: the placeholders are solved when parsing the data catalog. The path format arguments are checked in the resolve path and should be part of the new [...]. We can discuss whether the placeholder architecture can be replaced by an extended implementation of the [...]
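To make the parse-time step concrete, here is a minimal sketch of how a `placeholders` block could be expanded into one catalog entry per combination. It is not the HydroMT implementation: `expand_placeholders` is a hypothetical helper, and the path is a shortened stand-in for the one in the first example. Note that only the placeholder keys are substituted; `{variable}`, `{year}` and `{month:02d}` stay in the path for the later resolve step.

```python
from itertools import product


def expand_placeholders(name, entry):
    """Expand one catalog entry into one entry per placeholder combination.

    Hypothetical helper for illustration only.
    """
    entry = dict(entry)  # shallow copy so the caller's dict is not mutated
    placeholders = entry.pop("placeholders", {})
    if not placeholders:
        return {name: entry}

    keys = list(placeholders)
    expanded = {}
    for combo in product(*(placeholders[key] for key in keys)):
        new_name, new_path = name, entry["path"]
        for key, value in zip(keys, combo):
            # Replace only the exact "{key}" token; other format fields
            # such as {variable} or {month:02d} are left untouched.
            token = "{" + key + "}"
            new_name = new_name.replace(token, value)
            new_path = new_path.replace(token, value)
        new_entry = dict(entry)
        new_entry["path"] = new_path
        expanded[new_name] = new_entry
    return expanded


# Shortened, hypothetical path based on the first example in this thread.
entry = {
    "driver": "netcdf",
    "path": "timeseries/{variable}/reanalysis_{variable}_{freq}_{year}_{month:02d}_v1.nc",
    "placeholders": {"freq": ["10min", "hourly", "dailymax"]},
}
sources = expand_placeholders("gtsm_codec_reanalysis_{freq}_v1", entry)
```

With this input, `sources` holds three entries, one per `freq` value, each named without any remaining placeholder.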
I was wondering the same if we could replace [...]
So far I intend to place a generic solution with hydromt keywords [...]
I think there is definitely something to this idea, but I think it would be good to have a (short) design session around this. One thing we definitely want is a distinction between the kinds of placeholders, since they need to be handled at different times, if I understand correctly. I'm not sure what the correct terminology is, but for now I'll call them data-slice placeholders (var=precip) and file path placeholders (year/month/freq). One thing I personally find annoying about the current placeholder implementation is that it doesn't communicate possible values, such as year or variable in the first example. Additionally, especially when dealing with cloud file systems, any processing we can do up front without having to ask the fs for information is going to speed up the process, so if possible I'm in favour of that. So I'm definitely in favour of looking further into using the variants.
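As a rough illustration of resolving file path placeholders up front, the sketch below builds the candidate file list for a time range purely from the path template, without any filesystem call. The names `path_fields` and `resolve_paths`, the monthly file layout, and the variable name `waterlevel` are assumptions for this example, not existing HydroMT API.

```python
from datetime import date
from string import Formatter


def path_fields(path):
    """Return the set of format field names used in a path template."""
    return {name for _, name, _, _ in Formatter().parse(path) if name}


def resolve_paths(path, variables, start, end):
    """List candidate paths for a time range without touching the filesystem.

    Assumes at most one file per (variable, year, month); hypothetical helper.
    """
    fields = path_fields(path)

    # Collect every (year, month) pair from start to end, inclusive.
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    paths = []
    for variable in variables if "variable" in fields else [None]:
        for year, month in months:
            # Unused keyword arguments are ignored by str.format.
            candidate = path.format(variable=variable, year=year, month=month)
            if candidate not in paths:  # avoid duplicates for coarser templates
                paths.append(candidate)
    return paths


template = "reanalysis_{variable}_hourly_{year}_{month:02d}_v1.nc"
paths = resolve_paths(template, ["waterlevel"], date(2010, 1, 1), date(2010, 3, 31))
```

Here `paths` contains the three monthly files for January to March 2010. A template that still contains an unresolved field such as `{freq}` would raise a `KeyError`, which is one argument for expanding catalog-level placeholders before this step.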
My thinking is to implement the generic resolve path solution at the [...]
The double [...]
Just to clarify the discussion. We have HydroMT [...]. Next to this we have [...]. The concepts of [...]
I think this is resolved with the current driver implementation in v1.
Kind of request
Currently, DataAdapters are responsible for representing the different data sources in the DataCatalog, reading in the data, and transforming the data to a uniform data representation in memory. This makes the class responsible for a lot of functions and hard to modify or extend by the plugins.
Enhancement Description
We propose that a Driver should be responsible for reading the data and creating a memory representation, while the Adapter should do generic transformations and filtering/slicing. A DataSource should represent items in the DataCatalog, which can check at read time whether all the required fields are present.
Use case
This should make testing and maintenance easier, while being more flexible to customize for plugins.
Additional Context
No response
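The proposed split of responsibilities could look roughly like the sketch below. It is an illustration under stated assumptions, not the design that was implemented: the class bodies, the `InMemoryDriver` stand-in, the list-of-dicts data representation, and the `memory://example` path are all hypothetical, chosen so the example runs without any files or HydroMT itself.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class Driver(Protocol):
    """Reads data and creates an in-memory representation."""

    def read(self, path: str) -> list[dict[str, Any]]: ...


class InMemoryDriver:
    """Stand-in driver that returns canned records instead of reading files."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        self._records = records

    def read(self, path: str) -> list[dict[str, Any]]:
        return [dict(record) for record in self._records]


@dataclass
class Adapter:
    """Generic transformations and slicing, independent of the file format."""

    rename: dict[str, str] = field(default_factory=dict)
    unit_add: dict[str, float] = field(default_factory=dict)

    def transform(self, records, variables: Optional[list[str]] = None):
        out = []
        for record in records:
            row = {self.rename.get(key, key): value for key, value in record.items()}
            for name, offset in self.unit_add.items():
                if name in row:
                    row[name] = row[name] + offset
            if variables is not None:
                row = {key: value for key, value in row.items() if key in variables}
            out.append(row)
        return out


@dataclass
class DataSource:
    """Represents one catalog item and ties a driver to an adapter."""

    name: str
    path: str
    driver: Driver
    adapter: Adapter = field(default_factory=Adapter)

    def read(self, variables: Optional[list[str]] = None):
        # Check the required fields at read time, as the proposal suggests.
        missing = [key for key in ("name", "path") if not getattr(self, key)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        return self.adapter.transform(self.driver.read(self.path), variables)


source = DataSource(
    name="example",
    path="memory://example",  # hypothetical path, never opened
    driver=InMemoryDriver([{"tas": 280.0, "pr": 1.0}]),
    adapter=Adapter(rename={"tas": "temp", "pr": "precip"}, unit_add={"temp": -273.15}),
)
result = source.read(variables=["temp"])
```

Because the driver is only a protocol, a plugin could supply its own reader without touching the renaming, unit conversion, or slicing logic, and each part can be tested in isolation.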