Add option to choose source of global attributes in MFDataset. #835

Closed
juseg opened this issue Aug 22, 2018 · 6 comments

Comments

@juseg

juseg commented Aug 22, 2018

Just a suggestion... Could we add a keyword argument on MFDataset to choose the index (or name) of the file to be used as a master file? Or, the file to pick global attributes from?

I typically use MFDataset to combine model output from consecutive model runs. When an MFDataset instance is created, the history attribute is taken from the first run and all subsequent history is lost, because the first file is always used as the master:

master = files[0]

Thanks for considering!

@jswhit
Collaborator

jswhit commented Aug 23, 2018

Added a master_file keyword argument in pull request #837.

@jswhit
Collaborator

jswhit commented Aug 23, 2018

@juseg, could you please test the pull request (branch master_file) and let me know if it works for your use case?

@juseg
Author

juseg commented Aug 24, 2018

@jswhit Thanks very much! It works, but the time coordinate is reordered (starting with the master file, and then all other files in order). Potentially this could be solved by either re-ordering subdatasets by filenames, or adding a separate attr_file keyword argument or so.

PS. I've noticed that the current behaviour of netCDF4 (using global attributes from the first file) is similar to other tools such as ncrcat and cdo cat. I realize this is a pure metadata issue but at least in the context of model simulation chains, apparently concatenation systematically discards history.
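
Since concatenation keeps only one file's global attributes, one possible workaround is to read the history attribute of each file separately and join the results. The following is only a minimal sketch, not part of netCDF4: `read_history` and `merge_histories` are hypothetical helper names, and it assumes each file stores its provenance in a global `history` attribute.

```python
def read_history(path):
    """Return the global 'history' attribute of one file ('' if absent)."""
    # Imported lazily so merge_histories stays usable without netCDF4.
    import netCDF4 as nc
    with nc.Dataset(path) as ds:
        return getattr(ds, 'history', '')


def merge_histories(histories, sep='\n'):
    """Join non-empty history strings in order, dropping exact repeats."""
    merged = []
    for text in histories:
        text = text.strip()
        if text and text not in merged:
            merged.append(text)
    return sep.join(merged)


# Hypothetical usage with the files of a simulation chain:
# paths = ['run0.nc', 'run1.nc', 'run2.nc']
# full_history = merge_histories(read_history(p) for p in paths)
```

Only exact duplicates are dropped here; a chain in which each run already embeds the history of its predecessors would need a smarter merge.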

@jswhit
Collaborator

jswhit commented Aug 24, 2018

@juseg - what do you mean that the time coordinate is re-ordered? If you are aggregating along the time dimension the order of the time coordinate should be set by the order of the files in the files input list. Which file you choose as the master file should not affect this. If it does, that's a bug. It doesn't in my simple test. The master file is only used to define attributes, the variable names and aggregation dimension.
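
To make the ordering explicit instead of relying on wildcard expansion (which yields an alphabetically sorted list), the file list can be built and sorted by hand before it is passed to MFDataset. This is a small sketch, assuming filenames that carry a run number such as ds0.nc to ds10.nc; `run_number` and `ordered_files` are hypothetical helpers, not netCDF4 functions. An alphabetical sort would place ds10.nc before ds2.nc, which is why the sort key is numeric.

```python
import os
import re


def run_number(filename):
    """Extract the last integer in the file stem, e.g. 'ds10.nc' -> 10."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    numbers = re.findall(r'\d+', stem)
    if not numbers:
        raise ValueError('no run number in ' + repr(filename))
    return int(numbers[-1])


def ordered_files(filenames):
    """Return filenames sorted by run number rather than alphabetically."""
    return sorted(filenames, key=run_number)


files = ordered_files(['ds10.nc', 'ds2.nc', 'ds0.nc', 'ds1.nc'])
# The explicit list could then be passed on, for example:
# nc.MFDataset(files, master_file=files[-1])
```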

@juseg
Author

juseg commented Aug 27, 2018

@jswhit Sorry, that was a mistake; things work exactly as they should. I think we can close this issue.

import numpy as np
import netCDF4 as nc

# prepare fake data
time = np.arange(1000)
data = time**2

# write to multiple netcdf files
for i in range(10):
    filename = 'ds' + str(i) + '.nc'
    ds = nc.Dataset(filename, 'w', format='NETCDF4_CLASSIC')
    ds.createDimension('time', None)
    ds.createVariable('time', 'i4', ('time',))
    ds.createVariable('data', 'i4', ('time',))
    ds.variables['time'][:] = time[100*i:100*i+100]
    ds.variables['data'][:] = data[100*i:100*i+100]
    ds.history = 'Created ' + filename + '.'
    ds.close()

# open as mfdataset and assert time is increasing
ds = nc.MFDataset('ds?.nc', master_file='ds9.nc')
assert ds.history == 'Created ds9.nc.'
assert np.all(np.diff(ds.variables['time'][:]) > 0)
ds.close()

Many thanks!

@juseg juseg closed this as completed Aug 27, 2018
@juseg
Author

juseg commented Aug 27, 2018

@jswhit I mean after merging the master_file branch... Sorry for the mess.
