Feature to retrieve the underlying store #5175

Closed
skorper opened this issue Apr 16, 2021 · 6 comments

@skorper
skorper commented Apr 16, 2021

Is your feature request related to a problem? Please describe.
It would be useful if I could retrieve the underlying store from an xarray Dataset object. There was previously a workaround that allowed this, but that was recently removed.

dataset._file_obj.ds  # Doesn't work anymore :(

Describe the solution you'd like
Ideally I'd like the API to be extended so the underlying store can be retrieved. Something like:

>>> dataset = xr.open_dataset('/path/to/file.nc')
>>> nc_dataset = dataset.store
>>> type(nc_dataset)
<class 'netCDF4._netCDF4.Dataset'>

Describe alternatives you've considered
I'd be fine using the old workaround if that was still an option. If anyone knows of a different workaround I would be fine with that, but a better long-term solution would be an actual user-facing function for accessing the store.

I might be using the 'store' terminology incorrectly, but hopefully my request is clear with the above example.

@kmuehlbauer
Contributor

AFAICT you could try using ds._close to get the closing function(s) of the underlying file managers. I'm not sure whether you can then look up their parent store(s).

But you could open the store yourself, keep the reference and load it into a Dataset. Not sure if this would work for your use case, though.

@kmuehlbauer
Contributor

@skorper So the workaround would be something along the lines of:

>>> dataset = xr.open_dataset('/path/to/file.nc')
>>> nc_dataset = dataset._close.__self__.ds
>>> type(nc_dataset)
<class 'netCDF4._netCDF4.Dataset'>

But note that this only works if the dataset was opened/created from a single source file, and I'm not sure whether this is intended behaviour.

@alexamici can possibly say whether such an API would be feasible. To my understanding, the backend refactor also did a great deal to disentangle Datasets from the underlying data sources. Please correct me, @alexamici, if I'm wrong.

@skorper
Author

skorper commented Apr 19, 2021

Thanks @kmuehlbauer! That works great for now.

@alexamici
Collaborator

@kmuehlbauer Precisely. For example, AbstractDataStore is no longer part of the API and is kept only for backward compatibility.

What I see as missing is an API-level way for backends to "attach" arbitrary information, and possibly code, to the backend Datasets. cc @shoyer @jhamman

@shoyer
Member

shoyer commented May 2, 2021

@skorper Could you kindly explain your use case here? Why is it useful for you to retrieve underlying stores? :)

This has never been part of Xarray's supported public API, and unless that changes, the present workaround could break again in the future without warning.
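
Since the workaround quoted earlier in this thread (dataset._close.__self__.ds) depends on private attributes, one option is to wrap it so that it returns None instead of raising if those attributes ever disappear. The following is only a minimal sketch under that assumption: get_underlying_store is a hypothetical helper name, not an xarray function, and the attribute chain it walks is the unsupported one from the workaround.

```python
from typing import Any, Optional


def get_underlying_store(dataset: Any) -> Optional[Any]:
    """Best-effort lookup of the backend handle behind a Dataset.

    Hypothetical helper (not part of xarray). It walks the private chain
    ``_close.__self__.ds`` from the workaround in this thread and returns
    None when any link in that chain is missing.
    """
    # `_close` is private; the workaround assumes it is a bound method
    # of the backend store that owns the file.
    close = getattr(dataset, "_close", None)
    # `__self__` is the object a bound method is bound to.
    store = getattr(close, "__self__", None)
    # `ds` is assumed to be the low-level handle (e.g. a netCDF4 Dataset).
    return getattr(store, "ds", None)
```

It would be called as nc_dataset = get_underlying_store(dataset), followed by a check for None before use. As noted above for the workaround itself, this can only be expected to work for a dataset opened from a single source file.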

@kmuehlbauer
Contributor

It looks like this is not going to be implemented any time soon.

But if you rely on access to the underlying store, you can open the file yourself with netCDF4-python and wrap it in a NetCDF4DataStore:

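import netCDF4 as nc
import xarray as xr

filename = '/path/to/file.nc'  # placeholder path

# Open the file with netCDF4-python and keep the handle (nds) for direct access
with nc.Dataset(filename) as nds:
    store = xr.backends.NetCDF4DataStore(nds)  # wrap the open handle for xarray
    with xr.open_dataset(store) as ds:
        print(ds)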

Please feel free to reopen.
