Add reset_encoding to Dataset and DataArray objects #7686

Closed
jhamman opened this issue Mar 27, 2023 · 2 comments · Fixed by #7689

Comments

@jhamman
Member

jhamman commented Mar 27, 2023

Is your feature request related to a problem?

Xarray preserves the encoding of datasets read from most of its supported backend formats (e.g. NetCDF, Zarr). This is very useful when you want a perfect roundtrip, but it often gets in the way, causing conflicts when writing a modified dataset or when appending to another dataset. Most of the time, the solution is simply to remove the encoding from the dataset and continue on. The following code sample appears in a number of issues that reference this problem.

    for v in list(ds.coords.keys()):
        if ds.coords[v].dtype == object:
            ds[v].encoding.clear()

    for v in list(ds.variables.keys()):
        if ds[v].dtype == object:
            ds[v].encoding.clear()
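
The snippet above only touches object-dtype variables and loops over coordinates twice, since `ds.variables` already includes them. A more general in-place version might look like the sketch below. The helper name `clear_encoding_inplace` and the small in-memory dataset are hypothetical, used here only for illustration; they are not part of Xarray.

```python
import numpy as np
import xarray as xr


def clear_encoding_inplace(ds: xr.Dataset) -> None:
    """Clear encoding on every variable of ``ds`` in place (hypothetical helper)."""
    # ds.variables covers both data variables and coordinates,
    # so a single loop is enough.
    for var in ds.variables.values():
        var.encoding.clear()
    # Dataset-level encoding is stored separately from the variables.
    ds.encoding.clear()


# In-memory dataset standing in for one that was read from a file.
ds = xr.Dataset({"temp": ("x", np.arange(3.0))}, coords={"x": [10, 20, 30]})
ds.variables["temp"].encoding["dtype"] = "int16"
ds.variables["x"].encoding["dtype"] = "int32"
ds.encoding["unlimited_dims"] = {"x"}

clear_encoding_inplace(ds)
```

Because this mutates the dataset it is given, it cannot be chained onto `open_dataset`, which is part of the motivation for the method proposed below.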

A sample of issues that show variants of this problem.

Describe the solution you'd like

In many cases, the solution to these problems is to leave the original dataset encoding behind and either use Xarray's default encoding (or the backend's default) or specify one's own encoding options. Both cases would benefit from a convenience method to reset the original encoding. Something like the following would serve this purpose:

    ds = xr.open_dataset(...).reset_encoding()
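
To make the intended semantics concrete, here is a minimal sketch of a non-mutating reset written as a free function. The name `reset_encoding_sketch` is hypothetical, and this is not necessarily how the method was implemented in the fix for this issue; it only assumes that a shallow `Dataset.copy` yields new variable objects whose `encoding` can be reassigned independently.

```python
import numpy as np
import xarray as xr


def reset_encoding_sketch(ds: xr.Dataset) -> xr.Dataset:
    """Return a copy of ``ds`` with all encoding removed (hypothetical helper)."""
    # A shallow copy shares the underlying data but creates new
    # Dataset and Variable objects, so the input is left untouched.
    out = ds.copy(deep=False)
    out.encoding = {}
    for var in out.variables.values():
        var.encoding = {}
    return out


# In-memory dataset standing in for one that was read from a file.
original = xr.Dataset({"temp": ("x", np.arange(3.0))}, coords={"x": [10, 20, 30]})
original.variables["temp"].encoding["dtype"] = "int16"
original.encoding["unlimited_dims"] = {"x"}

cleaned = reset_encoding_sketch(original)
```

Returning a new object keeps the call chainable, for example via `Dataset.pipe(reset_encoding_sketch)` on a freshly opened dataset, and leaves the original encoding available if a perfect roundtrip is still needed elsewhere.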

Describe alternatives you've considered

Variations on the API above could also be considered:

    xr.open_dataset(..., keep_encoding=False)

or even:

    with xr.set_options(keep_encoding=False):
        ds = xr.open_dataset(...)

We can/should also do a better job of surfacing inconsistent encoding in our backends (e.g. to_netcdf).

Additional context

No response

@keewis
Collaborator

keewis commented Mar 27, 2023

see also #6323 and #4817

@jhamman
Member Author

jhamman commented Mar 27, 2023

As I said in #4817, I think we should pursue the keep_encoding option (in addition to the proposal in this issue).
