I'm trying to compress datasets in an automated way, using code that, heavily reduced, looks like this:
import xarray

dataset = xarray.open_dataset("./dataset.nc")

# This is the main body that is automated and that I want to make work
comp = dict(
    zlib=True,
    complevel=4,
    contiguous=False,
    shuffle=True,
)
keys_to_keep = {
    "scale_factor",
    "add_offset",
}
encoding = {
    name: {
        **{
            key: value
            for key, value in var.encoding.items()
            if key in keys_to_keep
        },
        **comp,
    }
    for name, var in dataset.data_vars.items()
}
# Earlier version that only passed the compression options (it would
# overwrite the dict above, so it is commented out here):
# encoding = {name: comp for name in dataset.data_vars}
dataset.to_netcdf(
    "./dataset_compressed.nc",  # example output path
    mode="w",
    encoding=encoding,
)
Before, I didn't have keys_to_keep and I just passed the compression options. But then it looked like, in some cases (specific datasets that had add_offset and scale_factor), the compression wasn't working (see Issue #9783).
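For context, my understanding of what those two keys do (a toy illustration of CF-style packing, not xarray's actual internals) is that scale_factor and add_offset describe how float values are stored as small integers on disk, and readers reverse the transform on load:

```python
# Toy illustration of CF-convention packing (my understanding, not xarray
# internals): packed = round((value - add_offset) / scale_factor), stored
# as a small integer dtype that compresses well; unpacking reverses it.
def pack(values, scale_factor, add_offset):
    return [round((v - add_offset) / scale_factor) for v in values]

def unpack(packed, scale_factor, add_offset):
    return [p * scale_factor + add_offset for p in packed]

temps = [20.0, 20.5, 21.0]
packed = pack(temps, scale_factor=0.5, add_offset=20.0)
print(packed)                     # small integers, cheap to compress
print(unpack(packed, 0.5, 20.0))  # round-trips back to the originals
```

Which is why I'd expect dropping those keys from the encoding to change what ends up on disk.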
As an intermediate step, I created keys_to_keep with a different set of keys (the set that shows up in the error message if you actually pass invalid encodings to the function).
There was one case, though, where var.encoding.items() only contained dtype and _FillValue, and the whole output dataset was wrong.
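My guess about that case (an assumption on my part, not something I've verified in xarray's source) is that keeping an integer dtype in the encoding while dropping the packing keys makes the float data get cast straight to that dtype, which would garble it. A toy sketch of that failure mode:

```python
# Toy sketch (hypothetical, not xarray internals): casting floats straight
# to an integer dtype without first applying scale_factor/add_offset drops
# the fractional parts, garbling any values that needed the transform.
def cast_to_int(values):
    return [int(v) for v in values]  # mimics a raw astype to an int dtype

print(cast_to_int([20.7, 0.3, -1.9]))  # fractional parts simply vanish
```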
With the code I have now, keeping add_offset and scale_factor, all the datasets seem fine (which is also weird, because those two keys didn't appear in the error I got when passing the wrong encoding), but I'm losing a little bit of compression power.
Question: is there a standard way to know which attributes to pass through the encoding? I've seen this was a big issue back in the day, but I couldn't find anything in the current documentation that addresses it: #1614.
Question: am I supposed to keep the whole var.encoding dictionary, or only a specific set of keys (like add_offset, scale_factor, and maybe some others)?
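In the meantime, the workaround I'm converging on can be sketched as a small whitelist helper (the key set here is just my current guess, not an official xarray list):

```python
# Sketch of the whitelist approach (the key set is my guess, not an
# official list from xarray): keep only "meaningful" per-variable encoding
# keys and merge the compression options on top of them.
COMP = {"zlib": True, "complevel": 4, "contiguous": False, "shuffle": True}
KEYS_TO_KEEP = {"scale_factor", "add_offset", "dtype", "_FillValue"}

def build_encoding(var_encoding, keep=KEYS_TO_KEEP, comp=COMP):
    kept = {k: v for k, v in var_encoding.items() if k in keep}
    return {**kept, **comp}

# Unlisted keys (e.g. chunksizes) are dropped; listed ones pass through.
print(build_encoding({"scale_factor": 0.1, "chunksizes": (100,)}))
```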
Thank you very much in advance!