Auxiliary coordinates created by HybridPressureFactory have huge chunks #5457
Comments
I looked into fixing this, but would like to get some input on what the desired chunks are. Input anyone? @SciTools/iris-devs Some options:
@SciTools/peloton We're concerned that this change might be detrimental to other workflows that were otherwise fine. Iris 3.8, which is due to be released soon (#5363), includes a CHUNK_CONTROL context manager (https://scitools-iris.readthedocs.io/en/latest/further_topics/netcdf_io.html#chunk-control) which, when used to chunk the original coordinates, should help avoid this issue.
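As a rough illustration of that suggestion, the sketch below wraps a load in `CHUNK_CONTROL.set`. It assumes Iris 3.8 or later; the helper name `load_with_time_chunks`, the variable name `"ps"`, the dimension name `"time"`, and the chunk length are all hypothetical and would need to match the actual file.

```python
def load_with_time_chunks(path, var_name="ps", time_chunk=10):
    """Load cubes while limiting the time chunk length of one netCDF variable.

    var_name and the "time" dimension name are assumptions about the file.
    """
    # Imported lazily so this module can be imported without Iris installed.
    import iris
    from iris.fileformats.netcdf.loader import CHUNK_CONTROL

    # Chunk sizes are decided while the deferred loads are being built,
    # so the load itself has to happen inside the context manager.
    with CHUNK_CONTROL.set(var_name, time=time_chunk):
        return iris.load(path)
```

Whether chunking the surface pressure variable this way is enough to give the derived coordinate sensible chunks would need to be checked against the file in question.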
Are you referring to the example implementation in #5712? An alternative could be to assign chunks to the input coordinates of the derivation at load time, such that the derived variable ends up with reasonably sized chunks. The input coordinates would then have rather small chunks, which will be inconvenient for anyone who wants to work with those directly, but that is probably not a very common scenario, so not really an issue. I prefer to fix this issue on the Iris side and not leave it to the user, as the current behaviour is unlikely to produce working results.
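To make the trade-off in this alternative concrete, here is a small sketch of how a time chunk length for the input coordinate could be picked so that one derived chunk stays under a target size. The helper name, the shape, and the 128 MiB target are hypothetical; this is plain arithmetic, not Iris code.

```python
from math import prod


def time_chunk_for_target(derived_shape, target_bytes, itemsize=8):
    """Largest time chunk length keeping one derived chunk within target_bytes.

    derived_shape is (time, lev, lat, lon); only the time dimension is split.
    """
    n_time = derived_shape[0]
    bytes_per_time_step = prod(derived_shape[1:]) * itemsize
    return max(1, min(n_time, target_bytes // bytes_per_time_step))


# Hypothetical shape and target, chosen for illustration only.
derived_shape = (120, 32, 180, 288)
target_bytes = 128 * 2**20  # 128 MiB

time_chunk = time_chunk_for_target(derived_shape, target_bytes)

# The (time, lat, lon) input coordinate gets the same time chunk length,
# which leaves it with much smaller chunks than the derived coordinate.
ps_chunk_bytes = time_chunk * prod(derived_shape[2:]) * 8

print(f"time chunk length: {time_chunk}")
print(f"input coordinate chunk: {ps_chunk_bytes / 2**20:.1f} MiB")
```

With these assumed numbers the input coordinate's chunks come out far smaller than the target, which is the inconvenience mentioned above.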
When loading a file that contains an auxiliary coordinate that can be computed using a formula term, the auxiliary coordinate ends up having huge chunks. This leads to memory issues when trying to use such coordinates, as the Dask workers will run out of memory and get killed.
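A back-of-the-envelope sketch of why the chunks grow so large: the hybrid pressure coordinate is computed as `ap + b * ps`, so a surface pressure held as a single (time, lat, lon) chunk is broadcast against the level dimension into a single (time, lev, lat, lon) chunk. The dimension sizes below are hypothetical and are not read from the file.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize=8):
    """Size in bytes of one chunk with the given shape (float64 by default)."""
    return prod(chunk_shape) * itemsize


# Hypothetical dimension sizes, chosen for illustration only.
n_time, n_lev, n_lat, n_lon = 120, 32, 180, 288

# Surface pressure held as one chunk over (time, lat, lon).
ps_bytes = chunk_nbytes((n_time, n_lat, n_lon))

# Broadcasting against the level dimension gives one (time, lev, lat, lon) chunk.
derived_bytes = chunk_nbytes((n_time, n_lev, n_lat, n_lon))

print(f"surface pressure chunk: {ps_bytes / 2**20:.0f} MiB")
print(f"derived coordinate chunk: {derived_bytes / 2**30:.2f} GiB")
```

A modest input chunk is multiplied by the number of levels, so an input that fits comfortably in memory can produce a derived chunk that does not.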
Example
Open the file clw_Amon_FGOALS-f3-L_historical_r1i1p1f1_gr_196001-196912.nc and list the chunks of the computed coordinate:
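A minimal sketch of how the chunks might be listed, assuming Iris is installed and the file is available locally. The helper name `derived_coord_chunks` and the `var_name` default `"clw"` (taken from the file name) are illustrative.

```python
def derived_coord_chunks(path, var_name="clw", coord_name="air_pressure"):
    """Return the dask chunks of a (possibly derived) coordinate's points."""
    # Imported lazily so this module can be imported without Iris installed.
    import iris

    # Select the cube by its netCDF variable name (an assumption about the file).
    cube = iris.load_cube(path, iris.NameConstraint(var_name=var_name))
    coord = cube.coord(coord_name)

    # lazy_points() returns a dask array without computing the values.
    return coord.lazy_points().chunks
```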
i.e. no chunking is applied along the time dimension at all, and the 'air_pressure' coordinate has a chunk size of 1.5 GB. For performance, it would be best if the coordinate had the same chunks as the data of the cube.
ncdump -hs of the file: