
Issues using xarrays in update step #3

Closed
Daafip opened this issue May 3, 2024 · 10 comments

Daafip commented May 3, 2024

When using this template as a starting point to implement my own model in eWaterCycle, I followed this convention:

self.precipitation.isel(time=self.current_timestep).to_numpy()

For very large datasets, I understand the reasoning: you don't want to load everything into memory.
However, for smaller (conceptual) models, my testing shows that repeatedly calling into xarray slows the code down and can even crash it:

It would get stuck on self.P_dt = self.P.isel(time=self.current_timestep).to_numpy() * self.dt and self.EP_dt = self.EP.isel(time=self.current_timestep).to_numpy() * self.dt

I'm not sure of the best way to tackle this. A short comment in the template, or something like that?


BSchilperoort commented May 3, 2024

I guess doing this every time step would indeed slow down the model. It also depends on whether the DataArray consists of dask arrays or already-loaded numpy arrays. I would not expect much overhead if the data is already in memory (of course more than just an array slice, but I did not write the BMI example to be extremely efficient).

Do note that if the data is loaded lazily (dask arrays), this line will cause the file to be accessed.
On some occasions xarray locks file access, so perhaps that's related to your problem... See this discussion on how to open a dataset without locking the file.

If a dask worker/thread dies, it isn't as easy to debug the issue, as you don't get a clear stack trace.
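
As an aside on the dask-vs-numpy point above, a quick way to check which case you are in (a sketch; the file and variable names are placeholders):

import xarray as xr

ds = xr.open_dataset("forcing.nc")                             # plain open: no dask unless chunks= is given
print(ds["pr"].chunks)                                         # None -> not dask-backed
ds_lazy = xr.open_dataset("forcing.nc", chunks={"time": 365})
print(ds_lazy["pr"].chunks)                                    # tuple of chunk sizes -> lazy dask array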


BSchilperoort commented May 3, 2024

Try changing this line:
https://github.com/Daafip/HBV-bmi/blob/db437a53c7125aab5616a11a1e60b7dbfbad1e2d/src/HBV/utils.py#L38

to have the following structure, which avoids the resource being locked:

import xarray as xr

with xr.open_dataset('test.nc') as ds:
    ds.load()  # read everything into memory before the file handle is released

Edit: this is actually implemented as xarray.load_dataset
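
For completeness, a minimal sketch of that approach (placeholder file and variable names), so that later update() calls only slice in-memory numpy arrays:

import xarray as xr

ds = xr.load_dataset("test.nc")          # opens, loads into memory, and closes the file in one call
precipitation = ds["pr"].to_numpy()      # from here on, slicing touches numpy only, no file access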


Daafip commented May 7, 2024

Even then, I'm still having stability issues (on huge runs). I'll give with dask.config.set(scheduler="synchronous"): ds.load() a try, to fully disable dask...
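
Spelled out, that workaround would look roughly like the following sketch (placeholder file name):

import dask
import xarray as xr

with dask.config.set(scheduler="synchronous"):   # run any dask computation single-threaded
    with xr.open_dataset("forcing.nc") as ds:
        ds.load()                                # pull all data into memory before the file is closed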


Daafip commented May 7, 2024

For context, I'm calling model.update() 600 times on different model instances, but with the same forcing (file), in quick succession.
This is likely the root of the issue: xarray wasn't designed for that...

@BSchilperoort

Does the same issue still occur if you give each model its own forcing file? That could be a way to try to debug this.

Or first do the initialize step sequentially, giving each model sufficient time to load the files before initializing the next model. Then the update step does not involve any file I/O.

If that works OK, I would think it's probably related to file access being locked somehow. Perhaps you can work around that with locks: https://distributed.dask.org/en/latest/api.html#distributed.Lock
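
A rough sketch of such a lock around the file access (this assumes a dask.distributed client is running; the lock name and file name are placeholders):

from dask.distributed import Lock
import xarray as xr

lock = Lock("forcing-file")              # use the same lock name in every model instance
with lock:                               # only one worker reads the shared forcing file at a time
    ds = xr.load_dataset("forcing.nc")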

@BSchilperoort

Another thing you could try is the h5netcdf backend/engine. Perhaps the problem is in the underlying netcdf-c library (which backs xarray's default engine).
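
For example (a sketch; this needs the h5netcdf package installed, and the file name is a placeholder):

import xarray as xr

ds = xr.load_dataset("forcing.nc", engine="h5netcdf")   # bypasses the default netcdf4-c based backend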


BSchilperoort commented May 7, 2024

There is also the lock argument of xr.open_dataset...

It seems to always lock on the NETCDF4 backend if you have the default lock=None:
https://github.com/pydata/xarray/blob/2ad98b132cf004d908a77c40a7dc7adbd792f668/xarray/backends/netCDF4_.py#L390-L401
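
A sketch of overriding that default (whether the keyword is forwarded like this depends on the xarray version; the file name is a placeholder):

import xarray as xr

# lock=False disables the backend's file lock; keeping concurrent access safe is then up to the caller.
ds = xr.open_dataset("forcing.nc", engine="netcdf4", lock=False)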


Daafip commented May 7, 2024

locked somehow

The acquire pops up often in my tracebacks, so I think this is the issue.

give each model its own forcing file

Will give this a try.


Daafip commented May 8, 2024

If that works OK, I would think it's probably related to file access being locked somehow. Perhaps you can work around that with locks https://distributed.dask.org/en/latest/api.html#distributed.Lock

I think it's this: when testing with multiple forcing files it runs faster and (fingers crossed) doesn't seem to crash.
That it's only an issue now shows that it only starts to become a problem when you really hit it hard.

@BSchilperoort

I think that in general it would be better to have separate forcing objects for separate models. We can continue the discussion here: Daafip/eWaterCycle-DA#35
