Parallel model runs: separate forcing files for each model #35

Open
2 tasks
BSchilperoort opened this issue May 8, 2024 · 7 comments

Comments

@BSchilperoort
Collaborator

BSchilperoort commented May 8, 2024

In eWaterCycle/leakybucket-bmi#3 we debugged an issue related to resource locking when multiple models try to access a shared file.
In this specific case the model code is easy to change (as you are the creator of the model, @Daafip), but for a general solution it would be best that models do not share the same resources on disk.

Therefore I propose that the default behavior of ewatercycle-da is to have separate forcing objects for separate models (at least for parallel runs).

Still todo:

  • If a single forcing object is passed, duplicate the forcing so that each ensemble member has its own forcing object (see the sketch below).
  • HBVForcing will have to change (long-term todo) to accommodate this
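
A rough sketch of what that duplication could look like (the helper name, n_ensembles, and ensemble_forcings_dir are placeholders, and it assumes a forcing is fully self-contained in forcing.directory):

import shutil
from pathlib import Path

def duplicate_forcing_directory(forcing, ensemble_forcings_dir: Path, n_ensembles: int) -> list[Path]:
    # Copy the forcing's directory once per ensemble member, so no two
    # models ever open the same files on disk.
    copies = []
    for i in range(n_ensembles):
        target = ensemble_forcings_dir / f"ensemble_{i}"
        shutil.copytree(src=forcing.directory, dst=target)
        copies.append(target)
    return copies

Each copied directory would then back its own forcing object for one ensemble member.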
@Daafip
Owner

Daafip commented May 8, 2024

to not share the same resources on disk

For now I've left this choice up to the user, but it is definitely something to document and make clear.
I'd like to keep providing a single forcing object as an option for initial users who, say, want to use it for just a quick sensitivity analysis.
- A workaround could be to duplicate the forcing object in case only one is provided?

at least for parallel runs

Currently I've implemented parallel runs only when running models in Docker; for local models it seems to introduce too much overhead. But I still encountered the locking issue.

@BSchilperoort
Collaborator Author

For now i've left this choice up to the user but definitely something to document and make clear.

I think having it as an option is OK, while using separate forcing files by default.

A workaround could be to duplicate the forcing object in case only one is provided?

I would not call this a workaround, but rather the desired functionality.

Currently I've implemented parallel runs only when running models in Docker

Yes, I would keep it that way. Local models should only be used for developing/testing whether the model works as an ewatercycle plugin.

@Daafip
Owner

Daafip commented May 8, 2024

Should be an easy enough fix; it will also ensure this works nicely with DA.generate_forcing, which allows for generating forcing in parallel.

I'll discuss today how much more development I'll be doing towards the end of my thesis.

Edit: not much

@Daafip
Owner

Daafip commented May 10, 2024

but for a general solution it would be best that models do not share the same resources on disk.

So the issue was with using the same forcing object. But when generating the forcing I also had issues: if I used the same source .txt file containing the time series, it would also fail:

...
  File ".../ewatercycle_HBV/model.py", line 77, in _make_cfg_file
    self.forcing.from_camels_txt()
  File "..../ewatercycle_HBV/forcing.py", line 204, in from_camels_txt
    ds = xr.Dataset(data_vars=df,
  File ".../xarray/core/dataset.py", line 694, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File ".../xarray/core/dataset.py", line 423, in merge_data_and_coords
    return merge_core(
...

So even shared source files can cause issues, it seems...

Edit: I have now tried adjusting this so there is one .txt file per forcing object. This works fine (a minimal sketch follows below).
From testing on Windows Subsystem for Linux (not ideal), I still have a memory issue causing instability.
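
For reference, a minimal sketch of the "one .txt file per forcing object" approach (the helper name and paths are hypothetical; it only assumes the time series lives in a single source .txt file):

import shutil
from pathlib import Path

def copy_source_txt_per_forcing(source_txt: Path, forcing_dirs: list[Path]) -> list[Path]:
    # Give every forcing its own copy of the source time-series file,
    # so parallel forcing generation never reads the same .txt file.
    copies = []
    for forcing_dir in forcing_dirs:
        forcing_dir.mkdir(parents=True, exist_ok=True)
        target = forcing_dir / source_txt.name
        shutil.copy2(source_txt, target)
        copies.append(target)
    return copies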

Edit edit: It seems to be a memory issue on WSL; when running (slower) on SURF, this doesn't seem to be an issue. I/O is just hit first, as it loads the data into memory.

@Daafip
Owner

Daafip commented May 15, 2024

After more testing, it is a memory issue with WSL. Running bare Ubuntu on my machine only used a fraction of the memory WSL consumed (which makes sense).

@Daafip
Owner

Daafip commented May 16, 2024

it is a memory issue with WSL

For this reason I'll leave it as is for the moment, but it is still something to look at. I have therefore moved the todo points up to your initial comment so they show up nicely on GitHub (sorry for editing your comment, Bart).

@BSchilperoort
Collaborator Author

it is a memory issue with WSL

Glad that you figured it out!

For this reason I'll leave it as is for the moment, but it is still something to look at. I have therefore moved the todo points up to your initial comment so they show up nicely on GitHub (sorry for editing your comment, Bart).

No worries haha.

HBVForcing will have to change (long-term todo) to accommodate this

We can also add a copy method to the base class (ewatercycle's DefaultForcing). Forcings should be self-contained in a directory and should be movable (only relative paths), so you would just have to copy the directory to a new location.

for i in range(n_ensembles):
    target_dir = forcing.copy(ensemble_forcings_dir / f"ensemble_{i}")

which internally would call:

import shutil
from pathlib import Path

class DefaultForcing:
    ...
    def copy(self, target: Path | str) -> Path:
        # Copy the self-contained forcing directory to the target location.
        return Path(shutil.copytree(src=self.directory, dst=target))
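
As a follow-up to the snippet above (reusing forcing, n_ensembles and ensemble_forcings_dir from it), each copied directory could be turned back into its own forcing object. This assumes the forcing class exposes a load classmethod that reads the saved forcing metadata from a directory, as recent ewatercycle versions do; if not, the forcing would have to be reconstructed manually:

ensemble_forcings = []
for i in range(n_ensembles):
    # Copy the self-contained forcing directory, then load it as a new,
    # independent forcing object for ensemble member i.
    target_dir = forcing.copy(ensemble_forcings_dir / f"ensemble_{i}")
    ensemble_forcings.append(type(forcing).load(target_dir))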
