dvc stage foreach ordered execution #5644
Comments
Hmm, could the second use case, the rolling window, be solved by checkpoints?
@gabrieljdcoelho Thanks for the detailed explanation of an interesting use case. As you note, these are two different scenarios, so I'm not sure that
Would it work to have 10 rolling window stages and use
Thinking more holistically about the problem, could you have a metascript to generate your
@skshetry Checkpoints are an interesting idea. It raises a few questions:
@dberenbaum, we had some discussions regarding this a few months back on #5181.
When using foreach loops in dvc.yaml, I noticed that the generated stages are not executed in the order in which they are listed.
In my particular case, this is a serious problem.
I use foreach loops for two reasons:
Consider 10 rolling window iterations that simulate an online learning environment. In the first iteration, a model is trained for X iterations and stored. In the second, I load that model and retrain it on new data for X/2 iterations: since the model is already trained, I only need to adapt it to the new data rather than train it from scratch.
Thus, execution order is mandatory. If DVC starts the foreach loop at the 3rd iteration (for example), these requirements are not met.
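To illustrate why this ordering is really a dependency chain, here is a minimal Python sketch. It is not DVC code: the stage names `window_1` ... `window_10` are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a pipeline scheduler. It assumes each window declares the previous window as a dependency, in which case only one execution order is valid.

```python
from graphlib import TopologicalSorter


def window_order(n_windows):
    """Return the only valid run order for a chain of rolling window stages."""
    # Map each hypothetical stage name to the set of stages it depends on.
    graph = {"window_1": set()}
    for i in range(2, n_windows + 1):
        # Window i reuses the model produced by window i - 1.
        graph[f"window_{i}"] = {f"window_{i - 1}"}
    # static_order() yields every stage after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())


order = window_order(10)
print(order)
```

Once the chain is declared explicitly, the order no longer depends on how the loop happens to be expanded.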
I store all models, predictions, results, datasets, and so on for every iteration, and I use DVC pipelines because they make it easier both to manage all the data and to run only parts of my pipeline.
Note that I already have to duplicate my rolling window stage for each of my datasets, since nested foreach loops are not possible in DVC, so I have 5 stages with the same code that differ only in the dataset.
If DVC does not follow the foreach order, I will need 50 stages with duplicated code, and that assumes only 1 model; with more models it would be impractical.
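One way to avoid writing those 50 stages by hand is the metascript idea raised in the comments: generate the stage definitions from a short script. The following is a minimal sketch under stated assumptions, not a tested workflow. The dataset names, file paths, `train.py`, and its `--dataset`, `--window`, and `--init-model` flags are all hypothetical. Only the `stages` / `cmd` / `deps` / `outs` layout follows the dvc.yaml format. Each window lists the previous window's model as a dependency, so the order is expressed through the pipeline graph rather than through foreach.

```python
import json

# Hypothetical dataset names and window count (5 datasets x 10 windows).
DATASETS = ["ds1", "ds2", "ds3", "ds4", "ds5"]
N_WINDOWS = 10


def build_stages(datasets, n_windows):
    """Build a dvc.yaml-style mapping with one chained stage per window."""
    stages = {}
    for ds in datasets:
        for i in range(1, n_windows + 1):
            model = f"models/{ds}/window_{i}.pkl"  # hypothetical output path
            deps = ["train.py", f"data/{ds}/window_{i}.csv"]
            cmd = f"python train.py --dataset {ds} --window {i}"
            if i > 1:
                # Depending on the previous model forces window i to run
                # after window i - 1.
                prev = f"models/{ds}/window_{i - 1}.pkl"
                deps.append(prev)
                cmd += f" --init-model {prev}"
            stages[f"train_{ds}_w{i}"] = {"cmd": cmd, "deps": deps, "outs": [model]}
    return {"stages": stages}


pipeline = build_stages(DATASETS, N_WINDOWS)
print(len(pipeline["stages"]), "stages generated")
# Show one generated stage; a YAML library could serialize the whole mapping.
print(json.dumps(pipeline["stages"]["train_ds1_w2"], indent=2))
```

The trade-off is that the generated file has to be regenerated whenever the datasets or the number of windows change, and the duplication moves from dvc.yaml into the script's loop.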