How to manage repetitive dvc run commands (like unpacking of many zip files)? #1119
Comments
Hi @sotte! Very interesting scenario! We have been thinking about introducing a build matrix (#1018) to handle a slightly different scenario, but I think it could be useful in your case as well. For example, Dvcfile placed in
Would something like that suit you? Please feel free to share any thoughts or suggestions. Thanks,
That looks quite nice!
But I guess the deps and outs have to be specified explicitly.
Glad you like it! We will take a closer look at implementing this in the near future. Yeah, the wildcards raise more questions than answers. I can't promise that they are going to be implemented this way, but we will certainly think about whether there are suitable ways to incorporate them in the future. Thanks,
Though without wildcards or something similar to them, we still leave the pain point of having to explicitly specify deps and outs. We definitely need to think about optimizing this as well.
True, but if you add the wildcard feature, then people (including myself ;)) would want more data pipeline features. How about rewrite rules? Parallel execution? Feature parity with Makefiles... at least ;) Do you want to go down that route? I'm not saying that you should or should not. Just that it's a tricky balance.
Well, we are already sorta going that way with the build matrix and upcoming parallel execution, so we might (and I think we should) as well take another step 🙂 Btw, what do you mean by "rewrite rules"?
Wrong term, sorry. I was thinking of substitutions in makefiles. See https://www.gnu.org/software/make/manual/make.html#Substitution-Refs
Ah, got it. Yes, I think it might be necessary as well in the long term.
Btw, @shcheklein noticed that we had a similar discussion recently https://discuss.dvc.org/t/creating-an-aggregate-dvc-file/93 and thought it might be worth mentioning that you can work around this by using a bash script, just like in the linked discussion.
That could work. It feels like a dirty workaround though and not very intuitive. I'll try it out and let you know how it works out in the long run.
Assume you download a bunch of zips.
Now you need to unpack the files to, let's say, data/unpacked/file{i}. Something like this:
When I use a Makefile to manage my pipeline, I would write a simple rule for unpacking the zips and put them into their target folders. What is the best approach to do something like this with DVC? Right now I just type a bunch of dvc run commands, but that does not scale well.
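In the spirit of the script workaround mentioned in the comments, the repetitive calls could be generated rather than typed by hand. The following is only a rough sketch, not an official DVC feature: it assumes the zips live in a hypothetical data/zips directory, that the unzip tool is installed, and that each archive becomes one stage with the zip as its dependency (-d) and data/unpacked/<name> as its output (-o). The function names are made up for illustration.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_dvc_run(zip_path: Path, out_root: Path) -> List[str]:
    """Build one `dvc run` invocation that unpacks a single zip file."""
    out_dir = out_root / zip_path.stem  # e.g. data/unpacked/file1
    # The unpack command is passed to dvc as one string so that its own
    # -d flag is not mistaken for dvc's dependency option.
    unpack_cmd = "unzip -q {} -d {}".format(
        shlex.quote(str(zip_path)), shlex.quote(str(out_dir))
    )
    return ["dvc", "run", "-d", str(zip_path), "-o", str(out_dir), unpack_cmd]


def build_all(zip_dir: Path, out_root: Path) -> List[List[str]]:
    """Build one invocation per *.zip file in zip_dir, in sorted order."""
    return [build_dvc_run(p, out_root) for p in sorted(zip_dir.glob("*.zip"))]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # data/zips is an assumed location; adjust to wherever the zips are.
    run_all(build_all(Path("data/zips"), Path("data/unpacked")), dry_run=True)
```

Running it with dry_run=True only prints the commands, which makes it easy to review them before letting the script create the stages. The deps and outs are still spelled out per file, so this does not remove the pain point discussed above; it only automates the typing.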