file_in()/file_out() directories #795

Merged: 11 commits, Mar 22, 2019

wlandau (Member) commented Mar 22, 2019

Summary

This PR lets file_in() and file_out() handle entire directories, e.g. file_in("your_folder_of_input_data_files") and file_out("directory_with_a_bunch_of_output_files").

Internal conventions:

  • Hashes: the hash of a directory is a hash of the hashes of all the constituent non-directory files.
  • Timestamps: the timestamp of a directory is the max of all the timestamps of the constituent non-directory files. Brittle, but that's okay since we only use timestamps to decide whether to even bother checking hashes. It's just a performance shortcut.
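
The two conventions above can be sketched in base R. This is only an illustration, not drake's internal code: the helper names list_dir_files(), dir_hash(), and dir_mtime() are hypothetical, MD5 via tools::md5sum() stands in for whatever hash drake actually uses, and "timestamp" is assumed to mean modification time.

```r
# Hypothetical helpers for illustration; these are not drake functions.

# List every non-directory file under `dir`, in a stable order.
# With recursive = TRUE, list.files() omits directories by default.
list_dir_files <- function(dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sort(files, method = "radix")
}

# Hash of a directory: a hash of the hashes of its constituent files.
# Assumption: MD5 is used here only because it ships with base R.
dir_hash <- function(dir) {
  file_hashes <- unname(tools::md5sum(list_dir_files(dir)))
  # Write the per-file hashes to a scratch file and hash that file.
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(file_hashes, tmp)
  unname(tools::md5sum(tmp))
}

# Timestamp of a directory: the max over its constituent files.
# Assumption: "timestamp" means the file modification time.
dir_mtime <- function(dir) {
  max(file.mtime(list_dir_files(dir)))
}
```

Under this sketch, editing any file inside the directory changes dir_hash(), while dir_mtime() only serves as a cheap first check on whether recomputing the hash is worthwhile.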

Otherwise, drake assumes directories are irreducible units of data. A target with file_in("dir/file") will not necessarily be connected to another target with a file_out("dir").

library(drake)

# good: A writes the directory and B reads the same directory, so B depends on A
plan <- drake_plan(
  A = file_out("dir"),
  B = file_in("dir")
)
config <- drake_config(plan)
vis_drake_graph(config)

# bad: B reads a file inside the directory A writes, so B is not necessarily connected to A
plan <- drake_plan(
  A = file_out("dir"),
  B = file_in("dir/file_produced_by_A")
)
config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-03-22 by the reprex package (v0.2.1)

Supporting the "bad" case above would carry potential performance and complexity penalties that do not seem worth it for this use case.

Related GitHub issues and pull requests

Checklist

  • I have read drake's code of conduct, and I agree to follow its rules.
  • I have listed any substantial changes in the development news.
  • I have added testthat unit tests to tests/testthat to confirm that any new features or functionality work correctly.
  • I have tested this pull request locally with devtools::check().
  • This pull request is ready for review.
  • I think this pull request is ready to merge.

codecov-io commented Mar 22, 2019

Codecov Report

Merging #795 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #795   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          73     73           
  Lines        6217   6263   +46     
=====================================
+ Hits         6217   6263   +46

Impacted Files       Coverage Δ
R/api-plan.R         100% <ø> (ø) ⬆️
R/utils-utils.R      100% <ø> (ø) ⬆️
R/exec-meta.R        100% <100%> (ø) ⬆️
R/utils-checksums.R  100% <100%> (ø) ⬆️
R/exec-store.R       100% <100%> (ø) ⬆️
R/api-clean.R        100% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d1fceb8...3461fd7.

wlandau changed the title from "file_in() and file_out() directories" to "file_in()/file_out() directories" on Mar 22, 2019
wlandau (Member, Author) commented Mar 22, 2019

cc @billdenney.

billdenney (Contributor) commented

Thanks!
