Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easier to load workflows structured as packages. #208

Closed
wlandau opened this issue Jan 31, 2018 · 13 comments
Closed

Make it easier to load workflows structured as packages. #208

wlandau opened this issue Jan 31, 2018 · 13 comments

Comments

@wlandau
Copy link
Member

wlandau commented Jan 31, 2018

From #30, environments populated with pkgload::load_all() look like package namespaces, and that can cause problems with drake workflows structured as packages. From pkgload::load_code(), it looks devtools:::source_many() would do most of the work. Thanks for the tip, @hadley!

@dapperjapper For your workflows structured as packages, is there anything else you want to load? We could try to add on more components of pkgload::load_all().

We should also extend this part of the caution vignette.

@violetcereza
Copy link
Contributor

violetcereza commented Jan 31, 2018

I don't really understand the proposed solution, but I can clarify exactly how we're using drake within the package structure.

R/ contains the scripts that create and process data, as well as a script R/drake_plan.R which contains a function that returns a drake plan object.

While we are working on the scripts in R/, we repeatedly hit devtools::load_all() to iterate and test the newest version of the code on our temporary testing data in the global environment.

When we are ready to run our code for real and save our results, we run make(drake_plan()) from the console after (clunkily) managing the environment weirdness created by having our functions in fake package environment.

The final product of this research project is an Rmd, so we also include a call to make() in a chunk in the beginning of the Rmd (while dealing with some working directory shenanigans). This is run silently every time the Rmd is knit, ensuring that it always has the latest data.

@violetcereza
Copy link
Contributor

Looking at the content of pkgload::load_all, we don't currently use package data so that can be left out for now. However, it would be nice if drake could be cognizant of NAMESPACE and use that to choose libraries to load.

@wlandau
Copy link
Member Author

wlandau commented Feb 1, 2018

If I understand correctly, you want to

  1. load all the scripts in R/.
  2. Make imported functionality available from any packages mentioned in NAMESPACE.

Is there anything else? We might be able to combine devtools:::source_many() with either loadNamespace() or attachNamespace() to accomplish this. I could see a function like load_pkg_workflow() becoming part of drake.

@wlandau
Copy link
Member Author

wlandau commented Feb 5, 2018

Thinking about this some more, I think a better solution would be to concentrate on what happens after the package is loaded. Suppose we're using eply as the package that implements our workflow.

library(eply) # could be devtools::load_all()

At this point, the usual behavior is the following.

my_plan <- drake_plan(a = eply(1234)) # doesn't actually work, but good for demonstration here
config <- drake_config(my_plan)
vis_drake_graph(config)

graph1

As with this comment from #227, drake finds nested functions inside the functions YOU define. However, it stops at packages in order to make projects less brittle (see [packrat](https://rstudio.github.io/packrat/ for a better way to depend on packages). But in this case, we care what's happening inside eply(). So we dive in.

dive_into_package <- function(
  package,
  character_only = FALSE,
  envir = parent.frame(),
  jobs = 1
){
  if (!character_only){
    package <- as.character(substitute(package))
  }
  pkg_env <- getNamespace(package) %>%
    as.list %>%
    list2env(parent = globalenv())
  lightly_parallelize(
    X = ls(pkg_env),
    FUN = function(name){
      assign(
        x = name,
        envir = envir,
        value = `environment<-`(get(name, envir = pkg_env), pkg_env)
      )
    },
    jobs = jobs
  )
  envir
}

my_plan <- drake_plan(a = eply(1234))
config <- drake_config(my_plan)
vis_drake_graph(config)

graph2

I would like to incorporate dive_into_package() into drake itself. It will need a better name, and the testing will need to rely on usethis::create_package() and some manual file manipulation, but I think it would help.

Related: #206. cc @lbusett.

@wlandau
Copy link
Member Author

wlandau commented Feb 5, 2018

@dapperjapper in the several months that you have been using the hack, have you encountered any lexical scoping problems?

Some alternative names:

  • use_pkg_functions()
  • attach_pkg_contents()
  • deep_pkg_dependency()

@wlandau
Copy link
Member Author

wlandau commented Feb 6, 2018

expose_pkg_envir() seems like a better name. Or even just expose_package().

@wlandau
Copy link
Member Author

wlandau commented Feb 6, 2018

Even better: extract_imports().

wlandau-lilly added a commit that referenced this issue Feb 7, 2018
Just need to add mentions in the appropriate vignettes.
@wlandau wlandau closed this as completed in 5023308 Feb 7, 2018
@wlandau
Copy link
Member Author

wlandau commented Feb 8, 2018

@dapperjapper, thanks again for the original code that made the solution possible. The function is called expose_imports(). @lbusett, I think it takes care of one of the hardest parts of #206.

@violetcereza
Copy link
Contributor

Looks good! I think we need an example of how to use this within a package that is loaded with devtools::load_all(), not just an external package that is exposed. I can provide this when I get a chance to try it on my own projects...

From the example, it looks like expose_imports(digest) modifies the specified package in place, rather than just returning a modified environment that can be passed to make(). Can you verify whether this is the case? I would prefer that drake leave package namespaces as immutable, because I'm guessing modifying an attached package could introduce some really weird bugs.

@violetcereza
Copy link
Contributor

Side note: have you studied the quosure construct at all? I'm not sure if it would help with any of the complexity of managing drake dependencies across scopes, but I think it is a useful way to think about things.

I've been thinking of making an experimental drake fork that attempts to simply the API and adapt a more rlang idiomatic approach to defining and managing workplans.

@wlandau
Copy link
Member Author

wlandau commented Feb 13, 2018

Thanks, @dapperjapper. Almost all of my focus is on #227, so I would really appreciate help.

We could deep copy the package environment without modifying the original one. Somehow I missed the distinction.

I'm just starting to get into tidy evaluation (see #200). I am not fluent yet, but I intend to be one day. From what little I know about quosures, they sound like a great way to manage scoping. This will definitely come into play for #233 and especially #247.

@wlandau
Copy link
Member Author

wlandau commented Feb 15, 2018

Getting back to your question on which environments are modified: expose_imports() does not actually modify the environment of the package, only the environment in the envir argument. Its first action is to make a deep copy of the package environment, and then it works on that deep copy and the destination envir you supply.

@wlandau
Copy link
Member Author

wlandau commented Dec 2, 2020

I just added new package-tracking functionality in targets tar_option_set(imports = your_packages) (ropensci/targets#239, ropensci/targets#241). See https://wlandau.github.io/targets-manual/practice.html#packages for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants