Consider a user-visible build command #157

kendonB · 2017-11-14T20:48:11Z

I'm still trying to improve my workflow with many different calls to make.

Using skip_imports = TRUE, trigger = "missing", graph = my_graph, improves speed somewhat but drake still does a lot of work before getting started on building targets.

I propose a different building command that doesn't bother at all with any checking and builds unconditionally. This would more literally be @wlandau-lilly's proposal to advertise drake as a cool and flexible job scheduler.

It would:

Build the graph for the selected targets only (or the user passes a prebuilt graph which gets pruned)
Build the selected targets unconditionally and just fail if dependencies are missing - the user can choose to build dependencies.
Hash dependencies upon successful building so that drake::make knows those targets are up to date.
Decide what order to build stuff in.

The text was updated successfully, but these errors were encountered:

wlandau-lilly · 2017-11-14T21:16:55Z

I do want drake to have that kind of fast functionality. On the other hand, drake is almost there, so I am resistant to an extreme solution that duplicates major, sensitive parts of the code base. Also, no matter what direction we go, I think we need to understand the remaining bottlenecks first. @kendonB, since I do not have a project as large as yours, I will need your help with that. I recommend working with the internal functions directly, some of which are actually exported. For example, how fast is make_imports() or drake_config() on your workflow? Once you have a config list, how fast is run_parallel_backend()? what about run_parallel(), or even an individual parallel_stage()?

wlandau-lilly · 2017-11-14T21:21:12Z

@kendonB your proposed solution will probably work if meta_list() is still a bottleneck. In that case, it will probably be better to just have a rush argument to make() that forces meta_list() to be skipped entirely. If it isn't, then there is probably some other unknown bottleneck, and we will need an entirely different solution altogether.

wlandau-lilly · 2017-11-14T21:37:31Z

As we're looking into this, it would help to distinguish drake's prep work from future_lapply(). In downsized versions of my old projects, the future-powered SGE example is actually a lot slower than for other forms of parallelism even though all the checking is the same.

wlandau-lilly · 2017-11-14T23:37:15Z

Another point: the graph must contain imports and the imports must be processed for the targets to be up to date. If we go any faster, we may lose reproducibility.

wlandau-lilly · 2017-11-14T23:45:34Z

Another possible bottleneck: next_targets(), which we need for executing targets in the correct order. @kendonB, I believe you pointed that out as a slow point before, and at at the time, we could find a faster replacement. Unfortunately, we will always need it in parallel_stage().

kendonB · 2017-11-14T23:50:56Z

These are the bottlenecks I see when running make on targets that are up to date with skip_imports = TRUE, trigger = "missing", graph = my_graph

drake_config() #7.78 seconds
  # inside drake_config:
  quick_inventory() #: 5 seconds
  check_drake_config() #: 2.275 seconds
  
  # in prepare_distributed:
  attempts <- outdated(config = config, make_imports = !config$skip_imports) # 5 seconds
  #in run_parallel
  quick_inventory() #: 5 seconds

wlandau-lilly · 2017-11-14T23:56:26Z

Useful to know, thanks @kendonB!

~~The inventories actually make things faster because they avoid repeated calls to cache$list() for storr caches. Nothing more we can do there.~~

I will look into the call to outdated(), but I think it's also necessary. For usual make()'s, it is actually responsible for making the imports.

wlandau-lilly · 2017-11-15T01:26:51Z

On second thought, I probably can speed this up a bit. We can borrow inventories across storr namespaces, and we can add an option to skip the safety checks. I will give it a try.

wlandau-lilly · 2017-11-15T02:28:25Z

Facepalm moment: we don't even need inventories if we have cache$exists()! I can't believe I missed that! Anyway, drake should be a LOT faster in the next round of commits. Stay tuned.

wlandau-lilly · 2017-11-15T02:55:14Z

@kendonB I am excited for you to try a28cfeb. I totally dispensed with quick_inventory() and other inventory functions, and I added a skip_safety_checks argument to make() so you can skip check_drake_config().

wlandau-lilly · 2017-11-15T03:12:27Z

You may also be interested in 46e8b73. There, I added a new user-side ~~make_from_config()~~ make_with_config() function to do the work of make() on an existing configuration list from drake_config().

load_basic_example()
# Same as make(my_plan, ...):
config <- drake_config(my_plan)
make_with_config(config)

kendonB · 2017-11-15T07:00:05Z

make runs much faster now!

I'm not sure I can think of any use case for make_with_config though? Recall that I only need many different calls to make because I need to switch future::plan configurations for different sets of targets, thus requiring a different config object for each set.

wlandau-lilly · 2017-11-15T12:05:03Z

Glad to hear it!

With the latest bottlenecks identified, I am even more convinced that a version of make() with unconditional builds is the same as the "missing" trigger: both assume meta_list() is the bottleneck. So until meta_list() is rate-limiting, I am closing this issue. I think we can continue to find and eliminate bottlenecks as they arise, wherever they happen to be, without duplicating sensitive parts of the code. Drake's code base is getting too large as it is, I counted 9107 lines of code in R/.

By the way: build() is already exported, but it only builds a single target.

Given make()'s massive argument list, make_with_config() may be convenient for some folks, so I am keeping it.

Over the last month or so, you have raised 15 issues and submitted one pull request, and drake is a much better package because of it. There is still a contributor spot waiting for you whenever you want it. On a related note, I think you are getting much more familiar with drake's internals, so if you're up for it, I encourage you to use pull requests more often. Very soon, I will have trouble keeping up with incoming issues: I will be on vacation all of next week, and drake is about to get a lot more scrutiny via #147 and ropensci/software-review#156. I expect even more people to chip in if the reviews are successful.

wlandau-lilly added the type: new feature label Nov 14, 2017

wlandau-lilly added the topic: performance label Nov 14, 2017

wlandau-lilly assigned kendonB and wlandau-lilly Nov 14, 2017

wlandau-lilly closed this as completed Nov 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider a user-visible build command #157

Consider a user-visible build command #157

kendonB commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017 •

edited

Loading

wlandau-lilly commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017

kendonB commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017 •

edited

Loading

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017 •

edited

Loading

kendonB commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017

Consider a user-visible build command #157

Consider a user-visible build command #157

Comments

kendonB commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017 • edited Loading

wlandau-lilly commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017

kendonB commented Nov 14, 2017

wlandau-lilly commented Nov 14, 2017 • edited Loading

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017 • edited Loading

kendonB commented Nov 15, 2017

wlandau-lilly commented Nov 15, 2017

wlandau-lilly commented Nov 14, 2017 •

edited

Loading

wlandau-lilly commented Nov 14, 2017 •

edited

Loading

wlandau-lilly commented Nov 15, 2017 •

edited

Loading