-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A new internal fast-lookup data structure for target information #440
Comments
Additional namespaces for data objects that are also import/target build steps:
All the namespaces that do not exist (yet) shall explicitly be set to |
We should also maintain overall |
Forgive me if this is a dumb question, but what's the barrier to just using the plan ref for benchmarking |
There is no barrier, technically speaking. I am glad you checked with richfitz/storr#81. From these results, plain environments seem to win out in terms of lookup speed, and I just assumed From richfitz/storr#81 (comment), I now think the right approach is to have |
Edit: existing |
This data structure is really a protocol for building the targets. Whereas the |
To do: put an |
Is it worth benchmarking |
Should be easy enough to check. |
Also, we really do need to think about situations where a graph is really handy: for example, |
From #483, the complete dependency information of targets is neatly arranged in
|
I do not intend this new lookup data structure to replace the graph, at least at first, but it will be a useful supplement for speeding things up. |
I plan to encapsulate the lookup operation internally. get_lookup <- function (target, field, config){
info <- get(
x = target,
envir = config$lookup,
inherit = FALSE
)
get(
x = field,
envir = info,
inherit = FALSE
)
} Here, Also, the lookup should be a read-only data structure. possible names:
Leaning towards |
Before diving headlong into this refactoring, I want to do more in-depth profiling. The |
See #575. |
This feature may not reduce overhead, but it should simplify the architecture. |
Just started the work here. |
New favorite term for this structure: "layout". I also thought about "topology", but it seems more redundant with "graph" and implies stronger-than-necessary mathematical claims. |
Fixed via #576. |
On reflection, the term "layout" is not quite right. Let's go with "workflow specification", or "spec" for short. The spec is The name change is implemented in 322785e. |
Lessons from #435 and #283:
drake
spends a lot of time repeatedly looking up target-level information it should already know.Explanation
I think these and similar problems come from the limitations of
drake
's data structures. Theplan
is great for users, but lookup is slow, and it contains only indirect information about each target's dependencies. Thegraph
has complete dependency information, but it does not have other metadata (igraph
attributes are a pain), and traversing the whole graph for dependencies is slow.Solution
I want to put all target-level metadata in a unified fast-lookup data structure. This new
config$meta
will replace individual outputs ofdrake_meta()
flying around, and it will decrease reliance on theplan
andgraph
. For fast lookup, I want to base this data structure on an environment. For convenience, I want it to have astorr
-like API. Best of both worlds: astorr_environment()
cache. (Thank you @richfitz for making it available!)There is a lot of refactoring to do, but I think this will make
drake
a whole lot faster and give it a cleaner code base.The text was updated successfully, but these errors were encountered: