-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer iff unselected reference does not exist in current env #2946
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty straightforward!
e2c7465
to
80eedb5
Compare
80eedb5
to
b025f20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I buy it, lgtm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment about a potential KeyError
, but beyond that, this LGTM!
I do wonder if we should rethink the flag/feature name --defer
as we make this subtle but significant change in functionality. Happy to adjudicate that one outside the scope of this particular PR, but think that as we see this functionality rolling out of "beta" we should be really cognizant of the words we use to describe it! It's very cool and very powerful, and i feel like the biggest blocker to its utility could actually be people not understanding how to use it / what it is for :D
if ( | ||
node.resource_type in refables and | ||
not node.is_ephemeral and | ||
unique_id not in selected | ||
unique_id not in selected and | ||
not adapter.get_relation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know if this happens before or after a relation cache has been built? Curious if this implementation is going to run one introspective query per node in the deferral manifest, or if they'll all be cache lookups?
core/dbt/contracts/graph/manifest.py
Outdated
@@ -898,10 +899,14 @@ def merge_from_artifact( | |||
refables = set(NodeType.refable()) | |||
merged = set() | |||
for unique_id, node in other.nodes.items(): | |||
current = self.nodes[unique_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a scenario where unique_id
is present in the other manifest, but not in this manifest? I'm thinking that could happen if a model was deleted, for instance. If I'm thinking about that right, then I think this could raise a KeyError, and we'd probably want to use self.nodes.get(unique_id)
and do something smart if the result is None
.
I might be way off-base here, it just jumped out at me though and wanted to confirm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really good point, I was able to produce a KeyError
here and changed the line to handle. this also gave me a clue to #2875 :)
with adapter.connection_named('master'): | ||
self.create_schemas(adapter, selected_uids) | ||
self.populate_adapter_cache(adapter) | ||
self.defer_to_manifest(adapter, selected_uids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - i am seeing this now, and you can ignore my comment about caching above. Very clever, very cool. Leaving that comment above parce que je suis impressionné
Thank you all for taking a look! I'm excited that we'll manage to sneak this into v0.19.0.
I'm going through the exact same thought process, and definitely happy to continue it offline. I can't quite decide if "defer" better encapsulates the old behavior, the new behavior, or if it doesn't describe either very well at all. (I'm between #2 and #3.) The features here feel tricky, as do the words—"state comparison," "job deferral"—and I can't decide if we'd be better served by something a bit more... colorful? I mean, |
I'm going to merge this. Let's continue the conversation elsewhere about potentially renaming |
resolves #2909
Description
Today,
--defer
is a blanket override: build the resources included by my selection criteria, and use the--state
manifest's namespaces for any references to resources not included by my selection criteria.Instead, this PR would make it a more flexible fallback: If there's a resource I want to build, and it references a resource I haven't selected, use the other manifest's namespace if and only if the upstream resource doesn't exist in my current environment namespace. In this way,
--defer
is more like a "fail-over" mechanism, using the deferred environment to fill in the gaps of whatever's missing in the current environment.Pros:
dbt run --defer
would always defer non-model resources (seeds and snapshots). If you've just seeded or snapshotted into your scratch/CI schema, you probably want to use those objects to build your models in the following step.dbt run -m state:modified ...
followed bydbt run -m state:modified,config.materialized:incremental ...
).relationships
tests don't make a lot of sense in CI environments.Cons:
--defer
is dev, CI, and other scratch schemas that can be created, dropped, and recreated with aplomb.If you buy the premise, the solution is very straightforward—a pleasant surprise. We made deferral a beta feature in v0.18 for exactly this reason: we knew there would be wrinkles to iron out. All the same, it's better to make this change sooner rather than later.
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.