Defer refers to outdated seed #2909
Comments
Thanks for the writeup @dmateusp. The issue here is that I think this fix is going to be tricky: we'd need dbt to defer on the basis of an intermediate set of selected nodes, rather than the final set. This is part of what made test deferral so cumbersome (#2701), and ultimately not worthwhile. In this case, though, I think we need to figure out an answer. A much longer-term resolution here would be a generalized command that can operate on both seeds and models (#2743). In the meantime, the only workarounds I can think of are quite hacky. E.g. if you have a staging model (
dbt seed -s state:modified --state .state
dbt run -m state:modified+1,staging.seeds --state .state # no deferral
dbt run -m state:modified+ --exclude state:modified+1,staging.seeds --defer --state .state
Thank you for the details @jtcohen6. The staging model is an interesting solution, but unfortunately it is not practical to enforce in our project right now. As a short-term solution, would it be possible to extend dbt to allow deferring only models? That way I could change my CI to always re-create all the seeds (which is not a big deal, since seeds should be small).
@dmateusp That's a good thought, and if it proves much more straightforward, that may be the move. In either case, it will require tweaking some of the deferral logic, which isn't something we can squeeze in for the next minor version (v0.19), but possibly for the one after. In the meantime, I'll plan to document this as a known caveat to state comparison + deferral.
One approach that's occurred to me, though I haven't thought through all the implications: Today, we defer all models that are not included in the node selection criteria. That means we defer:
Perhaps we shouldn't use selection criteria as the basis for deferral. Instead, during compilation, we could check to see if a referent's "new" representation ( If we took this more-naive approach to deferral:
I like that a lot!
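For readers following the thread, here is a toy sketch of the selection-based rule described above ("we defer all models that are not included in the node selection criteria") and why it sends the model to the outdated seed. It is not dbt's actual implementation: the function `resolve_ref` and the schema names `ci` and `prod` are hypothetical and exist only for illustration.

```python
# Toy model of selection-based deferral. This is NOT dbt's real code;
# resolve_ref and the schema names "ci" / "prod" are made up for illustration.

def resolve_ref(node, selected, defer, dev_schema="ci", prod_schema="prod"):
    """Return the relation that a ref() to `node` would resolve to."""
    # Rule from the thread: with --defer, any node outside the current
    # selection is read from the other ("main run") environment.
    if defer and node not in selected:
        return f"{prod_schema}.{node}"
    return f"{dev_schema}.{node}"


# An earlier `dbt seed` step rebuilt the modified seed in the CI schema,
# but `dbt run` only selects models, so the seed is never in its selection.
run_selection = {"dim_countries"}

# The modified model reads the seed from the main run, not the fresh CI copy.
print(resolve_ref("countries", run_selection, defer=True))

# Without deferral, the same ref points at the freshly seeded CI relation.
print(resolve_ref("countries", run_selection, defer=False))
```

Under these assumptions, the seed resolves to the main-run relation whenever `--defer` is on, regardless of whether a previous `dbt seed` step rebuilt it. That matches the "column id does not exist" failure reported in this issue.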
Describe the bug
I'm running the following on CI:
Say we have a seed countries.csv:
And a model dim_countries.sql:
Now in one PR we change the seed to
And we change the model to:
dbt correctly identifies that countries.csv changed, and that dim_countries.sql changed. However, when the model runs, it fails with "column id does not exist" because the model tries to read the seed from the "main run" (it defers the seed) instead of identifying that it re-ran.
Note that I tried copying the manifest produced by dbt seed into target/prod/ but what happens then is that dbt does not identify the model as "modified".
dbt version
0.18.1