new docs for on_schema_change #747
Nice work @matt-winkler!! I always find it so thrilling to update a "this isn't yet possible" section in the docs site with "heck yeah, it is now!"
I left a handful of comments, mostly around refining the language we want to use to talk about this feature. On substance, what you've got is pretty much good to go, so I'm happy to give this a thumbs up when you are.
Thanks for creating the v0.21 migration guide stub as well. I'll fill that out as I add docs for more new-in-21 features.
```sql
{{
    config(
        materialized='incremental',
        unique_key='date_day',
        on_schema_change=['ignore', 'fail', 'append_new_columns', 'sync_all_columns'] --choose one
    )
}}
```

</File>
Could you also include an example of setting this from the project file? Similar to how we show both for `incremental_strategy` below:
models:
  +on_schema_change: sync_all_columns
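For comparison with the project-file form, here is a sketch of the in-model form with a single value chosen, since the `--choose one` comment in the snippet under review indicates the config takes one option rather than the whole list. The model file name `daily_events.sql`, the upstream model `events`, and the `event_count` column are hypothetical, chosen only for illustration.

```sql
-- models/daily_events.sql (hypothetical model name)
{{
    config(
        materialized='incremental',
        unique_key='date_day',
        on_schema_change='append_new_columns'
    )
}}

select
    date_day,
    count(*) as event_count
from {{ ref('events') }}  -- 'events' is an assumed upstream model

{% if is_incremental() %}
  -- on incremental runs, only reprocess days at or after the latest day already loaded
  where date_day >= (select max(date_day) from {{ this }})
{% endif %}

group by date_day
```

With this setting, a column later added to the `select` would be appended to the existing table on the next incremental run, while columns dropped from the `select` would stay in place.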
**Note**: The `on_schema_change` behaviors do not currently include backfill functionality on the target table.

### For dbt versions <= v0.20.0, refer to the logic below
- ### For dbt versions <= v0.20.0, refer to the logic below
+ ### Default behavior
+ This is the behavior if `on_schema_change: ignore`, and on older versions of dbt.
The behaviors for `on_schema_change` are:

* `ignore`: this is the default, and preserves the behavior of dbt versions <= v.0.20.0
* `fail`: triggers an error message when the source and target schemas diverge
What do you think about using the words "old" and "new" here instead of "target" and "source"? I'm not convinced that's better, just thinking about what would be most intuitive for a person newly grokking incremental models
* `ignore`: this is the default, and preserves the behavior of dbt versions <= v.0.20.0
* `fail`: triggers an error message when the source and target schemas diverge
* `append_new_columns`: Append new columns identified in the temporary source schema to the target schema. Note that this setting does *not* remove columns from the target that are not present in the source.
* `sync_all_columns`: Adds any new columns to the target table and removes them from the temporary source schema. Note that this is *inclusive* of data type changes. On Bigquery, data type changes currently cause a full table scan, so we advise Bigquery users to be mindful of the trade-offs when implementing this setting.
Just a bit of wordsmithing:
- * `sync_all_columns`: Adds any new columns to the target table and removes them from the temporary source schema. Note that this is *inclusive* of data type changes. On Bigquery, data type changes currently cause a full table scan, so we advise Bigquery users to be mindful of the trade-offs when implementing this setting.
+ * `sync_all_columns`: Adds all new columns, and removes all columns that have been removed. Note that this is *inclusive* of data type changes. On BigQuery, data type changes currently require a full scan of the existing table, so we advise BigQuery users to be mindful of the trade-offs when implementing this setting.
* `append_new_columns`: Append new columns identified in the temporary source schema to the target schema. Note that this setting does *not* remove columns from the target that are not present in the source.
* `sync_all_columns`: Adds any new columns to the target table and removes them from the temporary source schema. Note that this is *inclusive* of data type changes. On Bigquery, data type changes currently cause a full table scan, so we advise Bigquery users to be mindful of the trade-offs when implementing this setting.

**Note**: The `on_schema_change` behaviors do not currently include backfill functionality on the target table.
- **Note**: The `on_schema_change` behaviors do not currently include backfill functionality on the target table.
+ **Note**: None of the `on_schema_change` behaviors backfill values in old records for newly added columns. If you need to populate those values, we recommend running manual updates, or triggering a `--full-refresh`.
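To make the "manual updates" in the suggested note concrete, here is a rough sketch of what one might look like. The table `analytics.daily_events`, the newly added column `event_source`, and the placeholder value `'unknown'` are all hypothetical; the right fill value depends entirely on the model.

```sql
-- Hypothetical names: analytics.daily_events gained a new column `event_source`
-- on an incremental run after the model's columns changed.
-- Rows loaded before the change hold NULL in that column, so fill them by hand.
update analytics.daily_events
set event_source = 'unknown'
where event_source is null;
```

The other route named in the note, rebuilding the model with `--full-refresh`, repopulates every row from the model's query instead of patching the old ones.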
@@ -196,6 +230,7 @@ select ...
<Changelog>

- **v0.20.0:** Introduced `merge_update_columns`
- **v0.21.0:** Introduced `on_schema_change`
Could you move this to a new Changelog entry up under `## What if the columns of my incremental model change?`?
I'm going to merge this so we can have it live in `next` in time for releasing v0.21.0-b1. I'll follow up on some of the TODOs in updates to come.
* new docs for on_schema_change (#747)
* Prerelease: v0.21.0-b1 (#756)
* Edits for on_schema_change
* dbt source freshness
* DBT_ENV_SECRET_ env var
* dbt build first cut
* Redshift profile ra3 property
* Beta callout in migration guide
* Self-review build docs
* Prerelease: v0.21.0-b2 (#767)
* state:modified subselectors, modified.macros
* Add build RPC method
* PR feedback
* add dbt deps logging example (#798)
* [Prerelease] Prep for 0.21.0-rc1 (#802)
* Switch --models to --select
* BQ snapshot config aliases
* Configurable postgres connect timeout
* Add list --output-keys. Add list RPC method
* Adapter unique_field dbt-labs/dbt-core#3796
* PR feedback: -s replaces -m
* Add BQ execution_project
* Add default property for yaml selectors
* Update migration guide. New fields in sources.json
* Test where config macro
* Dispatch for global macros
* Update build details
* Some self review
* Greedy flag/property for test selection
* Resolve #803 while we're here
* Fix broken link typo
* Refactor: configs + properties (#766)
* Very incomplete start of first draft
* Second big pass
* Initial self review. Sidebar reorg
* Continue self review. Address #616
* Add note to migration guide
* [Prerelease] v0.21 post-RC updates (#831)
* Artifact version bumps
* Add v0.20 -> v0.21 to Cloud upgrade FAQ
* PR feedback from jasnonaz <3
* [Release] v0.21.0 (#839)
* Update links, info in migration guides
* Fix v0.21 discourse link

Co-authored-by: matt-winkler <75497565+matt-winkler@users.noreply.github.com>
Co-authored-by: Sung Won Chung <sungwonchung3@gmail.com>
Description & motivation
Added documentation for new `on_schema_change` configuration options on incremental models per 3387
Pre-release docs
Is this change related to an unreleased version of dbt?
next
<Changelog>[New/Changed] in v0.x.0</Changelog>