new docs for on_schema_change #747

Merged: 1 commit, Aug 2, 2021

@@ -142,7 +142,41 @@ On warehouses that do not support `merge` statements, a merge is implemented by
Transaction management is used to ensure this is executed as a single unit of work.

## What if the columns of my incremental model change?
If you add a column to your incremental model, and execute a `dbt run`, this column will _not_ appear in your target table.

:::tip New `on_schema_change` config in dbt version `v0.21.0`

Incremental models can now be configured to include an optional `on_schema_change` parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer `--full-refresh` scenarios and saving query costs.

:::

You can configure the `on_schema_change` setting as follows.

<File name='models/staging/fct_daily_active_users.sql'>

```sql
{# Set on_schema_change to exactly one of:
   'ignore', 'fail', 'append_new_columns', 'sync_all_columns' #}
{{
    config(
        materialized='incremental',
        unique_key='date_day',
        on_schema_change='fail'
    )
}}
```

</File>

Review comment (Collaborator, on lines +156 to +166):

Could you also include an example of setting this from the project file? Similar to how we show both for incremental_strategy below:

models:
  +on_schema_change: sync_all_columns


The behaviors for `on_schema_change` are:

* `ignore`: this is the default, and preserves the behavior of dbt versions <= v0.20.0
* `fail`: triggers an error message when the source and target schemas diverge

Review comment (Collaborator):

What do you think about using the words "old" and "new" here instead of "target" and "source"? I'm not convinced that's better, just thinking about what would be most intuitive for a person newly grokking incremental models

* `append_new_columns`: Append new columns identified in the temporary source schema to the target schema. Note that this setting does *not* remove columns from the target that are not present in the source.
* `sync_all_columns`: Adds any new columns to the target table and removes them from the temporary source schema. Note that this is *inclusive* of data type changes. On Bigquery, data type changes currently cause a full table scan, so we advise Bigquery users to be mindful of the trade-offs when implementing this setting.

Review comment (Collaborator):

Just a bit of wordsmithing:

Suggested change (original line, then proposed replacement):
* `sync_all_columns`: Adds any new columns to the target table and removes them from the temporary source schema. Note that this is *inclusive* of data type changes. On Bigquery, data type changes currently cause a full table scan, so we advise Bigquery users to be mindful of the trade-offs when implementing this setting.
* `sync_all_columns`: Adds all new columns, and removes all columns that have been removed. Note that this is *inclusive* of data type changes. On BigQuery, data type changes currently require a full scan of the existing table, so we advise BigQuery users to be mindful of the trade-offs when implementing this setting.
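
As a rough illustration of the four behaviors, the sketch below models how each setting would resolve the target table's columns. It is not dbt's implementation: `resolve_target_columns`, `SchemaChangeError`, and the dict-of-column-types representation are hypothetical names chosen for this example, and it assumes `append_new_columns` leaves existing column types untouched.

```python
from typing import Dict

VALID_SETTINGS = ("ignore", "fail", "append_new_columns", "sync_all_columns")


class SchemaChangeError(Exception):
    """Hypothetical error raised when on_schema_change='fail' and schemas differ."""


def resolve_target_columns(
    target: Dict[str, str],
    source: Dict[str, str],
    on_schema_change: str = "ignore",
) -> Dict[str, str]:
    """Return the target table's columns (name -> type) after an incremental run.

    `target` is the existing table; `source` is the schema produced by the
    current model SQL. This is a simplified model, not dbt's actual code.
    """
    if on_schema_change not in VALID_SETTINGS:
        raise ValueError(f"unknown on_schema_change value: {on_schema_change!r}")

    # 'ignore' (the default) never alters the target; neither does a run
    # where the two schemas already match.
    if on_schema_change == "ignore" or source == target:
        return dict(target)

    if on_schema_change == "fail":
        raise SchemaChangeError("source and target schemas have diverged")

    if on_schema_change == "append_new_columns":
        # Add columns that are new in the source; never drop anything.
        result = dict(target)
        for name, dtype in source.items():
            result.setdefault(name, dtype)
        return result

    # 'sync_all_columns': drop removed columns, pick up data type changes,
    # then add columns that are new in the source.
    result = {name: source[name] for name in target if name in source}
    for name, dtype in source.items():
        result.setdefault(name, dtype)
    return result


if __name__ == "__main__":
    old = {"date_day": "date", "users": "int", "old_col": "text"}
    new = {"date_day": "date", "users": "bigint", "new_col": "text"}
    for setting in ("ignore", "append_new_columns", "sync_all_columns"):
        print(setting, resolve_target_columns(old, new, setting))
```

In this model, only `sync_all_columns` drops `old_col` and picks up the `users` type change, which is the case the BigQuery caveat applies to.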


**Note**: The `on_schema_change` behaviors do not currently include backfill functionality on the target table.

Review comment (Collaborator):

Suggested change (original line, then proposed replacement):
**Note**: The `on_schema_change` behaviors do not currently include backfill functionality on the target table.
**Note**: None of the `on_schema_change` behaviors backfill values in old records for newly added columns. If you need to populate those values, we recommend running manual updates, or triggering a `--full-refresh`.
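
To make the backfill caveat concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for a warehouse. The table name is borrowed from the config example above; the `signups` column and the backfill value of `0` are hypothetical, and a real backfill would compute the values from upstream data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing incremental table with one previously loaded row.
conn.execute(
    "create table fct_daily_active_users (date_day text primary key, users integer)"
)
conn.execute("insert into fct_daily_active_users values ('2021-08-01', 10)")

# Schema change: a new column is added to the existing table
# (roughly what appending a new column amounts to).
conn.execute("alter table fct_daily_active_users add column signups integer")

# The next incremental run inserts new rows that carry the new column.
conn.execute("insert into fct_daily_active_users values ('2021-08-02', 12, 3)")

query = "select date_day, users, signups from fct_daily_active_users order by date_day"

# Old records are not backfilled: the new column is NULL for them.
rows_before = conn.execute(query).fetchall()

# Manual update to populate the old records (placeholder value).
conn.execute("update fct_daily_active_users set signups = 0 where signups is null")
rows_after = conn.execute(query).fetchall()

print(rows_before)
print(rows_after)
```

The alternative mentioned in the note, a `--full-refresh`, rebuilds the table from scratch instead of patching old rows in place.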


### For dbt versions <= v0.20.0, refer to the logic below

Review comment (Collaborator):

Suggested change (original line, then proposed replacement):
### For dbt versions <= v0.20.0, refer to the logic below
### Default behavior
This is the behavior if `on_schema_change: ignore`, and on older versions of dbt.


If you add a column to your incremental model, and execute a `dbt run`, this column will _not_ appear in your target table.

Similarly, if you remove a column from your incremental model, and execute a `dbt run`, this column will _not_ be removed from your target table.

@@ -196,6 +230,7 @@ select ...
<Changelog>

- **v0.20.0:** Introduced `merge_update_columns`
- **v0.21.0:** Introduced `on_schema_change`

Review comment (Collaborator):

Could you move this to a new Changelog entry up under `## What if the columns of my incremental model change?`


</Changelog>

21 changes: 21 additions & 0 deletions website/docs/docs/guides/migration-guide/upgrading-to-0-21-0.md
@@ -0,0 +1,21 @@
---
title: "Upgrading to 0.21.0"

---

### Resources

- [Discourse](https://discourse.getdbt.com/t/2621)
- [Changelog](https://github.com/fishtown-analytics/dbt/blob/develop/CHANGELOG.md)

## Breaking changes

## New and changed documentation

### Tests

### Elsewhere in Core
- [Configuring Incremental Models](configuring-incremental-models): Notes on updated configurations to incrementals, including the `on_schema_change` config.

### Plugins