Updating branch with the latest changes in main #59

Merged: 12 commits, Sep 21, 2023
55 changes: 49 additions & 6 deletions README.md
@@ -1,6 +1,10 @@
# dbt Constraints Package

-This package generates database constraints based on the tests in a dbt project. It is currently compatible with Snowflake, PostgreSQL, and Oracle only.
+This package generates database constraints based on the tests in a dbt project. It is currently compatible with Snowflake, PostgreSQL, Oracle, Redshift, and Vertica only.

## How the dbt Constraints Package differs from dbt's Model Contracts feature

This package focuses on automatically generating constraints based on the tests already in a user's dbt project. In most cases, merely adding the dbt Constraints package is all that is needed to generate constraints. dbt's recent [model contracts feature](https://docs.getdbt.com/docs/collaborate/govern/model-contracts) allows users to explicitly document constraints for models in yml. This package and the core feature are 100% compatible with one another and the dbt Constraints package will skip generating constraints already created by a model contract. However, the dbt Constraints package will also generate constraints for any tests that are not documented as model contracts. As described in the next section, dbt Constraints is also designed to provide join elimination on Snowflake.

## Why data engineers should add referential integrity constraints

@@ -10,7 +14,7 @@ In addition, although Snowflake doesn't enforce most constraints, the [query opt

Many databases including [Snowflake](https://docs.snowflake.com/en/user-guide/join-elimination.html), PostgreSQL, Oracle, SQL Server, MySQL, and DB2 can use referential integrity constraints to perform "[Join Elimination](https://blog.jooq.org/join-elimination-an-essential-optimiser-feature-for-advanced-sql-usage/)" to remove tables from an execution plan. This commonly occurs when you query a subset of columns from a view and some of the tables in the view are unnecessary. In addition, on databases that do not support join elimination, some [BI and visualization tools will also rewrite their queries](https://docs.snowflake.com/en/user-guide/table-considerations.html#referential-integrity-constraints) based on constraint information, producing the same effect.

-Finally, although most columnar databases including Snowflake do not use or need indexes, most row-oriented databases including PostgreSQL and Oracle require indexes on their primary key columns in order to perform efficient joins between tables. Typically a primary key or unique key constraint is enforced on such databases using such indexes. Having dbt create the unique indexes automatically can slightly reduce the degree of performance tuning necessary for row-oriented databases. Row-oriented databases frequently also need indexes on foreign key columns but [that is something best added manually](https://docs.getdbt.com/reference/resource-configs/postgres-configs#indexes).
+Finally, although most columnar databases including Snowflake do not use or need indexes, most row-oriented databases including PostgreSQL and Oracle require indexes on their primary key columns in order to perform efficient joins between tables. A primary key or unique key constraint is typically enforced on databases using such indexes. Having dbt create the unique indexes automatically can slightly reduce the degree of performance tuning necessary for row-oriented databases. Row-oriented databases frequently also need indexes on foreign key columns but [that is something best added manually](https://docs.getdbt.com/reference/resource-configs/postgres-configs#indexes).

## Please note

@@ -117,13 +121,13 @@ packages:

Generally, if you don't meet a requirement, tests are still executed but the constraint is skipped rather than producing an error.

-- All models involved in a constraint must be materialized as table, incremental, or snapshot.
+- All models involved in a constraint must be materialized as table, incremental, snapshot, or seed.

-- If source constraints are enabled, the source must be a table. You must also have the `OWNERSHIP` table privilege to add a constraint. For foreign keys you also need the `REFERENCES` privilege on the parent table with the primary or unique key. The package will identify when you lack these privileges on Snowflake and PostgreSQL. Oracle does not provide an easy way to look up your effective privileges so it has an exception handler and will display Oracle's error messages.
+- If source constraints are enabled, the source must be a table. You must also have the `OWNERSHIP` table privilege to add a constraint. For foreign keys you also need the `REFERENCES` privilege on the parent table with the primary or unique key. The package will identify when you lack these privileges on Snowflake and PostgreSQL. Oracle does not provide an easy way to look up your effective privileges so it has an exception handler and will display Oracle's error messages.

- All columns on constraints must be individual column names, not expressions. You can reference columns on a model that come from an expression.

-- Constraints are not created for failed tests
+- Constraints are not created for failed tests. See how to get around this using severity and `config: always_create_constraint: true` in the next section.

- `primary_key`, `unique_key`, and `foreign_key` tests are considered first and duplicate constraints are skipped. One exception is that you will get an error if you add two different `primary_key` tests to the same model.

@@ -133,7 +137,46 @@ Generally, if you don't meet a requirement, tests are still executed but the con

- The `foreign_key` test will ignore any rows with a null column, even if only one of two columns in a compound key is null. If you also want to ensure FK columns are not null, you should add standard `not_null` tests to your model which will add not null constraints to the table.

-- Referential constraints must apply to all the rows in a table so any tests with a `config: where:` property will be skipped when creating constraints.
+- Referential constraints must apply to all the rows in a table so any tests with a `config: where:` property will be skipped when creating constraints. See how to disable this rule using `config: always_create_constraint: true` in the next section.
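
The null-handling rule for the `foreign_key` test in the list above can be illustrated with a small sketch. This is not the package's SQL: `fk_violations` and the sample keys are hypothetical, and the function only models the stated behavior, namely that a child row is ignored as soon as any column of its (possibly compound) key is null.

```python
from typing import Iterable, List, Optional, Tuple

# A (possibly compound) key; None stands in for SQL NULL.
Key = Tuple[Optional[object], ...]


def fk_violations(child_keys: Iterable[Key], parent_keys: Iterable[Key]) -> List[Key]:
    """Return child keys that have no matching parent key.

    Hypothetical helper: rows with a null in ANY key column are skipped,
    mirroring the rule described for the foreign_key test.
    """
    parents = set(parent_keys)
    violations = []
    for key in child_keys:
        if any(value is None for value in key):
            continue  # ignored, even if only one column of a compound key is null
        if key not in parents:
            violations.append(key)
    return violations


parents = [(1, 10), (2, 20)]
children = [(1, 10), (None, 20), (3, None), (3, 30)]
violations = fk_violations(children, parents)
```

Here only `(3, 30)` counts as a violation; the two rows containing a null are never checked, which is why separate `not_null` tests are needed if null foreign keys should also be rejected.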


## Advanced: `config: always_create_constraint: true` property
There is an advanced option to force a constraint to be generated when a test has a `config: where:` property or a failure threshold. The `config: always_create_constraint: true` property overrides both exclusions, so a constraint can be created even when the test excludes some records or reports a number of failures below its threshold. If your test has a status of 'failed', it will still be skipped. Please see [dbt's documentation on how to set a threshold for failures](https://docs.getdbt.com/reference/resource-configs/severity).

__Caveat Emptor:__
* You will get an error if you try to force constraints to be generated that are enforced by your database. On Snowflake that is only a not_null constraint but on databases like Oracle, all the generated constraints are enforced.
* This feature could cause unexpected query results on Snowflake due to [join elimination](https://docs.snowflake.com/en/user-guide/join-elimination).

This is an example using the feature:
```yml
  - name: dim_duplicate_orders
    description: "Test that we do not try to create PK/UK on failed tests"
    columns:
      - name: o_orderkey
        description: "The primary key for this table"
      - name: o_orderkey_seq
        description: "duplicate seq column to test UK"
    tests:
      # This constraint should be skipped because it has failures
      - dbt_constraints.primary_key:
          column_name: o_orderkey
          config:
            severity: warn
      # This constraint should be still generated because always_create_constraint=true
      - dbt_constraints.unique_key:
          column_name: o_orderkey
          config:
            warn_if: ">= 5000"
            error_if: ">= 10000"
            always_create_constraint: true
      # This constraint should be still generated because always_create_constraint=true
      - dbt_constraints.unique_key:
          column_name: o_orderkey_seq
          config:
            severity: warn
            always_create_constraint: true
```


## Primary Maintainers

12 changes: 11 additions & 1 deletion integration_tests/models/schema.yml
@@ -147,14 +147,24 @@ models:
       - name: o_orderkey_seq
         description: "duplicate seq column to test UK"
     tests:
+      # This constraint should be skipped because it has failures
       - dbt_constraints.primary_key:
           column_name: o_orderkey
           config:
             severity: warn
+      # This constraint can be generated if you uncomment always_create_constraint=true
+      - dbt_constraints.unique_key:
+          column_name: o_orderkey
+          config:
+            warn_if: ">= 5000"
+            error_if: ">= 10000"
+            # always_create_constraint: true
+      # This constraint can be generated if you uncomment always_create_constraint=true
       - dbt_constraints.unique_key:
           column_name: o_orderkey_seq
           config:
             severity: warn
+            # always_create_constraint: true

   - name: fact_order_line_missing_orders
     description: "Test that we do not create FK on failed tests"
@@ -225,7 +235,7 @@ models:
               to: ref('supplier')
               field: s_suppkey
     tests:
-      - dbt_constraints.unique_key:
+      - dbt_constraints.primary_key:
           column_names:
             - ps_partkey
             - ps_suppkey
9 changes: 6 additions & 3 deletions macros/create_constraints.sql
@@ -168,11 +168,14 @@
     {#- Loop through the results and find all tests that passed and match the constraint_types -#}
     {#- Issue #2: added condition that the where config must be empty -#}
     {%- for res in results
-        if res.status == "pass"
-            and res.node.config.materialized == "test"
+        if res.node.config.materialized == "test"
+            and res.status in ("pass", "warn")
             and res.node.test_metadata
             and res.node.test_metadata.name is in( constraint_types )
-            and res.node.config.where is none -%}
+            and ( res.failures == 0 or
+                res.node.config.get("always_create_constraint", false) )
+            and ( res.node.config.where is none or
+                res.node.config.get("always_create_constraint", false) ) -%}

         {%- set test_model = res.node -%}
         {%- set test_parameters = test_model.test_metadata.kwargs -%}
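
The new loop filter in this hunk can be restated as a plain predicate, which makes the interaction between status, failure count, `where`, and `always_create_constraint` easier to check. The following is a minimal Python sketch rather than code from the package: `ResultStub`, `is_constraint_candidate`, and the contents of `CONSTRAINT_TYPES` are assumptions standing in for the dbt result object and the macro's `constraint_types` argument.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values; in the macro, constraint_types is supplied by the caller.
CONSTRAINT_TYPES = ("primary_key", "unique_key", "foreign_key")


@dataclass
class ResultStub:
    """Hypothetical stand-in for the fields the filter reads from a dbt result."""
    status: str                          # e.g. "pass", "warn", "fail"
    failures: int                        # failing rows reported by the test
    materialized: str = "test"
    test_name: Optional[str] = "unique_key"
    where: Optional[str] = None
    always_create_constraint: bool = False


def is_constraint_candidate(res: ResultStub) -> bool:
    """Mirror the conditions of the updated `for res in results if ...` filter."""
    forced = res.always_create_constraint
    return (
        res.materialized == "test"
        and res.status in ("pass", "warn")
        and res.test_name in CONSTRAINT_TYPES
        and (res.failures == 0 or forced)    # failures allowed only when forced
        and (res.where is None or forced)    # where filter allowed only when forced
    )


# severity: warn with failures and no override -> skipped
warn_pk = ResultStub(status="warn", failures=3, test_name="primary_key")
# same situation with always_create_constraint: true -> kept
warn_uk_forced = ResultStub(status="warn", failures=3, always_create_constraint=True)
# a failed test is skipped even when the override is set
failed_forced = ResultStub(status="fail", failures=20000, always_create_constraint=True)
```

Under these assumptions, `warn_pk` and `failed_forced` are rejected while `warn_uk_forced` is accepted, which matches the three cases in the README example: the override relaxes the failure-count and `where` checks but never the status check.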