Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2782] [Bug] Error - cannot reference permanent table from temporary table constraint when running incremental model with a foreign key #8022

Closed
2 tasks done
Tracked by #7979
amardatar opened this issue Jul 3, 2023 · 7 comments · Fixed by #8768
Assignees
Labels
backport 1.5.latest backport 1.6.latest bug Something isn't working Impact: Adapters Medium Severity bug with minor impact that does not have resolution timeframe requirement model_contracts multi_project

Comments

@amardatar
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running an incremental model with contract enforced and a foreign key, the model will fail on subsequent updates.

Expected Behavior

The update should succeed without an error.

Steps To Reproduce

First run a statement like:

create table if not exists numbers (n int not null primary key)

Then, given a schema.yml file:

version: 2

models:
  - name: test
    config:
      contract:
        enforced: true
      materialized: incremental
      on_schema_change: append_new_columns
      unique_key: n
    columns:
      - name: n
        data_type: integer
        constraints:
          - type: foreign_key
            expression: "numbers (n)"

And a SQL file:

select 1 as n

dbt will succeed on the first run, and subsequently fail with the error:

cannot reference permanent table from temporary table constraint

As per the error returned, the issue appears to be that while rendering a temporary table as part of the update, dbt will also render the foreign key constraint (which is not valid in the context of a temporary table). This leads to the above failure.

Relevant log output

No response

Environment

- OS: macOS Ventura 13.4.1
- Python: 3.11.4
- dbt: 1.5.2

Which database adapter are you using with dbt?

redshift

Additional Context

While I've run into this issue using Redshift (and using the dbt-redshift adapter) I've put this bug in dbt-core as I believe from a bit of research that the same issue would occur if this were run in PostgreSQL (and possibly in other data warehouses, although it was harder to research the issue broadly).

@amardatar amardatar added bug Something isn't working triage labels Jul 3, 2023
@github-actions github-actions bot changed the title [Bug] Error - cannot reference permanent table from temporary table constraint when running incremental model with a foreign key [CT-2782] [Bug] Error - cannot reference permanent table from temporary table constraint when running incremental model with a foreign key Jul 3, 2023
@jtcohen6 jtcohen6 self-assigned this Jul 4, 2023
@LeviAndrewDixon
Copy link

I have the same issue on Postgres 15.x, dbt: 1.52, dbt-postgres:1.5.2 with an incremental model: i.e. materialized="incremental". It works on full-refresh and fails on subsequent builds when is_incremental() is true.

18:54:56  Database Error in model fact_session (models/public/fact_session.sql)
18:54:56    constraints on temporary tables may reference only temporary tables

dim_partner:

models:
  - name: dim_partner
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: varchar(32)
        constraints:
          - type: primary_key
          - type: not_null
...

fact_session:

  - name: fact_session
    config:
      contract:
        enforced: true
    columns:
      - name: partner_id
        data_type: varchar(32)
        constraints:
          - type: not_null
          - type: foreign_key
            expression: "dim_partner (id) ON DELETE CASCADE"
...

@jtcohen6
Copy link
Contributor

@amardatar @LeviAndrewDixon Thanks for opening the issue, and for the reproduction case!

I've confirmed that creating a temporary table with a FK constraint referencing a permanent table raises an error on Postgres & Redshift, but it does not raise an error on Snowflake.

(BigQuery is an odder case: We only use "real" temp tables on BigQuery, which need to be run inside of a multi-statement "script," for the insert_overwrite strategy — and even then, we create the non-temp table in advance because we need to check for schema changes, since we require that an incremental model with an enforced contract must use either the fail or append_new_columns mode of schema evolution.)

It's true that foreign key constraints are not very well supported in dbt right now, especially since we don't support ref in the expression field. It doesn't feel like a big priority for us, so long as FK constraints aren't actually enforced on the most commonly used analytical data platforms (#8062).

Still, it's undesirable behavior for FK constraints to work only sometimes for incremental models on Postgres/Redshift. Options:

  1. Avoid templating all constraints if we're creating a temp table. The constraints will still be enforced when the data is inserted into the preexisting table, which had its constraints applied upon creation. However, this won't handle cases where a totally new column is added with a totally new constraint — already something that's not really handled today. It will also mean a longer delay (more time/compute) to find out about violations of enforced constraints.
  2. As in option (1), but just for foreign key constraints. While this limits the impact of the change, it also requires thornier logic to plumb through the macros/methods for constraint rendering.

I'm supportive of pursuing one of those, and option (1) feels cleaner. The next steps would be:

  • Skip these lines in create_table_as, if we're creating a temp table. Needed in the default implementation (used by Postgres/Redshift), but Snowflake can keep doing as it's doing.
  • Open a separate issue to track the case of a new column / new constraint being added to an incremental model, via on_schema_change

@amardatar
Copy link
Author

Thanks for the update @jtcohen6!

It's true that foreign key constraints are not very well supported in dbt right now, especially since we don't support ref in the expression field. It doesn't feel like a big priority for us, so long as FK constraints aren't actually enforced on the most commonly used analytical data platforms (#8062).

I think it's worth mentioning just regarding this point - for Redshift specifically, foreign keys do carry special meaning (despite not being enforced) - as per point 2 in this docs page a foreign key will be used for collocation of data, which is relevant for performance. I know - this is niche, and Redshift itself doesn't have the popularity of other warehouses, so I don't expect this to be high priority - but also figured it was relevant knowledge to share since it is more than just purely informational for Redshift.

In terms of the options you mentioned - don't know if you're particularly looking for our input here or if that's just a decision that needs to be made internally; for my part I'd agree that (1) seems the better option (otherwise this potentially goes down the track of different adapters having different constraints rendered at different times, and confusion on what exactly to expect).

@jtcohen6
Copy link
Contributor

@amardatar Fair point! Snowflake supports something similar—using the constraint as an input to query rewrite/optimization—by specifying RELY for the constraint. (Though it's off by default.)

@graciegoheen
Copy link
Contributor

Note from refinement meeting:

  • If we do go for option 1, we should just make sure we're not causing any regressions or noticeable performance decreases

@jtcohen6 jtcohen6 assigned QMalcolm and unassigned QMalcolm Aug 2, 2023
@nathangriffiths-cdx
Copy link

It doesn't feel like a big priority for us, so long as FK constraints aren't actually enforced on the most commonly used analytical data platforms (#8062).

Not enforced but, as mentioned above, are used in various ways for optimization - note BigQuery now also supports the definition of unenforced primary and foreign keys for query optmization purposes: https://cloud.google.com/blog/products/data-analytics/join-optimizations-with-bigquery-primary-and-foreign-keys?hl=en

@aranke
Copy link
Member

aranke commented Oct 10, 2023

I verified a fix for Postgres in #8768, created a backlog ticket for Redshift here: dbt-labs/dbt-redshift#628.

github-actions bot pushed a commit that referenced this issue Oct 10, 2023
github-actions bot pushed a commit that referenced this issue Oct 10, 2023
aranke added a commit that referenced this issue Oct 11, 2023
…l model results in Database Error (#8807)

(cherry picked from commit 6461f5a)

Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
aranke added a commit that referenced this issue Oct 11, 2023
…l model results in Database Error (#8808)

(cherry picked from commit 6461f5a)

Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.5.latest backport 1.6.latest bug Something isn't working Impact: Adapters Medium Severity bug with minor impact that does not have resolution timeframe requirement model_contracts multi_project
Projects
None yet
7 participants