Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug/union source tables #59

Merged
merged 21 commits into from
Mar 20, 2023
Merged

Bug/union source tables #59

merged 21 commits into from
Mar 20, 2023

Conversation

fivetran-jamie
Copy link
Contributor

Are you a current Fivetran customer?

internal

What change(s) does this PR introduce?

  • removes does_table_exist logic from _tmp models, and does all table-existence checks in the fivetran_utils.union_data macro. these checks do not require users to define sources for their union_schemas or union_databases
  • adjusts _tmp models to automatically be created as NULL empty tables if the source table is not found in an provided schema or database

Did you update the CHANGELOG?

  • Yes

Does this PR introduce a breaking change?

  • Yes (please provide breaking change details below.)
  • No (please provide an explanation as to how the change is non-breaking below.)

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

  • Yes

Is this PR in response to a previously created Bug or Feature Request

How did you test the PR changes?

  • Buildkite
  • Local (please provide additional testing details below)

Select which warehouse(s) were used to test the PR

  • BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Provide an emoji that best describes your current mood

💒 (union joke lol)

Feedback

We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your feedback on our existing dbt packages or what you'd like to see next.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-jamie thanks for working through this! Overall this is looking great and I added a few comments and suggestions to update within the package.

There is one issue that seems to occur that I am unsure how to solve. Your solution works great and operates as intended when the table does not exist in all sources. However, I found that the staging model not_null tests fail when this happens 🤔. I am not sure if this is something we encountered before, but wondering what your thoughts are on how to address this?
image

CHANGELOG.md Outdated Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
models/src_shopify.yml Outdated Show resolved Hide resolved
models/tmp/stg_shopify__discount_code_tmp.sql Outdated Show resolved Hide resolved
models/tmp/stg_shopify__discount_code_tmp.sql Outdated Show resolved Hide resolved
packages.yml Outdated
Comment on lines 5 to 6
- git: https://github.com/fivetran/dbt_fivetran_utils.git
revision: feature/expand-union-data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to update

DECISIONLOG.md Outdated Show resolved Hide resolved
@fivetran-jamie
Copy link
Contributor Author

fivetran-jamie commented Mar 8, 2023

@fivetran-jamie thanks for working through this! Overall this is looking great and I added a few comments and suggestions to update within the package.

There is one issue that seems to occur that I am unsure how to solve. Your solution works great and operates as intended when the table does not exist in all sources. However, I found that the staging model not_null tests fail when this happens 🤔. I am not sure if this is something we encountered before, but wondering what your thoughts are on how to address this? image

great catch -- i switched the not_null test to https://github.com/calogica/dbt-expectations#expect_column_values_to_not_be_null (so when source_relation is not null, we run the test. this field should only be null if the table isn't found -- otherwise it is the schema/db or an empty string)

however, if someone doesn't have the discount source table and runs the transform package, they would get a test failure on the shopify__discounts end model, as this also has a not_null test. this gives us two options:

  1. let the customer get the error. if so, they'll need to manually disable the discount end model. pro: if they don't have discount data, they shouldn't have discount end models. con: if they suddenly do start using discounts, they'll have to manually enable this again
  2. apply the same conditional not_null tests in the transform package. pro: consistent, con: having to release new version of the package for something that can be done on customer's end

what do you think @fivetran-joemarkiewicz ?

a more paradigm-shifting alternative.... add a limit 0 to the all-null staging models we're creating.

@fivetran-joemarkiewicz
Copy link
Contributor

@fivetran-jamie thanks for the detailed response and thorough investigation of alternatives!

My only concern with the current implementation (adding the additional tests to the staging models) is if this will be needed to be applied to all other packages using the union feature? Or is this isolated only to these types of models that do not leverage variables and instead rely on the null filling if they do not exist? I wonder if we have other packages that leverage this combo (maybe stripe?).

Back to your original question - Between your two suggestions I am actually leaning more towards option 2 as this will ensure we are not having the customer experience an error and then required them to troubleshoot. We should try our best to make sure the packages work without error initially. Especially in the future state of quickstart, we may not have the luxury of having the customers define variables manually.

However, I kind of like your paradigm shifting thought of doing a limit 0 (seems similar to how our internal dry run works). I think this would mean we would not need to adjust downstream tests, but I worry this may cause issues with joins 🤔. In a perfect world we could get this to work by just turning off the tests based on the result of the tmp model. I don't think this is possible, but if you have an idea that could possibly get this to work then I am game to move forward with that! Regardless, I am open to thinking outside the box for this solution as it is multi-faceted.

In the end, I really agree with @fivetran-sheringuyen observation from her fivetran_utils review that we should raise a compiler warning to let customers know that we are doing something a bit different (creating empty tables, limit 0'ing, or anything else out of the norm) as I do believe it should be our responsibility to fill customers in on if we are doing something out of the norm, rather than them finding it down the line and being puzzled at the behavior.

Happy to chat more about this in person if you would like!

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-jamie this is looking great! I am almost ready to approve this PR, I just have some final comments within the fivetran_utils PR before giving this the green light.

This is awesome by the way and love the approach you have taken!

models/stg_shopify.yml Show resolved Hide resolved
models/stg_shopify.yml Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@fivetran-joemarkiewicz fivetran-joemarkiewicz mentioned this pull request Mar 14, 2023
14 tasks
Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your hard work on this @fivetran-jamie!! This looks good to go once the fivetran_utils update is rolled out 🙌

@fivetran-jamie fivetran-jamie merged commit 6bf9c4f into main Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants