-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug/union source tables #59
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-jamie thanks for working through this! Overall this is looking great and I added a few comments and suggestions to update within the package.
There is one issue that seems to occur that I am unsure how to solve. Your solution works great and operates as intended when the table does not exist in all sources. However, I found that the staging model not_null
tests fail when this happens 🤔. I am not sure if this is something we encountered before, but wondering what your thoughts are on how to address this?
packages.yml
Outdated
- git: https://github.com/fivetran/dbt_fivetran_utils.git | ||
revision: feature/expand-union-data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to update
great catch -- i switched the not_null test to https://github.com/calogica/dbt-expectations#expect_column_values_to_not_be_null (so when however, if someone doesn't have the discount source table and runs the transform package, they would get a test failure on the
what do you think @fivetran-joemarkiewicz ? a more paradigm-shifting alternative.... add a |
@fivetran-jamie thanks for the detailed response and thorough investigation of alternatives! My only concern with the current implementation (adding the additional tests to the staging models) is if this will be needed to be applied to all other packages using the union feature? Or is this isolated only to these types of models that do not leverage variables and instead rely on the null filling if they do not exist? I wonder if we have other packages that leverage this combo (maybe stripe?). Back to your original question - Between your two suggestions I am actually leaning more towards option 2 as this will ensure we are not having the customer experience an error and then required them to troubleshoot. We should try our best to make sure the packages work without error initially. Especially in the future state of quickstart, we may not have the luxury of having the customers define variables manually. However, I kind of like your paradigm shifting thought of doing a limit 0 (seems similar to how our internal dry run works). I think this would mean we would not need to adjust downstream tests, but I worry this may cause issues with joins 🤔. In a perfect world we could get this to work by just turning off the tests based on the result of the tmp model. I don't think this is possible, but if you have an idea that could possibly get this to work then I am game to move forward with that! Regardless, I am open to thinking outside the box for this solution as it is multi-faceted. In the end, I really agree with @fivetran-sheringuyen observation from her fivetran_utils review that we should raise a compiler warning to let customers know that we are doing something a bit different (creating empty tables, limit 0'ing, or anything else out of the norm) as I do believe it should be our responsibility to fill customers in on if we are doing something out of the norm, rather than them finding it down the line and being puzzled at the behavior. Happy to chat more about this in person if you would like! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-jamie this is looking great! I am almost ready to approve this PR, I just have some final comments within the fivetran_utils PR before giving this the green light.
This is awesome by the way and love the approach you have taken!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your hard work on this @fivetran-jamie!! This looks good to go once the fivetran_utils update is rolled out 🙌
Are you a current Fivetran customer?
internal
What change(s) does this PR introduce?
does_table_exist
logic from _tmp models, and does all table-existence checks in thefivetran_utils.union_data
macro. these checks do not require users to define sources for their union_schemas or union_databasesDid you update the CHANGELOG?
Does this PR introduce a breaking change?
Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)
Is this PR in response to a previously created Bug or Feature Request
How did you test the PR changes?
Select which warehouse(s) were used to test the PR
Provide an emoji that best describes your current mood
💒 (union joke lol)
Feedback
We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your feedback on our existing dbt packages or what you'd like to see next.