Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle schema pattern on BQ #275

Merged
merged 3 commits into from
Sep 14, 2020
Merged

Handle schema pattern on BQ #275

merged 3 commits into from
Sep 14, 2020

Conversation

clrcrl
Copy link
Contributor

@clrcrl clrcrl commented Sep 4, 2020

This is a:

  • bug fix PR with no breaking changes (please change the base branch to master)
  • new functionality
  • a breaking change

Description & motivation

We weren't handling the schema_pattern arg correctly. And it turns out that on BQ it's kind of annoying to handle it the right way. I think I got it though.

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • n/a I have updated the README.md (if applicable)
  • n/a I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to the changelog

Local testing

I added this:
Raw: analysis/foo.sql

{% set tables = dbt_utils.get_relations_by_pattern(
    schema_pattern = '%',
    table_pattern = '%'
) %}

{% for table in tables %}
    select * from {{ table }}
{% endfor %}

Compiled:

    select * from `unique-serenity-254521`.`dbt_claire`.`data_star_expected`

    select * from `unique-serenity-254521`.`dbt_claire`.`data_test_relationships_where_table_1`

    select * from `unique-serenity-254521`.`dbt_claire`.`data_unpivot`
...

LGTM :)

To discuss:

  • wildcard logic — see comment below
  • whitespace cleanup — I didn't do any whitespace cleaning up. I think it's okay because these macros use return though

where table_schema = '{{schema_pattern}}'
and lower(table_name) like lower ('{{table_pattern}}')
and lower(table_name) not like lower ('{{exclude}}')
{% if '%' in schema_pattern %}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, _ is a wildcard pattern (docs)
Screen Shot 2020-09-04 at 1 58 23 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that we actually introduced a slight regression by switching to using like operators instead of = operators

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow how this is a regression?

Question is, should we amend that line to be

{% if '%' in schema_pattern or '_' in schema_pattern %}

knowing we're going to get a lot more false positives...

Copy link
Contributor Author

@clrcrl clrcrl Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow how this is a regression?

Let's say we have schemas named a_c and abc.

I think previously for get_relations_by_prefix we used a strict = for the schema name.

where schema_name = 'a_c'

This would return only a_c.

Now, we pass it to a like operator (i.e. here and here)

where schema_name like 'a_c'

This would return both a_c and abc.

It's a teeny tiny regression for anyone that uses get_relations_by_prefix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point. To match only a_c, the user should pass the pattern as a\\_c. I still think it's okay to ship this in a patch release, maybe link to the BQ doc in the readme.

Copy link
Contributor Author

@clrcrl clrcrl Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think let's just ship this as-is, and if someone notices the regression we can fix it heh

@clrcrl clrcrl requested a review from jtcohen6 September 4, 2020 18:47
@clrcrl clrcrl mentioned this pull request Sep 4, 2020
1 task
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool! Updated the integration test to (a) include an input table from a different schema, and (b) actually check for equality with an expected result. Smooth sailing on all databases, including BQ.

Just left one comment/question on your comment/question

where table_schema = '{{schema_pattern}}'
and lower(table_name) like lower ('{{table_pattern}}')
and lower(table_name) not like lower ('{{exclude}}')
{% if '%' in schema_pattern %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow how this is a regression?

Question is, should we amend that line to be

{% if '%' in schema_pattern or '_' in schema_pattern %}

knowing we're going to get a lot more false positives...

@clausherther
Copy link
Contributor

clausherther commented Sep 9, 2020

Just ran into this :)

For context, in our Blue/Green deployment pipeline, the schema changes from Blue to Green, so we'd like to use % for the schema pattern.

@clausherther
Copy link
Contributor

FYI, we are currently using {% set relations = dbt_utils.get_relations_by_prefix('xa', '') %} to find all tables in the xa schema, but this returns 0 rows now if the model you're running this in ({{adapter.quote(database)}}.{{schema}}) is in a different schema than schema_pattern. I think this is a bug?

select distinct
            table_schema, table_name

        from {{adapter.quote(database)}}.{{schema}}.INFORMATION_SCHEMA.TABLES
        where table_schema = '{{schema_pattern}}'
            and lower(table_name) like lower ('{{table_pattern}}')
            and lower(table_name) not like lower ('{{exclude}}')

@clrcrl clrcrl merged commit c838953 into master Sep 14, 2020
@clrcrl clrcrl deleted the fix/bq-schema-pattern branch September 14, 2020 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants