partition macro #342

Draft: wants to merge 1 commit into main

dgitis (Collaborator) commented Aug 29, 2024

Description & motivation

This is a work-in-progress proof of concept that uses the INFORMATION_SCHEMA tables to find the last partition that was updated, checks whether the source tables have been modified since the last run, and then updates only the partitions that have modifications.

It is intended to replace static_incremental_days, but for now it is shared only for discussion.

The initial version should work with sharded and partitioned tables.
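
To make the idea concrete for discussion, here is a minimal Python sketch of the selection logic only, not the macro itself. It assumes partition metadata arrives as (partition_id, last_modified) pairs, similar to what BigQuery's INFORMATION_SCHEMA.PARTITIONS view exposes, and that partition IDs are zero-padded YYYYMMDD strings. The names `PartitionInfo`, `min_modified_partition`, and `last_run` are hypothetical and do not come from this PR.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional


class PartitionInfo(NamedTuple):
    """Hypothetical row of partition metadata for a source table."""
    partition_id: str        # assumed format: zero-padded YYYYMMDD
    last_modified: datetime  # when the partition last changed


def min_modified_partition(
    source_partitions: Iterable[PartitionInfo],
    last_run: datetime,
) -> Optional[str]:
    """Return the earliest partition modified after last_run, or None."""
    modified = [
        p.partition_id for p in source_partitions if p.last_modified > last_run
    ]
    # Zero-padded YYYYMMDD strings sort chronologically, so min() is safe here.
    return min(modified) if modified else None


parts = [
    PartitionInfo("20240826", datetime(2024, 8, 27, 1, 0, tzinfo=timezone.utc)),
    PartitionInfo("20240827", datetime(2024, 8, 29, 2, 0, tzinfo=timezone.utc)),
    PartitionInfo("20240828", datetime(2024, 8, 29, 3, 0, tzinfo=timezone.utc)),
]
last_run = datetime(2024, 8, 28, 12, 0, tzinfo=timezone.utc)
print(min_modified_partition(parts, last_run))  # 20240827
```

A `None` result corresponds to the case where nothing has changed since the last run, which is the situation the first To-Do item is concerned with.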

The macros are meant to be called at the top of partitioned models:

{% if is_incremental() %}
    {% set max_partition = get_max_date_partition( 'fct_ga4__event_page_view' ) %}
    {% set min_modified_partition = get_updated_since_last_modified( var('source_project') , 'source_dataset' , 'base_ga4__events', max_partition ) %}
{% endif %}

The result is then used for partition pruning:

{% if is_incremental() %}
    where event_date_dt >= parse_date('%Y%m%d', cast({{ min_modified_partition }} as string))
{% endif %}
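
As a rough illustration of what that filter does, the following Python sketch mirrors the `parse_date('%Y%m%d', cast(... as string))` comparison. It assumes the macro returns a YYYYMMDD value as either an integer or a string, which is why the SQL casts it to a string first. The helper names `partition_to_date` and `keep_row` are hypothetical.

```python
from datetime import date, datetime
from typing import Union


def partition_to_date(partition_id: Union[int, str]) -> date:
    """Convert a YYYYMMDD partition id (int or str) into a date."""
    return datetime.strptime(str(partition_id), "%Y%m%d").date()


def keep_row(event_date: date, min_modified_partition: Union[int, str]) -> bool:
    """True when the row falls in a partition that needs reprocessing."""
    return event_date >= partition_to_date(min_modified_partition)


print(keep_row(date(2024, 8, 28), 20240827))    # True
print(keep_row(date(2024, 8, 26), "20240827"))  # False
```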

To-Do

  • multi-site
  • stop models from running when there are no matching partitions (untested: this only applies to daily sync without streaming, and my test environment has streaming enabled, so there are always matching updates)
  • maybe consolidate the query so that we check only once whether base_ga4__events has been modified and then process all downstream models the same way

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have run dbt test and python -m pytest . to validate existing tests
