
[Feature] Microbatch should allow copy_partitions #1414

Closed
3 tasks done
Tracked by #10624
jschintz-nytimes opened this issue Nov 26, 2024 · 5 comments
Labels: feature:microbatch (Issues related to the microbatch incremental strategy), type:enhancement (New feature or request)

@jschintz-nytimes

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Merging in BigQuery works well, but it is not always the most performant option.

Allow the existing copy_partitions option to be used with the microbatch incremental strategy.
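For reference, this is how copy_partitions is already enabled today for the insert_overwrite strategy via the partition_by config (a sketch; the session_start column name is illustrative):

```jinja
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
      'field': 'session_start',
      'granularity': 'day',
      'copy_partitions': true
    }
) }}
```

The request is for the same flag to be honored when incremental_strategy is set to 'microbatch'.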

Describe alternatives you've considered

Write a custom strategy

Who will this benefit?

Anyone using microbatch

Are you interested in contributing this feature?

Maybe!

Anything else?

No response

@amychen1776 amychen1776 changed the title [Feature] Mcirobatch should allow copy_partitions [Feature] Micro-batch should allow copy_partitions Nov 26, 2024
@amychen1776 amychen1776 added feature:microbatch Issues related to the microbatch incremental strategy and removed triage:product labels Nov 26, 2024
@borjavb (Contributor) commented Nov 28, 2024

If I'm not mistaken, copy_partitions already works with microbatch.

Under the hood, microbatch relies on insert_overwrite, and the copy_partitions config is passed through all the new microbatch logic:

{% macro bq_generate_microbatch_build_sql(
    tmp_relation, target_relation, sql, unique_key, partition_by,
    partitions, dest_columns, tmp_relation_exists, copy_partitions
) %}
    {% set build_sql = bq_insert_overwrite_sql(
        tmp_relation, target_relation, sql, unique_key, partition_by,
        partitions, dest_columns, tmp_relation_exists, copy_partitions
    ) %}
    {{ return(build_sql) }}
{% endmacro %}

@jschintz-nytimes (Author) commented Dec 2, 2024

> If I'm not mistaken, copy_partitions already works with microbatch.
>
> Under the hood, microbatch relies on insert_overwrite, and the copy_partitions config is passed through all the new microbatch logic:
>
>     {% macro bq_generate_microbatch_build_sql(
>         tmp_relation, target_relation, sql, unique_key, partition_by,
>         partitions, dest_columns, tmp_relation_exists, copy_partitions
>     ) %}
>         {% set build_sql = bq_insert_overwrite_sql(
>             tmp_relation, target_relation, sql, unique_key, partition_by,
>             partitions, dest_columns, tmp_relation_exists, copy_partitions
>         ) %}
>         {{ return(build_sql) }}
>     {% endmacro %}

@borjavb I think that's just passing a required parameter with its default, false.
I'm seeing the merge strategy being used in the microbatch run code:

    merge into `dev`.`dbt_cloud_pr_00000_000`.`microbatch` as DBT_INTERNAL_DEST
        using (
        select
        * from `dev`.`dbt_cloud_pr_00000_000`.`microbatch`
      ) as DBT_INTERNAL_SOURCE
        on FALSE

    when not matched by source
         and date(DBT_INTERNAL_DEST.partitiontime) in unnest(dbt_partitions_for_replacement) 
        then delete

    when not matched then insert
        (`fields`)
    values
        (`fields`)

;
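For context, when copy_partitions does take effect, dbt-bigquery skips the merge DML entirely and replaces each affected partition with a BigQuery table copy job. A rough sketch of the equivalent CLI operation (the temp-relation name and partition date below are illustrative, not taken from this run):

```
# Overwrite (-f) the target's 2024-12-02 partition with the temp
# relation's. The $YYYYMMDD partition decorator addresses a single
# day partition directly, so no DML merge is executed.
bq cp -f \
  'dev:dbt_cloud_pr_00000_000.microbatch__dbt_tmp$20241202' \
  'dev:dbt_cloud_pr_00000_000.microbatch$20241202'
```

Copy jobs are free and metadata-only, which is why they can be much cheaper than a merge over the same partitions.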

@borjavb (Contributor) commented Dec 2, 2024

You're right, it's not working. Sorry!

I think I found the issue: there's a conditional check that only allows copy_partitions to be set with insert_overwrite, and it was not updated to accept microbatch. Let me open a PR.


How to reproduce:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    begin='2024-12-02',
    batch_size='day',
    lookback=1,
    partition_by={
      'field': 'session_start',
      'granularity': 'day',
      'copy_partitions': true
    }
) }}
select CURRENT_DATE() as session_start
select CURRENT_DATE() as session_start

That triggers this check:

{% if partition_by.copy_partitions is true and strategy != 'insert_overwrite' %} {#-- We can't copy partitions with merge strategy --#}
    {% set wrong_strategy_msg -%}
        The 'copy_partitions' option requires the 'incremental_strategy' option to be set to 'insert_overwrite'.
    {%- endset %}
    {% do exceptions.raise_compiler_error(wrong_strategy_msg) %}
{% endif %}
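Since microbatch compiles down to the insert_overwrite codepath anyway, the fix presumably only needs to relax this guard. A hypothetical sketch (not the actual diff from the linked PR):

```jinja
{#-- hypothetical relaxed check: accept copy_partitions for both
     insert_overwrite and microbatch, which reuse the same codepath --#}
{% if partition_by.copy_partitions is true
      and strategy not in ['insert_overwrite', 'microbatch'] %}
    {% set wrong_strategy_msg -%}
        The 'copy_partitions' option requires the 'incremental_strategy'
        option to be set to 'insert_overwrite' or 'microbatch'.
    {%- endset %}
    {% do exceptions.raise_compiler_error(wrong_strategy_msg) %}
{% endif %}
```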

@borjavb (Contributor) commented Dec 2, 2024

https://github.com/dbt-labs/dbt-bigquery/pull/1421/files This should do it!

But I wonder if we should default copy_partitions to true for microbatch... It might really be the best option for this operation.

@MichelleArk MichelleArk changed the title [Feature] Micro-batch should allow copy_partitions [Feature] Microbatch should allow copy_partitions Dec 5, 2024
@MichelleArk MichelleArk self-assigned this Dec 10, 2024
@MichelleArk (Contributor)

Closed by #1421, this will go out in the next 1.9 patch (1.9.1)
