Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate mocking recursing further than required #144

Merged
merged 1 commit into from Jun 17, 2023
Merged

Investigate mocking recursing further than required #144

merged 1 commit into from Jun 17, 2023

Conversation

daveworth
Copy link
Contributor

@daveworth daveworth commented Jun 12, 2023

Summary

The dbt-unit-testing SQL Builder macro build_model_complete_sql contains the build_model_dependencies sub-macro which recursively traverses the DAG in a depth-first fashion to gather the list of nodes of the DAG which must be synthesized. It takes a list of mocked nodes as a second parameter and short-circuits the recursion if the current node is mocked. This list of mocked nodes is passed to the first invocation of build_model_dependencies but is not passed in the subsequent calls meaning that in the case where the mocked model or source is not referenced directly from the model under test all dependencies are mocked.

Existing Behavior in sql_builders.sql

{% macro build_model_dependencies(node, models_to_exclude, build_full_lineage=True) %}
    {# snip ... #}
    {% set child_model_dependencies = dbt_unit_testing.build_model_dependencies(node) %}
    {# snip ... #}
{% endmacro %}

Desired Behavior in sql_builders.sql

{% macro build_model_dependencies(node, models_to_exclude, build_full_lineage=True) %}
    {# snip ... #}
    {% set child_model_dependencies = dbt_unit_testing.build_model_dependencies(node, models_to_exclude, build_full_lineage) %}
    {# snip ... #}
{% endmacro %}

Impact

Under ordinary circumstances this generates a larger test-query than is strictly needed but does not have other negative impact. In our dbt environment we use the dbt-fal adapter to bring our own runtime for dbt-python models. dbt-unit-testing reads the contents of models inject into the mocked CTEs (while resolving source/ref call to allow this recursively). When the recursive step reads a python model the resulting CTE is invalid SQL (it's Python code!) and the test fails.

Additional Details

It is unclear to me how to test this directly within the current testing framework. If it is acceptable that it is untested I am fine with that (assuming the rest of the change is green in CI). If you feel we should test that I'd love a pointer as to how we might do that.

I have wrapped the debug macro code in a variable condition the same way verbose operated before to add some debugging steps here. I am happy to pull that all back or create a separate PR for it. The debug macro is not in use elsewhere so this felt reasonable for now but I'm happy to trim this down to the one-liner chage.

Steps to reproduce

  • Build a new dbt project wired up to a local database via dbt init ... and setup the config/profiles
  • Setup dbt-unit-testing with the following package file. Uncomment the upstream and run dbt deps to see the existing behavior. Uncomment the fork/revision version to see the new behavior
---
packages:
  # - git: "https://github.com/EqualExperts/dbt-unit-testing"
  #   revision: v0.2.7
  - git: "https://github.com/statype/dbt-unit-testing.git"
    revision: db2d98a3244cf1b7172873c914a3ef0c33018c02
  • Create the following models in models/example/:

in1.sql

    select 1 as id

intermediate_model.sql

select * from {{ ref('in1') }}

passthrough.sql

-- this reference to `intermediate_model` triggers the recursive step
select * from {{ ref('intermediate_model')}}

final.sql

-- this is the model under test
select * from {{ ref('passthrough') }}
  • Create the following test in tests/unit/:

final-mock-intermediate.sql

{{
    config(
        tags=['unit-test']
    )
}}

{% call dbt_unit_testing.test ('final','mock intermediate model') %}
  {% call dbt_unit_testing.mock_ref('intermediate_model', {"input_format": "csv"}) %}
    id
    10
    11
    12
  {% endcall %}

  {% call dbt_unit_testing.expect({"input_format": "csv"}) %}
    id
    10
    11
    12
  {% endcall %}
{% endcall %}

Results

Before

Before this change the in1 initial model is referenced and resolved in the generated test query as generated with the command
$ dbt test --vars '{"verbose": "true"}' -s final-mock-intermediate

with expectations as (
      select count(1) as count, "id" from (select 10 as id
 union all
select 11 as id
 union all
select 12 as id) as s group by "id"

    ),
    actual as (
      select count(1) as count, "id" from (
      with
      "in1" as (select 1 as id
),
"intermediate_model" as (select 10 as id
 union all
select 11 as id
 union all
select 12 as id),
"passthrough" as (select * from "intermediate_model"
)

    select * from (select * from "passthrough"
 ) as t ) as s group by "id"

    ),

    extra_entries as (
    select '+' as diff, count, "id" from actual


    except


    select '+' as diff, count, "id" from expectations),

    missing_entries as (
    select '-' as diff, count, "id" from expectations


    except


    select '-' as diff, count, "id" from actual)

    select * from extra_entries
    UNION ALL
    select * from missing_entries

After

With this change, which provides the list of mocks to short-circuit recursion, CTE resolution stops at intermediate_model as it is mocked in the test.

with expectations as (
      select count(1) as count, "id" from (select 10 as id
 union all
select 11 as id
 union all
select 12 as id) as s group by "id"

    ),
    actual as (
      select count(1) as count, "id" from (
      with
      "intermediate_model" as (select 10 as id
 union all
select 11 as id
 union all
select 12 as id),
"passthrough" as (select * from "intermediate_model"
)

    select * from (select * from "passthrough"
 ) as t ) as s group by "id"

    ),

    extra_entries as (
    select '+' as diff, count, "id" from actual


    except


    select '+' as diff, count, "id" from expectations),

    missing_entries as (
    select '-' as diff, count, "id" from expectations


    except


    select '-' as diff, count, "id" from actual)

    select * from extra_entries
    UNION ALL
    select * from missing_entries

@daveworth daveworth marked this pull request as ready for review June 14, 2023 20:44
This allows the dependency detection to short-circuit mocking CTEs
when appropriate
Copy link
Collaborator

@psousa50 psousa50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@psousa50 psousa50 added the safe to test Label used to identify prs safe to test on the integration environments label Jun 17, 2023
@psousa50 psousa50 temporarily deployed to test June 17, 2023 18:01 — with GitHub Actions Inactive
@psousa50 psousa50 temporarily deployed to test June 17, 2023 18:01 — with GitHub Actions Inactive
@psousa50 psousa50 temporarily deployed to test June 17, 2023 18:01 — with GitHub Actions Inactive
@psousa50 psousa50 temporarily deployed to test June 17, 2023 18:01 — with GitHub Actions Inactive
@psousa50 psousa50 temporarily deployed to test June 17, 2023 18:05 — with GitHub Actions Inactive
@github-actions github-actions bot removed the safe to test Label used to identify prs safe to test on the integration environments label Jun 17, 2023
@psousa50 psousa50 merged commit 405b60b into EqualExperts:master Jun 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants