Investigate mocking recursing further than required #144

daveworth · 2023-06-12T22:08:04Z

Summary

The dbt-unit-testing SQL Builder macro build_model_complete_sql contains the build_model_dependencies sub-macro which recursively traverses the DAG in a depth-first fashion to gather the list of nodes of the DAG which must be synthesized. It takes a list of mocked nodes as a second parameter and short-circuits the recursion if the current node is mocked. This list of mocked nodes is passed to the first invocation of build_model_dependencies but is not passed in the subsequent calls meaning that in the case where the mocked model or source is not referenced directly from the model under test all dependencies are mocked.

Existing Behavior in sql_builders.sql

{% macro build_model_dependencies(node, models_to_exclude, build_full_lineage=True) %}
    {# snip ... #}
    {% set child_model_dependencies = dbt_unit_testing.build_model_dependencies(node) %}
    {# snip ... #}
{% endmacro %}

Desired Behavior in sql_builders.sql

{% macro build_model_dependencies(node, models_to_exclude, build_full_lineage=True) %}
    {# snip ... #}
    {% set child_model_dependencies = dbt_unit_testing.build_model_dependencies(node, models_to_exclude, build_full_lineage) %}
    {# snip ... #}
{% endmacro %}

Impact

Under ordinary circumstances this generates a larger test-query than is strictly needed but does not have other negative impact. In our dbt environment we use the dbt-fal adapter to bring our own runtime for dbt-python models. dbt-unit-testing reads the contents of models inject into the mocked CTEs (while resolving source/ref call to allow this recursively). When the recursive step reads a python model the resulting CTE is invalid SQL (it's Python code!) and the test fails.

Additional Details

It is unclear to me how to test this directly within the current testing framework. If it is acceptable that it is untested I am fine with that (assuming the rest of the change is green in CI). If you feel we should test that I'd love a pointer as to how we might do that.

I have wrapped the debug macro code in a variable condition the same way verbose operated before to add some debugging steps here. I am happy to pull that all back or create a separate PR for it. The debug macro is not in use elsewhere so this felt reasonable for now but I'm happy to trim this down to the one-liner chage.

Steps to reproduce

Build a new dbt project wired up to a local database via dbt init ... and setup the config/profiles
Setup dbt-unit-testing with the following package file. Uncomment the upstream and run dbt deps to see the existing behavior. Uncomment the fork/revision version to see the new behavior

---
packages:
  # - git: "https://github.com/EqualExperts/dbt-unit-testing"
  #   revision: v0.2.7
  - git: "https://github.com/statype/dbt-unit-testing.git"
    revision: db2d98a3244cf1b7172873c914a3ef0c33018c02

Create the following models in models/example/:

`in1.sql`

    select 1 as id

`intermediate_model.sql`

select * from {{ ref('in1') }}

`passthrough.sql`

-- this reference to `intermediate_model` triggers the recursive step
select * from {{ ref('intermediate_model')}}

`final.sql`

-- this is the model under test
select * from {{ ref('passthrough') }}

Create the following test in tests/unit/:

final-mock-intermediate.sql

{{
    config(
        tags=['unit-test']
    )
}}

{% call dbt_unit_testing.test ('final','mock intermediate model') %}
  {% call dbt_unit_testing.mock_ref('intermediate_model', {"input_format": "csv"}) %}
    id
    10
    11
    12
  {% endcall %}

  {% call dbt_unit_testing.expect({"input_format": "csv"}) %}
    id
    10
    11
    12
  {% endcall %}
{% endcall %}

Results

Before

Before this change the in1 initial model is referenced and resolved in the generated test query as generated with the command
$ dbt test --vars '{"verbose": "true"}' -s final-mock-intermediate

with expectations as (
      select count(1) as count, "id" from (select 10 as id
 union all
select 11 as id
 union all
select 12 as id) as s group by "id"

    ),
    actual as (
      select count(1) as count, "id" from (
      with
      "in1" as (select 1 as id
),
"intermediate_model" as (select 10 as id
 union all
select 11 as id
 union all
select 12 as id),
"passthrough" as (select * from "intermediate_model"
)

    select * from (select * from "passthrough"
 ) as t ) as s group by "id"

    ),

    extra_entries as (
    select '+' as diff, count, "id" from actual


    except


    select '+' as diff, count, "id" from expectations),

    missing_entries as (
    select '-' as diff, count, "id" from expectations


    except


    select '-' as diff, count, "id" from actual)

    select * from extra_entries
    UNION ALL
    select * from missing_entries

After

With this change, which provides the list of mocks to short-circuit recursion, CTE resolution stops at intermediate_model as it is mocked in the test.

with expectations as (
      select count(1) as count, "id" from (select 10 as id
 union all
select 11 as id
 union all
select 12 as id) as s group by "id"

    ),
    actual as (
      select count(1) as count, "id" from (
      with
      "intermediate_model" as (select 10 as id
 union all
select 11 as id
 union all
select 12 as id),
"passthrough" as (select * from "intermediate_model"
)

    select * from (select * from "passthrough"
 ) as t ) as s group by "id"

    ),

    extra_entries as (
    select '+' as diff, count, "id" from actual


    except


    select '+' as diff, count, "id" from expectations),

    missing_entries as (
    select '-' as diff, count, "id" from expectations


    except


    select '-' as diff, count, "id" from actual)

    select * from extra_entries
    UNION ALL
    select * from missing_entries

This allows the dependency detection to short-circuit mocking CTEs when appropriate

psousa50

👍

daveworth marked this pull request as ready for review June 14, 2023 20:44

daveworth mentioned this pull request Jun 14, 2023

Skip the middle table or view to get test expectations #114

Closed

include models_to_exclude in build_model_dependencies recursion

fa7d70a

This allows the dependency detection to short-circuit mocking CTEs when appropriate

psousa50 approved these changes Jun 17, 2023

View reviewed changes

psousa50 added the safe to test Label used to identify prs safe to test on the integration environments label Jun 17, 2023

psousa50 temporarily deployed to test June 17, 2023 18:01 — with GitHub Actions Inactive

psousa50 temporarily deployed to test June 17, 2023 18:05 — with GitHub Actions Inactive

github-actions bot removed the safe to test Label used to identify prs safe to test on the integration environments label Jun 17, 2023

psousa50 merged commit 405b60b into EqualExperts:master Jun 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate mocking recursing further than required #144

Investigate mocking recursing further than required #144

daveworth commented Jun 12, 2023 •

edited

Loading

psousa50 left a comment

Investigate mocking recursing further than required #144

Investigate mocking recursing further than required #144

Conversation

daveworth commented Jun 12, 2023 • edited Loading

Summary

Impact

Additional Details

Steps to reproduce

in1.sql

intermediate_model.sql

passthrough.sql

final.sql

Results

Before

After

psousa50 left a comment

Choose a reason for hiding this comment

daveworth commented Jun 12, 2023 •

edited

Loading

`in1.sql`

`intermediate_model.sql`

`passthrough.sql`

`final.sql`