Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use daff for diff formatting in unit testing #8984

Merged

Conversation

MichelleArk
Copy link
Contributor

@MichelleArk MichelleArk commented Nov 2, 2023

resolves #8558

Problem

The diff is pretty lackluster for unit tests when they fail!

Solution

  • Use daff to determine and render the diff between actual and expected
  • strip out colors from daff rendering if NO_COLORS option is specified
  • update the UnitTestDiff in the result to include more structure - the actual json rows, expected json rows, and rendered diff.

Note: There are quite a few moving pieces on this branch right now. I think we should add tests for unit test artifacts but it may make more sense to do once they are integrated into build/test commands since that will affect the results.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

🎩

03:20:50  Completed with 1 error and 0 warnings:
03:20:50  
03:20:50  Failure in unit_test my_model__test_my_model_string_concat (models/unit_testing/schema.yml)
03:20:50    

expected differs from actual:

@@ ,string_c
+++,ab
---,abc
Screenshot 2023-11-01 at 11 46 25 PM Screenshot 2023-11-01 at 11 47 10 PM

@cla-bot cla-bot bot added the cla:yes label Nov 2, 2023
Copy link
Contributor

github-actions bot commented Nov 2, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link

codecov bot commented Nov 2, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (436dae6) 86.80% compared to head (b2c9035) 86.78%.

Files Patch % Lines
core/dbt/task/unit_test.py 97.56% 1 Missing ⚠️
Additional details and impacted files
@@                       Coverage Diff                       @@
##           unit_testing_feature_branch    #8984      +/-   ##
===============================================================
- Coverage                        86.80%   86.78%   -0.03%     
===============================================================
  Files                              181      181              
  Lines                            27057    27089      +32     
===============================================================
+ Hits                             23488    23508      +20     
- Misses                            3569     3581      +12     
Flag Coverage Δ
integration 83.73% <98.03%> (-0.09%) ⬇️
unit 64.56% <37.25%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dbeatty10
Copy link
Contributor

@MichelleArk This is super useful to compare and contrast some of our formatting options!

Summary

When kooking at a larger unit test case, it's possible to pick out the differences when using a colorized daff (data diff).

But when looking at a naive tabular output, I wasn't able to discern the differences between the actual and expected.

Non-colorized daff is much harder to see than a colorized daff. But it is still easier that the naive tabular output due to:

  • the side-by-side nature
  • reduced number of rows
  • "→" to signal where to look

Our most readable formats may have similar characteristics as well.

See below for more details.

Crux of the issue

The crux of the issue is being able to determine two things:

  1. Did the unit tests all pass or not?
  2. If one or more didn't pass, what was the difference between the actual output and the expected output?

Relevant factors

It will be interesting to see how the following affects our ability to put our finger on what the actual diffs are:

  • number of columns
  • number of rows
  • data types
  • length of the values
  • etc.

2 columns by 10 rows; 2 differences

Here's a "bigger" example to compare/contrast with. Would be useful if we can up up with even more representative examples.

Project setup

  1. git clone https://github.com/dbt-labs/dbt-core.git
  2. cd dbt-core
  3. git checkout spike/daff-formatting-unit-testing
  4. Follow the contributing instructions:
    1. Create and activate a virtual environment
    2. pip install -r dev-requirements.txt -r editable-requirements.txt
    3. etc.
  5. Create a dbt project (i.e., create a dbt_project.yml file that points to a database connection profile)

Files

actual.csv

uuid4_1,uuid4_2
bace1187-8b38-4b35-90b0-a4399ab62891,c4cef639-49ad-4ec8-8d65-e45acb2d2e6b
b8f759ab-83e5-4b46-ba6c-26e0e8a65ce9,ea1ce868-e9be-4099-b6b0-6e080f16c8fa
40fa0f5c-e5d3-4e88-bd73-bacc2496a346,47ca2417-91e5-49cd-b82d-8398a513ff61
4769f4f0-51a0-4b7f-aa3a-d7547e131011,6ce2c256-d85d-4d6c-a7a5-07e5f82ae4be
89a94808-88d7-4fdf-85d4-d19b0cbb5193,c5b1419c-a2d0-4358-92a4-df05cb82c989
c97fc0ce-6dd5-40c5-8fc0-459be00cb2a1,fc5eab42-460b-4970-a28d-e15d8e79eeda
3cd99b8c-0829-4a26-a854-1938944606af,9266c041-e81b-40bd-9cbb-b0bbfe3a4450
b883eba0-867f-4f03-88f0-e20adea4dead,39f8c21b-788f-4612-a7b0-3aaa887f658c
bc3ae058-b1e6-46fa-9706-002e341658ab,502e6acd-1b41-40ed-afa0-5488b2492ffc
b9e0d99c-c7c5-4663-a75e-2774ebe69e7b,a4fad314-3afa-4cf5-8382-830fc70cc44f

expected.csv

uuid4_1,uuid4_2
bace1187-8b38-4b35-90b0-a4399ab62891,c4cef639-49ad-4ec8-8d65-e45acb2d2e6b
b8f759ab-83e5-4b46-ba6c-26e0e8a65ce9,ea1ce868-e9be-4099-b6b0-6e080f16c8fa
40fa0f5c-e5d3-4e88-bd73-bacc2496a346,47ca2417-91e5-49cd-b82d-8398a513ff61
4769f4f0-51a0-4b7f-aa3a-d7547e131011,6ce2c256-d85d-4d6c-a7a5-07e5f82ae4be
89a94808-88d7-4fdf-85d4-d19b0cbb5193,c5b1419c-a2d0-4358-92a4-df05cb82c989
c97fc0ce-6dd5-40c5-8fc0-459be00cb2a1,934bf0f9-8f74-47b5-8ac5-7bb7c750e498
3cd99b8c-0829-4a26-a854-1938944606af,f7ff7645-2add-46ea-bcb6-8ca1301a34a1
b883eba0-867f-4f03-88f0-e20adea4dead,39f8c21b-788f-4612-a7b0-3aaa887f658c
bc3ae058-b1e6-46fa-9706-002e341658ab,502e6acd-1b41-40ed-afa0-5488b2492ffc
b9e0d99c-c7c5-4663-a75e-2774ebe69e7b,a4fad314-3afa-4cf5-8382-830fc70cc44f

models/my_model.sql

select 'bace1187-8b38-4b35-90b0-a4399ab62891' as uuid4_1, 'c4cef639-49ad-4ec8-8d65-e45acb2d2e6b' as uuid4_2 union all
select 'b8f759ab-83e5-4b46-ba6c-26e0e8a65ce9' as uuid4_1, 'ea1ce868-e9be-4099-b6b0-6e080f16c8fa' as uuid4_2 union all
select '40fa0f5c-e5d3-4e88-bd73-bacc2496a346' as uuid4_1, '47ca2417-91e5-49cd-b82d-8398a513ff61' as uuid4_2 union all
select '4769f4f0-51a0-4b7f-aa3a-d7547e131011' as uuid4_1, '6ce2c256-d85d-4d6c-a7a5-07e5f82ae4be' as uuid4_2 union all
select '89a94808-88d7-4fdf-85d4-d19b0cbb5193' as uuid4_1, 'c5b1419c-a2d0-4358-92a4-df05cb82c989' as uuid4_2 union all
select 'c97fc0ce-6dd5-40c5-8fc0-459be00cb2a1' as uuid4_1, 'fc5eab42-460b-4970-a28d-e15d8e79eeda' as uuid4_2 union all
select '3cd99b8c-0829-4a26-a854-1938944606af' as uuid4_1, '9266c041-e81b-40bd-9cbb-b0bbfe3a4450' as uuid4_2 union all
select 'b883eba0-867f-4f03-88f0-e20adea4dead' as uuid4_1, '39f8c21b-788f-4612-a7b0-3aaa887f658c' as uuid4_2 union all
select 'bc3ae058-b1e6-46fa-9706-002e341658ab' as uuid4_1, '502e6acd-1b41-40ed-afa0-5488b2492ffc' as uuid4_2 union all
select 'b9e0d99c-c7c5-4663-a75e-2774ebe69e7b' as uuid4_1, 'a4fad314-3afa-4cf5-8382-830fc70cc44f' as uuid4_2

models/_unit.yml

unit:
    - model: my_model
      tests:
        - name: test_my_model
          description: "unit test description"
          given: []
          expect:
            rows:
              - {uuid4_1: bace1187-8b38-4b35-90b0-a4399ab62891, uuid4_2: c4cef639-49ad-4ec8-8d65-e45acb2d2e6b}
              - {uuid4_1: b8f759ab-83e5-4b46-ba6c-26e0e8a65ce9, uuid4_2: ea1ce868-e9be-4099-b6b0-6e080f16c8fa}
              - {uuid4_1: 40fa0f5c-e5d3-4e88-bd73-bacc2496a346, uuid4_2: 47ca2417-91e5-49cd-b82d-8398a513ff61}
              - {uuid4_1: 4769f4f0-51a0-4b7f-aa3a-d7547e131011, uuid4_2: 6ce2c256-d85d-4d6c-a7a5-07e5f82ae4be}
              - {uuid4_1: 89a94808-88d7-4fdf-85d4-d19b0cbb5193, uuid4_2: c5b1419c-a2d0-4358-92a4-df05cb82c989}
              - {uuid4_1: c97fc0ce-6dd5-40c5-8fc0-459be00cb2a1, uuid4_2: 934bf0f9-8f74-47b5-8ac5-7bb7c750e498}
              - {uuid4_1: 3cd99b8c-0829-4a26-a854-1938944606af, uuid4_2: f7ff7645-2add-46ea-bcb6-8ca1301a34a1}
              - {uuid4_1: b883eba0-867f-4f03-88f0-e20adea4dead, uuid4_2: 39f8c21b-788f-4612-a7b0-3aaa887f658c}
              - {uuid4_1: bc3ae058-b1e6-46fa-9706-002e341658ab, uuid4_2: 502e6acd-1b41-40ed-afa0-5488b2492ffc}
              - {uuid4_1: b9e0d99c-c7c5-4663-a75e-2774ebe69e7b, uuid4_2: a4fad314-3afa-4cf5-8382-830fc70cc44f}              
dbt run -s my_model
dbt unit-test

Option 1 - naive tabular

git checkout unit_testing_feature_branch
dbt unit-test
image

Option 2 - daff (w/ colorization)

git checkout spike/daff-formatting-unit-testing
dbt unit-test
image

Option 3 - daff (w/o colorization)

git checkout spike/daff-formatting-unit-testing
dbt --no-use-colors unit-test
image

@MichelleArk MichelleArk changed the title spike using daff for diff formatting in unit testing [spike] using daff for diff formatting in unit testing Nov 7, 2023
@MichelleArk MichelleArk self-assigned this Nov 7, 2023
@MichelleArk
Copy link
Contributor Author

MichelleArk commented Nov 10, 2023

Overall I think the daff-based diff is a definite improvement over the naive tabular diff that's current implemented. It's much clearer to see what the diff actually is - especially with coloring :) My only concern is that the daff package has not had a python release in a while (last release was in 2020).

The project itself seems to be active and supports many languages, and there is an issue open to release a more recent python package. It looks relatively stable though, only a handful of patch releases have gone out since aug 2020 for other supported languages.

Otherwise - the stdout diff feels like an interface we can continue to change if we wanted to between minor versions. What's important is to standardize on a diff output in the json artifact that can be stable and consistent between different visualization implementations for the initial release though.

cc @gshank - how does that sound to you in terms of proceeding here?

@MichelleArk MichelleArk changed the title [spike] using daff for diff formatting in unit testing Use daff for diff formatting in unit testing Nov 14, 2023
@MichelleArk MichelleArk marked this pull request as ready for review November 14, 2023 15:51
@MichelleArk MichelleArk requested a review from a team as a code owner November 14, 2023 15:51
Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! And it has the side benefit of giving us a reason to have variables and methods named "daff_diff" :-)

@MichelleArk
Copy link
Contributor Author

Sneak peek of diff visualization for array types! (on bigquery, not quite working with postgres yet...)

Screenshot 2023-11-15 at 11 27 09 AM

@MichelleArk MichelleArk merged commit 35f579e into unit_testing_feature_branch Nov 15, 2023
48 checks passed
@MichelleArk MichelleArk deleted the spike/daff-formatting-unit-testing branch November 15, 2023 16:27
gshank added a commit that referenced this pull request Jan 16, 2024
* Initial implementation of unit testing (from pr #2911)

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* 8295 unit testing artifacts (#8477)

* unit test config: tags & meta (#8565)

* Add additional functional test for unit testing selection, artifacts, etc (#8639)

* Enable inline csv format in unit testing (#8743)

* Support unit testing incremental models (#8891)

* update unit test key: unit -> unit-tests (#8988)


* convert to use unit test name at top level key (#8966)

* csv file fixtures (#9044)

* Unit test support for `state:modified` and `--defer` (#9032)

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* Allow use of sources as unit testing inputs (#9059)

* Use daff for diff formatting in unit testing (#8984)

* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)

Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Fix #8652: Use seed value if rows not specified

* Move unit testing to test and build commands (#9108)

* Enable unit testing in non-root packages (#9184)

* convert test to data_test (#9201)

* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)

* In build command run unit tests before models (#9273)

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants