Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a test for sequential values on a column #306

Closed
yu-iskw opened this issue Dec 15, 2020 · 7 comments
Closed

Add a test for sequential values on a column #306

yu-iskw opened this issue Dec 15, 2020 · 7 comments
Labels
enhancement New feature or request triage

Comments

@yu-iskw
Copy link

yu-iskw commented Dec 15, 2020

Describe the feature

For instance, we incrementally update a table everyday, we want to make sure if missing dates exist or not.
We can implement a test to make sure if distinct values of a column is sequence without missing ones.

When the updated_date column should have sequential dates without missing ones from 2020-01-01 to 2020-12-31, we can declare the test as below:

columns:
  - name: updated_date
    tests:
      - dbt_utils.sequencial_values:
          period: day  # ["minute", "hour", "day", "week", "month", "year", "integer"]
          gaps: 1  # Optional, default 1
          where: "updated_date NOT IN ('2020-05-01', '2020-09-04')"
  • period: (Required) duration of sequential values, say about 1 day
  • gaps: (Optional) gaps of the sequence, default: 1
  • where: (Optional) conditions to accept missing values

The genrated query for the test in BigQuery can be like:

WITH unique_values (
  SELECT
    DISTINCT `updated_date` AS v
  FROM `project`.`dataset`.`table`
  ORDER BY 1
)

, gaps_to_subsequent AS (
  SELECT
    DATE_DIFF(LEAD(v), v) AS gaps
  FROM unique_values
)

SELECT COUNT(*)
FROM gaps_to_subsequent
WHERE 
  gaps != 1
  AND updated_date NOT IN ('2020-05-01', '2020-09-04')

Describe alternatives you've considered

We can of course implement a dbt data test by hand. But, such a test is repeated in different tables.

Additional context

I don't think it is database-specific.

Are you interested in contributing this feature?

I want to implement the feature.

@yu-iskw yu-iskw added enhancement New feature or request triage labels Dec 15, 2020
@clrcrl
Copy link
Contributor

clrcrl commented Dec 15, 2020

Hey there @yu-iskw! This is a cool test, I would definitely support it in dbt-utils. I'm doing a big cleanup of open PRs this week, do you mind holding off on writing the code on this for a week or so?

@yu-iskw
Copy link
Author

yu-iskw commented Dec 16, 2020

@clrcrl Good to know. Sure things!

@yu-iskw
Copy link
Author

yu-iskw commented May 6, 2021

@clrcrl Please let me know if we are ready to start implementation.

@yu-iskw
Copy link
Author

yu-iskw commented Aug 11, 2021

I will work on it shortly.

@yu-iskw
Copy link
Author

yu-iskw commented Aug 17, 2021

This was already implemented. But, I think it would be great to support the where clause so that we sometimes want to have conditions to exclude the test scope.
#327

@yu-iskw
Copy link
Author

yu-iskw commented Aug 17, 2021

@clrcrl Can we close the issue, although we can support whare as well?

@yu-iskw
Copy link
Author

yu-iskw commented Aug 17, 2021

Additional PR
#401

@yu-iskw yu-iskw closed this as completed Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage
Projects
None yet
Development

No branches or pull requests

2 participants