Small improvements in examples
Andrzej Nagalski committed Dec 1, 2023
1 parent 40b6d2c commit d59c12c
Showing 18 changed files with 59 additions and 57 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ To execute the check prepared in the example using the [user interface](../../dq
The results should be similar to the one below.

The actual value in this example is 100, which is above the minimum threshold level set in the warning (99.0%).
The check gives a valid result (notice the green square on the left of the name of the check).
The check gives a valid result (notice the green square to the left of the check name).

![Foreign-key-match-percent check results](https://dqops.com/docs/images/examples/daily-foreign-key-match-percent-checks-results.png)

2 changes: 1 addition & 1 deletion docs/examples/data-completeness/number-of-null-values.md
Expand Up @@ -90,7 +90,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value of null values in this example is 8, which is above the maximum threshold level set in the warning (5).
The check gives a warning result (notice the yellow square on the left of the name of the check).
The check gives a warning result (notice the yellow square to the left of the check name).

![Null-count check results](https://dqops.com/docs/images/examples/daily-null-count-check-results.png)

Expand Up @@ -85,7 +85,7 @@ The following is a fragment of the `bigquery-public-data.america_health_rankings
Review the results which should be similar to the one below.

The actual value of rows in this example is 18155, which is above the minimum threshold level set in the warning (1).
The check gives a valid result (notice the green square on the left of the name of the check).
The check gives a valid result (notice the green square to the left of the check name).
Now you can be sure that your table is not empty.

![Row-count check result](https://dqops.com/docs/images/examples/daily-row-count-check-result.png)
58 changes: 30 additions & 28 deletions docs/examples/data-consistency/percent-of-string-in-set.md
@@ -1,42 +1,18 @@
# Percent of string in set
# Percent of rows with string values in set

Verifies that the percentage of strings from a set in a column does not fall below the minimum accepted percentage.
Verifies that the percentage of string values from a set in a column does not fall below the minimum accepted percentage.

**PROBLEM**

We will be testing the [Student Performance](https://www.kaggle.com/datasets/whenamancodes/student-performance) dataset.
This data addresses student achievement in secondary education at two Portuguese schools.
The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires.
The data attributes include student grades, demographic, social and school related features and it was collected by using school reports and questionnaires.
Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por).
In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks.
Important note: the target attribute G3 has a strong correlation with attributes G2 and G1.

We want to verify that the values in the tested column `Fjob` belong to the set of accepted values.

## Data structure

The following is a fragment of the [Student Performance](https://www.kaggle.com/datasets/whenamancodes/student-performance).
Some columns were omitted for clarity.
The `Fjob` column of interest contains the father's job.

| Medu | Fedu | Mjob | Fjob | reason | guardian | traveltime | studytime |
|:-----|:-----|:---------|:-------------|:-----------|:---------|:-----------|:----------|
| 4 | 4 | services | **at_home** | course | mother | 1 | 3 |
| 3 | 3 | other | **other** | course | other | 2 | 1 |
| 4 | 3 | teacher | **services** | course | father | 2 | 4 |
| 3 | 2 | health | **services** | home | father | 1 | 2 |
| 4 | 4 | teacher | **teacher** | course | mother | 1 | 1 |
| 3 | 2 | services | **at_home** | home | mother | 1 | 1 |
| 2 | 2 | other | **other** | home | mother | 1 | 2 |
| 1 | 3 | at_home | **services** | home | mother | 1 | 2 |
| 1 | 1 | at_home | **other** | reputation | mother | 1 | 3 |
| 4 | 3 | teacher | **other** | course | mother | 1 | 1 |
| 2 | 1 | other | **other** | course | other | 2 | 3 |
| 2 | 2 | services | **services** | course | father | 1 | 4 |
| 2 | 2 | at_home | **services** | home | mother | 1 | 3 |
| 3 | 3 | services | **services** | home | mother | 1 | 2 |
| 2 | 2 | other | **other** | home | other | 1 | 2 |

**SOLUTION**

We will verify the data using profiling [daily_string_value_in_set_percent](../../checks/column/strings/string-value-in-set-percent.md) column check.
Expand All @@ -54,6 +30,7 @@ SELECT
) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`kaggle_student_performance`.`maths` AS analyzed_table
```

In this example, we will set three minimum percent threshold levels for the check (a minimum accepted percentage of valid rows):

- warning: 99
Expand All @@ -66,6 +43,31 @@ If you want to learn more about checks and threshold levels, please refer to the

If the percentage of string values from the set falls below 99, a warning alert will be triggered.
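
The way a measured percentage is compared with minimum thresholds can be sketched in Python. This is only an illustration, not DQOps code: the function name `evaluate_min_percent` is hypothetical, and the `error` and `fatal` values in the last call are made up, since only the warning level (99) is visible in this fragment.

```python
from typing import Optional


def evaluate_min_percent(
    actual: float,
    warning: Optional[float] = None,
    error: Optional[float] = None,
    fatal: Optional[float] = None,
) -> str:
    """Classify a measured percentage against minimum thresholds.

    Thresholds are checked from most to least severe, so the
    highest severity whose minimum is not met wins.
    """
    if fatal is not None and actual < fatal:
        return "fatal"
    if error is not None and actual < error:
        return "error"
    if warning is not None and actual < warning:
        return "warning"
    return "valid"


# Only the warning level (99) comes from the example text.
print(evaluate_min_percent(98.5, warning=99))
# Hypothetical error and fatal levels, chosen purely for illustration.
print(evaluate_min_percent(40, warning=99, error=98, fatal=95))
```

A value equal to a threshold is treated as passing here, which is an assumption about how "falls below" should be read.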

## Data structure

The following is a fragment of the [Student Performance](https://www.kaggle.com/datasets/whenamancodes/student-performance).
Some columns were omitted for clarity.
The `Fjob` column of interest contains the father's job.

| Medu | Fedu | Mjob | Fjob | reason | guardian | traveltime | studytime |
|:-----|:-----|:---------|:-------------|:-----------|:---------|:-----------|:----------|
| 4 | 4 | services | **at_home** | course | mother | 1 | 3 |
| 3 | 3 | other | **other** | course | other | 2 | 1 |
| 4 | 3 | teacher | **services** | course | father | 2 | 4 |
| 3 | 2 | health | **services** | home | father | 1 | 2 |
| 4 | 4 | teacher | **teacher** | course | mother | 1 | 1 |
| 3 | 2 | services | **at_home** | home | mother | 1 | 1 |
| 2 | 2 | other | **other** | home | mother | 1 | 2 |
| 1 | 3 | at_home | **services** | home | mother | 1 | 2 |
| 1 | 1 | at_home | **other** | reputation | mother | 1 | 3 |
| 4 | 3 | teacher | **other** | course | mother | 1 | 1 |
| 2 | 1 | other | **other** | course | other | 2 | 3 |
| 2 | 2 | services | **services** | course | father | 1 | 4 |
| 2 | 2 | at_home | **services** | home | mother | 1 | 3 |
| 3 | 3 | services | **services** | home | mother | 1 | 2 |
| 2 | 2 | other | **other** | home | other | 1 | 2 |
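
To make the measured value concrete, the following Python sketch computes the percentage of `Fjob` values that belong to a set, using only the 15 rows of the fragment above. The helper `percent_in_set` and the accepted set are hypothetical; the set actually configured in the example is not visible in this fragment, and the handling of an empty column is an assumption.

```python
from typing import Iterable, Set


def percent_in_set(values: Iterable[str], accepted: Set[str]) -> float:
    """Return the percentage of values that belong to the accepted set."""
    rows = list(values)
    if not rows:
        return 100.0  # assumption: an empty column is treated as fully valid
    matching = sum(1 for value in rows if value in accepted)
    return 100.0 * matching / len(rows)


# Fjob values copied from the 15-row fragment, in table order.
fjob_values = [
    "at_home", "other", "services", "services", "teacher",
    "at_home", "other", "services", "other", "other",
    "other", "services", "services", "services", "other",
]

# Hypothetical accepted set, for illustration only.
accepted_jobs = {"services", "teacher"}

print(f"{percent_in_set(fjob_values, accepted_jobs):.2f}")
```

Because the fragment is a small sample, the percentage computed here will not match the result reported for the full table.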


## Running the checks in the example and evaluating the results using the user interface

A detailed explanation of [how to run the example is described here](../../#running-the-use-cases).
Expand Down Expand Up @@ -108,7 +110,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 40, which is below the minimum threshold level set in the warning (99).
The check gives a fatal result (notice the red square on the left of the name of the check).
The check gives a fatal result (notice the red square to the left of the check name).

![String-in-set-percent check results](https://dqops.com/docs/images/examples/daily-string-in-set-percent-check-results.png)

@@ -1,6 +1,6 @@
# Percentage of false values

This example shows how to detect that the percentage of false values remains above a set threshold.
This example shows how to detect that the percentage of false boolean values remains above a set threshold.

**PROBLEM**

Expand All @@ -15,14 +15,14 @@ a link to a copy of the PDF, which can be found on Google Cloud Storage.
The `invalidOcr` column indicates if the OCR does match the raw file text (false value) or does not (true value). In case
of the true value, the OCR process needs more work and the file is not ready to be transcribed.

We want to verify the percentage of false values in the `invalidOcr` column, which will tell us what percentage of data is
We want to verify the percentage of false boolean values in the `invalidOcr` column, which will tell us what percentage of data is
ready to be transcribed.
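
As a rough illustration of the measure, the Python sketch below computes the percentage of `False` values in a boolean column. The function name and the sample data are made up for this sketch, and counting null values in the denominator is an assumption rather than documented DQOps behavior.

```python
from typing import Iterable, Optional


def false_percent(values: Iterable[Optional[bool]]) -> float:
    """Return the percentage of rows whose value is exactly False."""
    rows = list(values)
    if not rows:
        return 0.0  # assumption: with no rows, nothing is ready
    false_count = sum(1 for value in rows if value is False)
    return 100.0 * false_count / len(rows)


# Made-up sample: 99 files whose OCR matches (False) and 1 that needs more work (True).
sample_invalid_ocr = [False] * 99 + [True]
ready_percent = false_percent(sample_invalid_ocr)
print(f"{ready_percent:.1f}% of files are ready to be transcribed")
```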

**SOLUTION**

We will verify the data of `bigquery-public-data.fcc_political_ads.content_info` using monitoring
[false_percent](../../checks/column/bool/false-percent.md) column check.
Our goal is to verify that the percentage of false values on `invalidOcr` column does not fall below 99%.
Our goal is to verify that the percentage of false boolean values in the `invalidOcr` column does not fall below 99%.

In this example, we will set three minimum percentage threshold levels for the check:

Expand Down Expand Up @@ -96,7 +96,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 99, which is above the minimum threshold level set in the warning (99).
The check gives a valid result (notice the green square on the left of the name of the check).
The check gives a valid result (notice the green square to the left of the check name).

![False-percent check results](https://dqops.com/docs/images/examples/daily-false-percent-check-results.png)

Expand Up @@ -29,7 +29,7 @@ If you want to learn more about checks and threshold levels, please refer to the

**VALUE**

If the percentage of valid values falls below 5.0%, a warning alert will be triggered.
If the percentage of valid values falls below 95.0%, an error alert will be triggered.

## Data structure

Expand Down Expand Up @@ -90,8 +90,8 @@ To execute the check prepared in the example using the [user interface](../../dq

Review the results which should be similar to the one below.

The actual value in this example is 92, which is below the minimum threshold level set in the warning (99.0%).
The check gives a warning (notice the orange square on the left of the name of the check).
The actual value in this example is 92, which is below the minimum threshold level set in the error (95.0%).
The check raises an error issue (notice the orange square to the left of the check name).

![Values-in-range-numeric-percent check results](https://dqops.com/docs/images/examples/daily-values-in-range-numeric-percent-checks-results.png)

@@ -1,6 +1,6 @@
# A string not exceeding a set length

The check verifies that the length of the string does not exceed the indicated value.
This example shows how to verify that the maximum length of the strings in a column does not exceed a set length.

**PROBLEM**

Expand All @@ -10,23 +10,23 @@ The platform analyzes more than 340 measures of behaviors, social and economic f
Data is based on public-use data sets, such as the U.S. Census and the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS),
the world’s largest, annual population-based telephone survey of over 400,000 people.

The `measure_name` contains measure name data. We want to verify that the length of the string values in this column does not exceed 30.
The `measure_name` column contains measure name data. We want to verify that the length of the string values in this column does not exceed 30 characters.

**SOLUTION**

We will verify the data of `bigquery-public-data.america_health_rankings.ahr` using monitoring
[string_max_length](../../checks/column/strings/string-max-length.md) column check.
Our goal is to verify if the number of valid length values on `measure_name` column does not exceed the setup thresholds.
Our goal is to verify that the length of the strings in the `measure_name` column does not exceed the set threshold.

In this example, we will set one maximum threshold level for the check:

- warning: 30.0
- error: 30.0

If you want to learn more about checks and threshold levels, please refer to the [DQOps concept section](../../dqo-concepts/checks/index.md).

**VALUE**

If the string length exceed 30.0, a warning alert will be triggered.
If the string length exceeds 30.0, an error alert will be triggered.
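
The rule can be sketched in Python as follows. The helper names and the sample strings are hypothetical and are not taken from the dataset; ignoring null values is also an assumption.

```python
from typing import Iterable, Optional


def max_string_length(values: Iterable[Optional[str]]) -> int:
    """Return the length of the longest non-null string (0 if there are none)."""
    return max((len(value) for value in values if value is not None), default=0)


def exceeds_limit(values: Iterable[Optional[str]], max_length: int = 30) -> bool:
    """Report whether the longest string is longer than the allowed maximum."""
    return max_string_length(values) > max_length


# Made-up values: two short names, one 31-character string, and a null.
sample_names = ["short", "a medium-length name", "A" * 31, None]
longest = max_string_length(sample_names)
print(longest, exceeds_limit(sample_names))
```

A string of exactly 30 characters passes in this sketch, because only lengths strictly greater than the limit count as exceeding it.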

## Data structure

Expand Down Expand Up @@ -86,8 +86,8 @@ To execute the check prepared in the example using the [user interface](../../dq

Review the results which should be similar to the one below.

The actual value in this example is 31, which is above the maximum threshold level set in the error (30).
The check gives an error(notice the orange square on the left of the name of the check).
The actual value in this example is 31, which is above the maximum threshold level set in the error field (30).
The check results in an error issue (notice the orange square to the left of the check name).

![String-max-length check results](https://dqops.com/docs/images/examples/daily-string-max-length-checks-results.png)

2 changes: 1 addition & 1 deletion docs/examples/data-uniqueness/percentage-of-duplicates.md
Expand Up @@ -87,7 +87,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 0, which is below the maximum threshold level set in the warning (1.0%).
The check gives a valid result (notice the green square on the left of the name of the check).
The check gives a valid result (notice the green square to the left of the check name).

![Duplicate-percent check results](https://dqops.com/docs/images/examples/daily-duplicate-percent-checks-results.png)

Expand Up @@ -93,7 +93,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 5, which is above the maximum threshold level set in the warning (0).
The check gives a warning (notice the yellow square on the left of the name of the check).
The check gives a warning (notice the yellow square to the left of the check name).

![String-invalid-ip4-address-count check results](https://dqops.com/docs/images/examples/daily-string-invalid-ip4-address-count-checks-results.png)

2 changes: 1 addition & 1 deletion docs/examples/data-validity/number-of-invalid-emails.md
Expand Up @@ -84,7 +84,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 22, which is above the maximum threshold level set in the warning (0).
The check gives a fatal error (notice the red square on the left of the name of the check).
The check gives a fatal error (notice the red square to the left of the check name).

![String-invalid-email-count check results](https://dqops.com/docs/images/examples/daily-string-invalid-email-count-check-results.png)

Expand Up @@ -92,7 +92,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 48, which is above the maximum threshold level set in the warning (45.0%).
The check gives a warning result (notice the yellow square on the left of the name of the check).
The check gives a warning result (notice the yellow square to the left of the check name).

![Negative-percent check results](https://dqops.com/docs/images/examples/daily-negative-percent-checks-results.png)

Expand Up @@ -107,7 +107,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 92, which is below the minimum threshold level set in the warning (100.0%).
The check gives a fatal error (notice the red square on the left of the name of the check).
The check gives a fatal error (notice the red square to the left of the check name).

![SQL-condition-passed-percent check results](https://dqops.com/docs/images/examples/daily-sql-condition-passed-percent-on-table-checks-results.png)

Expand Up @@ -91,7 +91,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 0, which is below the minimum threshold level set in the warning (99.0%).
The check gives a fatal error (notice the red square on the left of the name of the check).
The check gives a fatal error (notice the red square to the left of the check name).

![String-match-date-regex-percent check results](https://dqops.com/docs/images/examples/daily-string-match-date-regex-percent-checks-results.png)

Expand Up @@ -99,7 +99,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 64, which is below the minimum threshold level set in the warning (75.0%).
The check gives an error result (notice the orange square on the left of the name of the check).
The check gives an error result (notice the orange square to the left of the check name).

![String-valid-currency-code-percent check results](https://dqops.com/docs/images/examples/daily-string-valid-currency-code-percent-checks-results.png)

Expand Up @@ -94,7 +94,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 99, which is above the minimum threshold level set in the warning (99.0%).
The check gives a valid result (notice the green square on the left of the name of the check).
The check gives a valid result (notice the green square to the left of the check name).

![Valid-latitude-and-longitude-percent check results](https://dqops.com/docs/images/examples/daily-valid-latitude-and-longitude-percent-checks-results.png)

2 changes: 1 addition & 1 deletion docs/examples/data-validity/percentage-of-valid-uuid.md
Expand Up @@ -94,7 +94,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 75, which is below the minimum threshold level set in the warning (100.0%).
The check gives a fatal error (notice the red square on the left of the name of the check).
The check gives a fatal error (notice the red square to the left of the check name).

![String-valid-uuid-percent check results](https://dqops.com/docs/images/examples/daily-string-valid-uuid-percent-checks-result.png)

Expand Up @@ -88,7 +88,7 @@ To execute the check prepared in the example using the [user interface](../../dq
Review the results which should be similar to the one below.

The actual value in this example is 98, which is above the maximum threshold level set in the warning (10.0%).
The check gives a fatal error (notice the red square on the left of the name of the check).
The check gives a fatal error (notice the red square to the left of the check name).

![Contains-usa-zipcode-percent check results](https://dqops.com/docs/images/examples/daily-contains-usa-zipcode-percent-checks-results.png)
