From d59c12ce09fdf4f972946d1929af4b8c1d47d6b4 Mon Sep 17 00:00:00 2001 From: Andrzej Nagalski Date: Fri, 1 Dec 2023 14:59:52 +0100 Subject: [PATCH] Small improvements in examples --- ...eck-between-columns-in-different-tables.md | 2 +- .../number-of-null-values.md | 2 +- .../number-of-rows-in-the-table.md | 2 +- .../percent-of-string-in-set.md | 58 ++++++++++--------- .../percentage-of-false-values.md | 8 +-- .../percentage-of-integer-values-in-range.md | 6 +- .../string-not-exceeding-a-set-length.md | 14 ++--- .../percentage-of-duplicates.md | 2 +- .../number-of-invalid-IP4-address.md | 2 +- .../data-validity/number-of-invalid-emails.md | 2 +- .../percentage-of-negative-values.md | 2 +- ...ercentage-of-rows-passing-sql-condition.md | 2 +- ...rcentage-of-strings-matching-date-regex.md | 2 +- .../percentage-of-valid-currency-codes.md | 2 +- ...centage-of-valid-latitude-and-longitude.md | 2 +- .../data-validity/percentage-of-valid-uuid.md | 2 +- ...age-of-values-that-contains-usa-zipcode.md | 2 +- docs/examples/stability/table-availability.md | 4 +- 18 files changed, 59 insertions(+), 57 deletions(-) diff --git a/docs/examples/data-accuracy/integrity-check-between-columns-in-different-tables.md b/docs/examples/data-accuracy/integrity-check-between-columns-in-different-tables.md index d55eddc73e..cc99da15ac 100644 --- a/docs/examples/data-accuracy/integrity-check-between-columns-in-different-tables.md +++ b/docs/examples/data-accuracy/integrity-check-between-columns-in-different-tables.md @@ -112,7 +112,7 @@ To execute the check prepared in the example using the [user interface](../../dq The results should be similar to the one below. The actual value in this example is 100, which is above the minimum threshold level set in the warning (99.0%). - The check gives a valid result (notice the green square on the left of the name of the check). + The check gives a valid result (notice the green square to the left of the check name). 
![Foreign-key-match-percent check results](https://dqops.com/docs/images/examples/daily-foreign-key-match-percent-checks-results.png) diff --git a/docs/examples/data-completeness/number-of-null-values.md b/docs/examples/data-completeness/number-of-null-values.md index 780bbfa080..616a1d9865 100644 --- a/docs/examples/data-completeness/number-of-null-values.md +++ b/docs/examples/data-completeness/number-of-null-values.md @@ -90,7 +90,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value of null values in this example is 8, which is above the maximum threshold level set in the warning (5). - The check gives a warning result (notice the yellow square on the left of the name of the check). + The check gives a warning result (notice the yellow square to the left of the check name). ![Null-count check results](https://dqops.com/docs/images/examples/daily-null-count-check-results.png) diff --git a/docs/examples/data-completeness/number-of-rows-in-the-table.md b/docs/examples/data-completeness/number-of-rows-in-the-table.md index 30ca882092..1827ed4209 100644 --- a/docs/examples/data-completeness/number-of-rows-in-the-table.md +++ b/docs/examples/data-completeness/number-of-rows-in-the-table.md @@ -85,7 +85,7 @@ The following is a fragment of the `bigquery-public-data.america_health_rankings Review the results which should be similar to the one below. The actual value of rows in this example is 18155, which is above the minimum threshold level set in the warning (1). - The check gives a valid result (notice the green square on the left of the name of the check). + The check gives a valid result (notice the green square to the left of the check name). Now you can be sure that you table is not empty. 
![Row-count check result](https://dqops.com/docs/images/examples/daily-row-count-check-result.png) diff --git a/docs/examples/data-consistency/percent-of-string-in-set.md b/docs/examples/data-consistency/percent-of-string-in-set.md index 888cfca2c9..edaa670864 100644 --- a/docs/examples/data-consistency/percent-of-string-in-set.md +++ b/docs/examples/data-consistency/percent-of-string-in-set.md @@ -1,42 +1,18 @@ -# Percent of string in set +# Percent of rows with string values in set -Verifies that the percentage of strings from a set in a column does not fall below the minimum accepted percentage. +Verifies that the percentage of string values from a set in a column does not fall below the minimum accepted percentage. **PROBLEM** We will be testing [Student Performance](https://www.kaggle.com/datasets/whenamancodes/student-performance) dataset. This data approach student achievement in secondary education of two Portuguese schools. -The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. +The data attributes include student grades, demographic, social and school related features and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. We are verifying if values in the tested column `Fjob` are one of accepted values. -## Data structure - -The following is a fragment of the [Student Performance](https://www.kaggle.com/datasets/whenamancodes/student-performance). -Some columns were omitted for clarity. -The `Fjob` column of interest contains father job's values. 
- -| Medu | Fedu | Mjob | Fjob | reason | guardian | traveltime | studytime | -|:-----|:-----|:---------|:-------------|:-----------|:---------|:-----------|:----------| -| 4 | 4 | services | **at_home** | course | mother | 1 | 3 | -| 3 | 3 | other | **other** | course | other | 2 | 1 | -| 4 | 3 | teacher | **services** | course | father | 2 | 4 | -| 3 | 2 | health | **services** | home | father | 1 | 2 | -| 4 | 4 | teacher | **teacher** | course | mother | 1 | 1 | -| 3 | 2 | services | **at_home** | home | mother | 1 | 1 | -| 2 | 2 | other | **other** | home | mother | 1 | 2 | -| 1 | 3 | at_home | **services** | home | mother | 1 | 2 | -| 1 | 1 | at_home | **other** | reputation | mother | 1 | 3 | -| 4 | 3 | teacher | **other** | course | mother | 1 | 1 | -| 2 | 1 | other | **other** | course | other | 2 | 3 | -| 2 | 2 | services | **services** | course | father | 1 | 4 | -| 2 | 2 | at_home | **services** | home | mother | 1 | 3 | -| 3 | 3 | services | **services** | home | mother | 1 | 2 | -| 2 | 2 | other | **other** | home | other | 1 | 2 | - **SOLUTION** We will verify the data using profiling [daily_string_value_in_set_percent](../../checks/column/strings/string-value-in-set-percent.md) column check. @@ -54,6 +30,7 @@ SELECT ) / COUNT(*) AS actual_value FROM `dqo-ai-testing`.`kaggle_student_performance`.`maths` AS analyzed_table ``` + In this example, we will set three minimum percent thresholds levels for the check (a minimum accepted percentage of valid rows): - warning: 99 @@ -66,6 +43,31 @@ If you want to learn more about checks and threshold levels, please refer to the If the percent of string from a set values fall below 99, a warning alert will be triggered. +## Data structure + +The following is a fragment of the [Student Performance](https://www.kaggle.com/datasets/whenamancodes/student-performance). +Some columns were omitted for clarity. +The `Fjob` column of interest contains father job's values. 
+ +| Medu | Fedu | Mjob | Fjob | reason | guardian | traveltime | studytime | +|:-----|:-----|:---------|:-------------|:-----------|:---------|:-----------|:----------| +| 4 | 4 | services | **at_home** | course | mother | 1 | 3 | +| 3 | 3 | other | **other** | course | other | 2 | 1 | +| 4 | 3 | teacher | **services** | course | father | 2 | 4 | +| 3 | 2 | health | **services** | home | father | 1 | 2 | +| 4 | 4 | teacher | **teacher** | course | mother | 1 | 1 | +| 3 | 2 | services | **at_home** | home | mother | 1 | 1 | +| 2 | 2 | other | **other** | home | mother | 1 | 2 | +| 1 | 3 | at_home | **services** | home | mother | 1 | 2 | +| 1 | 1 | at_home | **other** | reputation | mother | 1 | 3 | +| 4 | 3 | teacher | **other** | course | mother | 1 | 1 | +| 2 | 1 | other | **other** | course | other | 2 | 3 | +| 2 | 2 | services | **services** | course | father | 1 | 4 | +| 2 | 2 | at_home | **services** | home | mother | 1 | 3 | +| 3 | 3 | services | **services** | home | mother | 1 | 2 | +| 2 | 2 | other | **other** | home | other | 1 | 2 | + + ## Running the checks in the example and evaluating the results using the user interface A detailed explanation of [how to run the example is described here](../../#running-the-use-cases). @@ -108,7 +110,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 40, which is below the minimum threshold level set in the warning (99). - The check gives a fatal result (notice the red square on the left of the name of the check). + The check gives a fatal result (notice the red square to the left of the check name). 
![String-in-set-percent check results](https://dqops.com/docs/images/examples/daily-string-in-set-percent-check-results.png) diff --git a/docs/examples/data-reasonability/percentage-of-false-values.md b/docs/examples/data-reasonability/percentage-of-false-values.md index 36ef0c23b3..6acd57a8b6 100644 --- a/docs/examples/data-reasonability/percentage-of-false-values.md +++ b/docs/examples/data-reasonability/percentage-of-false-values.md @@ -1,6 +1,6 @@ # Percentage of false values -This example shows how to detect that the percentage of false values remains above a set threshold. +This example shows how to detect that the percentage of false boolean values remains above a set threshold. **PROBLEM** @@ -15,14 +15,14 @@ a link to a copy of the PDF, which can be found on Google Cloud Storage. The `invalidOcr` column indicates if the OCR does match the raw file text (false value) or does not (true value). In case of the true value, the OCR process needs more work and the file is not ready to be transcribed. -We want to verify the percentage of false values in the `invalidOcr` column, which will tell us what percentage of data is +We want to verify the percentage of false boolean values in the `invalidOcr` column, which will tell us what percentage of data is ready to be transcribed. **SOLUTION** We will verify the data of `bigquery-public-data.fcc_political_ads.content_info` using monitoring [false_percent](../../checks/column/bool/false-percent.md) column check. -Our goal is to verify that the percentage of false values on `invalidOcr` column does not fall below 99%. +Our goal is to verify that the percentage of false boolean values on `invalidOcr` column does not fall below 99%. In this example, we will set three minimum percentage thresholds levels for the check: @@ -96,7 +96,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. 
The actual value in this example is 99, which is above the minimum threshold level set in the warning (99). - The check gives a valid result (notice the green square on the left of the name of the check). + The check gives a valid result (notice the green square to the left of the check name). ![False-percent check results](https://dqops.com/docs/images/examples/daily-false-percent-check-results.png) diff --git a/docs/examples/data-reasonability/percentage-of-integer-values-in-range.md b/docs/examples/data-reasonability/percentage-of-integer-values-in-range.md index 879155645b..9fbb519636 100644 --- a/docs/examples/data-reasonability/percentage-of-integer-values-in-range.md +++ b/docs/examples/data-reasonability/percentage-of-integer-values-in-range.md @@ -29,7 +29,7 @@ If you want to learn more about checks and threshold levels, please refer to the **VALUE** -If the percentage of valid values falls below 5.0%, a warning alert will be triggered. +If the percentage of valid values falls below 5.0%, an error alert will be triggered. ## Data structure @@ -90,8 +90,8 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. - The actual value in this example is 92, which is below the minimum threshold level set in the warning (99.0%). - The check gives a warning (notice the orange square on the left of the name of the check). + The actual value in this example is 92, which is below the minimum threshold level set in the error (95.0%). + The check raises an error issue (notice the orange square to the left of the check name). 
![Values-in-range-numeric-percent check results](https://dqops.com/docs/images/examples/daily-values-in-range-numeric-percent-checks-results.png) diff --git a/docs/examples/data-reasonability/string-not-exceeding-a-set-length.md b/docs/examples/data-reasonability/string-not-exceeding-a-set-length.md index 198b6e58a9..62cf274d91 100644 --- a/docs/examples/data-reasonability/string-not-exceeding-a-set-length.md +++ b/docs/examples/data-reasonability/string-not-exceeding-a-set-length.md @@ -1,6 +1,6 @@ # A string not exceeding a set length -The check verifies that the length of the string does not exceed the indicated value. +This example shows how to verify that the maximal length of the string in a column does not exceed the set length. **PROBLEM** @@ -10,23 +10,23 @@ The platform analyzes more than 340 measures of behaviors, social and economic f Data is based on public-use data sets, such as the U.S. Census and the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS), the world’s largest, annual population-based telephone survey of over 400,000 people. -The `measure_name` contains measure name data. We want to verify that the length of the string values in this column does not exceed 30. +The `measure_name` contains measure name data. We want to verify that the length of the string values in this column does not exceed 30 characters. **SOLUTION** We will verify the data of `bigquery-public-data.america_health_rankings.ahr` using monitoring [string_max_length](../../checks/column/strings/string-max-length.md) column check. -Our goal is to verify if the number of valid length values on `measure_name` column does not exceed the setup thresholds. +Our goal is to verify if the length of the strings in `measure_name` column does not exceed the set threshold. 
In this example, we will set one maximum thresholds level for the check: -- warning: 30.0 +- error: 30.0 If you want to learn more about checks and threshold levels, please refer to the [DQOps concept section](../../dqo-concepts/checks/index.md). **VALUE** -If the string length exceed 30.0, a warning alert will be triggered. +If the string length exceeds 30.0, an error alert will be triggered. ## Data structure @@ -86,8 +86,8 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. - The actual value in this example is 31, which is above the maximum threshold level set in the error (30). - The check gives an error(notice the orange square on the left of the name of the check). + The actual value in this example is 31, which is above the maximum threshold level set in the error field (30). + The check results in an error issue (notice the orange square to the left of the check name). ![String-max-length check results](https://dqops.com/docs/images/examples/daily-string-max-length-checks-results.png) diff --git a/docs/examples/data-uniqueness/percentage-of-duplicates.md b/docs/examples/data-uniqueness/percentage-of-duplicates.md index f0dec24040..2e40439796 100644 --- a/docs/examples/data-uniqueness/percentage-of-duplicates.md +++ b/docs/examples/data-uniqueness/percentage-of-duplicates.md @@ -87,7 +87,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 0, which is below the maximum threshold level set in the warning (1.0%). - The check gives a valid result (notice the green square on the left of the name of the check). + The check gives a valid result (notice the green square to the left of the check name). 
![Duplicate-percent check results](https://dqops.com/docs/images/examples/daily-duplicate-percent-checks-results.png) diff --git a/docs/examples/data-validity/number-of-invalid-IP4-address.md b/docs/examples/data-validity/number-of-invalid-IP4-address.md index ea6760e820..d00cff50ed 100644 --- a/docs/examples/data-validity/number-of-invalid-IP4-address.md +++ b/docs/examples/data-validity/number-of-invalid-IP4-address.md @@ -93,7 +93,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 5, which is above the maximum threshold level set in the warning (0). - The check gives a warning (notice the yellow square on the left of the name of the check). + The check gives a warning (notice the yellow square to the left of the check name). ![String-invalid-ip4-address-count check results](https://dqops.com/docs/images/examples/daily-string-invalid-ip4-address-count-checks-results.png) diff --git a/docs/examples/data-validity/number-of-invalid-emails.md b/docs/examples/data-validity/number-of-invalid-emails.md index 507f57cebb..43a8844b13 100644 --- a/docs/examples/data-validity/number-of-invalid-emails.md +++ b/docs/examples/data-validity/number-of-invalid-emails.md @@ -84,7 +84,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 22, which is above the maximum threshold level set in the warning (0). - The check gives a fatal error (notice the red square on the left of the name of the check). + The check gives a fatal error (notice the red square to the left of the check name). 
![String-invalid-email-count check results](https://dqops.com/docs/images/examples/daily-string-invalid-email-count-check-results.png) diff --git a/docs/examples/data-validity/percentage-of-negative-values.md b/docs/examples/data-validity/percentage-of-negative-values.md index 57fe189bbf..f9a3e14d9a 100644 --- a/docs/examples/data-validity/percentage-of-negative-values.md +++ b/docs/examples/data-validity/percentage-of-negative-values.md @@ -92,7 +92,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 48, which is above the maximum threshold level set in the warning (45.0%). - The check gives a warning result (notice the yellow square on the left of the name of the check). + The check gives a warning result (notice the yellow square to the left of the check name). ![Negative-percent check results](https://dqops.com/docs/images/examples/daily-negative-percent-checks-results.png) diff --git a/docs/examples/data-validity/percentage-of-rows-passing-sql-condition.md b/docs/examples/data-validity/percentage-of-rows-passing-sql-condition.md index 244734528a..299a875c4a 100644 --- a/docs/examples/data-validity/percentage-of-rows-passing-sql-condition.md +++ b/docs/examples/data-validity/percentage-of-rows-passing-sql-condition.md @@ -107,7 +107,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 92, which is below the minimum threshold level set in the warning (100.0%). - The check gives a fatal error (notice the red square on the left of the name of the check). + The check gives a fatal error (notice the red square to the left of the check name). 
![SQL-condition-passed-percent check results](https://dqops.com/docs/images/examples/daily-sql-condition-passed-percent-on-table-checks-results.png) diff --git a/docs/examples/data-validity/percentage-of-strings-matching-date-regex.md b/docs/examples/data-validity/percentage-of-strings-matching-date-regex.md index cf33b5a976..85ed4fadee 100644 --- a/docs/examples/data-validity/percentage-of-strings-matching-date-regex.md +++ b/docs/examples/data-validity/percentage-of-strings-matching-date-regex.md @@ -91,7 +91,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 0, which is below the minimum threshold level set in the warning (99.0%). - The check gives a fatal error (notice the red square on the left of the name of the check). + The check gives a fatal error (notice the red square to the left of the check name). ![String-match-date-regex-percent check results](https://dqops.com/docs/images/examples/daily-string-match-date-regex-percent-checks-results.png) diff --git a/docs/examples/data-validity/percentage-of-valid-currency-codes.md b/docs/examples/data-validity/percentage-of-valid-currency-codes.md index c6f3c1186b..05fd4cd758 100644 --- a/docs/examples/data-validity/percentage-of-valid-currency-codes.md +++ b/docs/examples/data-validity/percentage-of-valid-currency-codes.md @@ -99,7 +99,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 64, which is below the minimum threshold level set in the warning (75.0%). - The check gives an error result (notice the orange square on the left of the name of the check). + The check gives an error result (notice the orange square to the left of the check name). 
![String-valid-currency-code-percent check results](https://dqops.com/docs/images/examples/daily-string-valid-currency-code-percent-checks-results.png) diff --git a/docs/examples/data-validity/percentage-of-valid-latitude-and-longitude.md b/docs/examples/data-validity/percentage-of-valid-latitude-and-longitude.md index 98246c6ff4..1ce1492791 100644 --- a/docs/examples/data-validity/percentage-of-valid-latitude-and-longitude.md +++ b/docs/examples/data-validity/percentage-of-valid-latitude-and-longitude.md @@ -94,7 +94,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 99, which is above the minimum threshold level set in the warning (99.0%). - The check gives a valid result (notice the green square on the left of the name of the check). + The check gives a valid result (notice the green square to the left of the check name). ![Valid-latitude-and-longitude-percent check results](https://dqops.com/docs/images/examples/daily-valid-latitude-and-longitude-percent-checks-results.png) diff --git a/docs/examples/data-validity/percentage-of-valid-uuid.md b/docs/examples/data-validity/percentage-of-valid-uuid.md index cc0e3d58ca..e6de2436bf 100644 --- a/docs/examples/data-validity/percentage-of-valid-uuid.md +++ b/docs/examples/data-validity/percentage-of-valid-uuid.md @@ -94,7 +94,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 75, which is below the minimum threshold level set in the warning (100.0%). - The check gives a fatal error (notice the red square on the left of the name of the check). + The check gives a fatal error (notice the red square to the left of the check name). 
![String-valid-uuid-percent check results](https://dqops.com/docs/images/examples/daily-string-valid-uuid-percent-checks-result.png) diff --git a/docs/examples/data-validity/percentage-of-values-that-contains-usa-zipcode.md b/docs/examples/data-validity/percentage-of-values-that-contains-usa-zipcode.md index 0a1389c4b7..197a130841 100644 --- a/docs/examples/data-validity/percentage-of-values-that-contains-usa-zipcode.md +++ b/docs/examples/data-validity/percentage-of-values-that-contains-usa-zipcode.md @@ -88,7 +88,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 98, which is above the maximum threshold level set in the warning (10.0%). - The check gives a fatal error (notice the red square on the left of the name of the check). + The check gives a fatal error (notice the red square to the left of the check name). ![Contains-usa-zipcode-percent check results](https://dqops.com/docs/images/examples/daily-contains-usa-zipcode-percent-checks-results.png) diff --git a/docs/examples/stability/table-availability.md b/docs/examples/stability/table-availability.md index ad2fb4e1c4..4efc54a906 100644 --- a/docs/examples/stability/table-availability.md +++ b/docs/examples/stability/table-availability.md @@ -1,6 +1,6 @@ # Table availability -Verifies the availability of a table in the database using a simple row count. +The following example shows how to verify the availability of a table in the database using a simple row count. **PROBLEM** @@ -87,7 +87,7 @@ To execute the check prepared in the example using the [user interface](../../dq Review the results which should be similar to the one below. The actual value in this example is 1. - The check gives a warning result (notice the yellow square on the left of the name of the check). + The check gives a warning result (notice the yellow square to the left of the check name). 
![Table-availability check results](https://dqops.com/docs/images/examples/daily-table-availability-checks-results.png)