From 035a24a363b3f4dccbf20b924985829e5e9e54b8 Mon Sep 17 00:00:00 2001 From: Maxim Moinat Date: Fri, 10 May 2024 13:51:59 +0200 Subject: [PATCH 1/3] add startBeforeEnd documentation --- vignettes/checks/plausibleAfterBirth.Rmd | 2 +- vignettes/checks/plausibleStartBeforeEnd.Rmd | 37 ++++++++++++++------ 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/vignettes/checks/plausibleAfterBirth.Rmd b/vignettes/checks/plausibleAfterBirth.Rmd index eb399cb1..57ba0ced 100644 --- a/vignettes/checks/plausibleAfterBirth.Rmd +++ b/vignettes/checks/plausibleAfterBirth.Rmd @@ -20,7 +20,7 @@ output: The number and percent of records with a date value in the **cdmFieldName** field of the **cdmTableName** table that occurs prior to birth. ## Definition -This check verifies that events happen after birth. This check is only run on fields where the **PLAUSIBLE_AFTER_BIRTH** parameter is set to **Yes**. The birthdate is taken from the `person` table, either the `birth_datetime` or composed from `year_of_birth`, `month_of_birth`, `day_of_birth` (taking 1st month/1st day if missing). +This check verifies that events happen after birth. The birthdate is taken from the `person` table, either the `birth_datetime` or composed from `year_of_birth`, `month_of_birth`, `day_of_birth` (taking 1st month/1st day if missing). 
- *Numerator*: The number of records with a non-null date value that happen prior to birth - *Denominator*: The total number of records in the table with a non-null date value diff --git a/vignettes/checks/plausibleStartBeforeEnd.Rmd b/vignettes/checks/plausibleStartBeforeEnd.Rmd index 2d99c696..0607da43 100644 --- a/vignettes/checks/plausibleStartBeforeEnd.Rmd +++ b/vignettes/checks/plausibleStartBeforeEnd.Rmd @@ -14,33 +14,48 @@ output: **Context**: Verification\ **Category**: Plausibility\ **Subcategory**: Temporal\ -**Severity**: +**Severity**: CDM convention ⚠\ ## Description -The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName that occurs after the date in the @plausibleStartBeforeEndFieldName. +The number and percent of records with a value in the **cdmFieldName** field of the **cdmTableName** that occurs after the date in the **plausibleStartBeforeEndFieldName**. ## Definition +Most tables have a field for the start and a field for the end date for the event. This check verifies that the start date is not after the end date. The start date can be before the end date or equal to the end date. It is applied to the start date field and takes the end date field as a parameter. Both date and datetime fields are checked. -- *Numerator*: -- *Denominator*: -- *Related CDM Convention(s)*: -- *CDM Fields/Tables*: -- *Default Threshold Value*: +- *Numerator*: The number of records where date in **cdmFieldName** is after the date in **plausibleStartBeforeEndFieldName**. +- *Denominator*: The total number of records with a non-null start and non-null end date value +- *Related CDM Convention(s)*: -Not linked to a convention- +- *CDM Fields/Tables*: This check runs on all date and datetime fields that have a start and end date in the same table. It also runs on the `cdm_source` table, comparing `source_release_date` is before `cdm_release_date`. 
+- *Default Threshold Value*: 1% (except for vocabulary and cdm_source tables, where it is 0%) ## User Guidance - +If the start date is after the end date, it is likely that the data is incorrect or the dates are unreliable. ### Violated rows query ```sql - +SELECT + '@cdmTableName.@cdmFieldName' AS violating_field, + cdmTable.* +FROM @schema.@cdmTableName cdmTable +WHERE cdmTable.@cdmFieldName IS NOT NULL +AND cdmTable.@plausibleStartBeforeEndFieldName IS NOT NULL +AND cdmTable.@cdmFieldName > cdmTable.@plausibleStartBeforeEndFieldName ``` - ### ETL Developers +The main reason for this check to fail is often that the source data is incorrect. If the end date is derived from other data, the calculation might not take into account some edge cases. +Any violating records should either be removed or corrected. In most cases this can be done by adjusting the end date: +- With a few exceptions, the end date is not mandatory and can be left empty. +- If the end date is mandatory (visit_occurrence and drug_exposure), the end date can be set to the start date if the event. Note tha +- If this check fails for the observation_period, it might signify a bigger underlying issue. Please investigate all records for this person in the CDM and source. +- If neither the start nor the end date can be trusted, please remove the record from the CDM. -### Data Users +Make sure to clearly document the choices in your ETL specification. +### Data Users +A start date after the end date gives negative event durations, which might break analyses. +Especially take note if this check fails for the `observation_period` table. This means that there are persons with negative observation time. If these persons are included in a cohort, it will potentially skew e.g. survival analyses. 
From 5384e12e4cf48b55ba6d18a6f8eaf96ace93f545 Mon Sep 17 00:00:00 2001 From: Maxim Moinat Date: Fri, 10 May 2024 16:32:41 +0200 Subject: [PATCH 2/3] plausibleBeforeDeath docs --- vignettes/checkIndex.Rmd | 4 +- vignettes/checks/plausibleAfterBirth.Rmd | 1 + vignettes/checks/plausibleBeforeDeath.Rmd | 39 ++++++++++++++------ vignettes/checks/plausibleStartBeforeEnd.Rmd | 6 ++- 4 files changed, 34 insertions(+), 16 deletions(-) diff --git a/vignettes/checkIndex.Rmd b/vignettes/checkIndex.Rmd index e4fc563d..326f1812 100644 --- a/vignettes/checkIndex.Rmd +++ b/vignettes/checkIndex.Rmd @@ -47,7 +47,7 @@ above to navigate to the check's documentation page.\ - plausibleDuringLife (PAGE UNDER CONSTRUCTION) - withinVisitDates (PAGE UNDER CONSTRUCTION) - [plausibleAfterBirth](checks/plausibleAfterBirth.html) -- plausibleBeforeDeath (PAGE UNDER CONSTRUCTION) -- plausibleStartBeforeEnd (PAGE UNDER CONSTRUCTION) +- [plausibleBeforeDeath](checks/plausibleBeforeDeath.html) +- [plausibleStartBeforeEnd](checks/plausibleStartBeforeEnd.html) - plausibleGender (PAGE UNDER CONSTRUCTION) - plausibleUnitConceptIds (PAGE UNDER CONSTRUCTION) diff --git a/vignettes/checks/plausibleAfterBirth.Rmd b/vignettes/checks/plausibleAfterBirth.Rmd index 57ba0ced..6e0eeced 100644 --- a/vignettes/checks/plausibleAfterBirth.Rmd +++ b/vignettes/checks/plausibleAfterBirth.Rmd @@ -18,6 +18,7 @@ output: ## Description The number and percent of records with a date value in the **cdmFieldName** field of the **cdmTableName** table that occurs prior to birth. +Note that this check replaces the previous `plausibleTemporalAfter` check. ## Definition This check verifies that events happen after birth. The birthdate is taken from the `person` table, either the `birth_datetime` or composed from `year_of_birth`, `month_of_birth`, `day_of_birth` (taking 1st month/1st day if missing). 
diff --git a/vignettes/checks/plausibleBeforeDeath.Rmd b/vignettes/checks/plausibleBeforeDeath.Rmd index 91adf4c4..dfd20699 100644 --- a/vignettes/checks/plausibleBeforeDeath.Rmd +++ b/vignettes/checks/plausibleBeforeDeath.Rmd @@ -1,6 +1,6 @@ --- title: "plausibleBeforeDeath" -author: "" +author: "Maxim Moinat" date: "`r Sys.Date()`" output: html_document: @@ -14,33 +14,48 @@ output: **Context**: Verification\ **Category**: Plausibility\ **Subcategory**: Temporal\ -**Severity**: +**Severity**: Characterization ✔ ## Description -The number and percent of records with a date value in the @cdmFieldName field of the @cdmTableName table that occurs after death. +The number and percent of records with a date value in the **cdmFieldName** field of the **cdmTableName** table that occurs more than 60 days after death. +Note that this check replaces the previous `plausibleDuringLife` check. ## Definition +A record violates this check if the date is more than 60 days after the death date of the person, allowing administrative records directly after death. -- *Numerator*: -- *Denominator*: -- *Related CDM Convention(s)*: -- *CDM Fields/Tables*: -- *Default Threshold Value*: +- *Numerator*: The number of records where the date in **cdmFieldName** is more than 60 days after the person's death date. +- *Denominator*: Total number of records of persons with a death date, in the **cdmTableName**. +- *Related CDM Convention(s)*: -Not linked to a convention- +- *CDM Fields/Tables*: This check runs on all date and datetime fields. +- *Default Threshold Value*: 1% ## User Guidance - +Events are expected to occur between birth and death. The check `plausibleAfterBirth` covers the former; this check covers the latter. +The 60-day period is a conservative estimate of the time it takes for administrative records to be updated after a person's death. +By default, both start and end dates are checked. 
### Violated rows query ```sql - +SELECT + '@cdmTableName.@cdmFieldName' AS violating_field, + cdmTable.* +FROM @cdmDatabaseSchema.@cdmTableName cdmTable +JOIN @cdmDatabaseSchema.death de + ON cdmTable.person_id = de.person_id +WHERE cdmTable.@cdmFieldName IS NOT NULL + AND CAST(cdmTable.@cdmFieldName AS DATE) > DATEADD(day, 60, de.death_date) ``` - ### ETL Developers +Start dates after death are likely to be source data issues, and failing this check should trigger investigation of the source data quality. +End dates after death can occur due to derivation logic. For example, a drug exposure can be prescribed as being continued long after death. +In such cases, it is recommended to update the logic to end the prescription at death. ### Data Users - +For most studies, a low number of violating records will have limited impact on data use, as these could be caused by lagging administrative records. +However, it might signify a larger data quality issue. +Note that the reported percentage of violating records is calculated among records of deceased persons and as such might be slightly inflated compared to the overall population. diff --git a/vignettes/checks/plausibleStartBeforeEnd.Rmd b/vignettes/checks/plausibleStartBeforeEnd.Rmd index 0607da43..d65bcfe3 100644 --- a/vignettes/checks/plausibleStartBeforeEnd.Rmd +++ b/vignettes/checks/plausibleStartBeforeEnd.Rmd @@ -1,6 +1,6 @@ --- title: "plausibleStartBeforeEnd" -author: "" +author: "Maxim Moinat" date: "`r Sys.Date()`" output: html_document: @@ -19,10 +19,12 @@ output: ## Description The number and percent of records with a value in the **cdmFieldName** field of the **cdmTableName** that occurs after the date in the **plausibleStartBeforeEndFieldName**. +Note that this check replaces the previous `plausibleTemporalAfter` check. ## Definition -Most tables have a field for the start and a field for the end date for the event. This check verifies that the start date is not after the end date. 
The start date can be before the end date or equal to the end date. It is applied to the start date field and takes the end date field as a parameter. Both date and datetime fields are checked. +This check applies temporal rules within a table, specifically checking that no start date is after its end date. For example, in the VISIT_OCCURRENCE table it checks that VISIT_START_DATE is not after VISIT_END_DATE. +The start date can be before the end date or equal to the end date. It is applied to the start date field and takes the end date field as a parameter. Both date and datetime fields are checked. - *Numerator*: The number of records where date in **cdmFieldName** is after the date in **plausibleStartBeforeEndFieldName**. - *Denominator*: The total number of records with a non-null start and non-null end date value - *Related CDM Convention(s)*: -Not linked to a convention- From 8efebd445a4229ef096b599bfd424bfb6fb5f9fa Mon Sep 17 00:00:00 2001 From: Maxim Moinat Date: Mon, 13 May 2024 14:17:53 +0200 Subject: [PATCH 3/3] update plausibleStartBeforeEnd documentation --- vignettes/checks/plausibleStartBeforeEnd.Rmd | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/vignettes/checks/plausibleStartBeforeEnd.Rmd b/vignettes/checks/plausibleStartBeforeEnd.Rmd index d65bcfe3..4095a7ae 100644 --- a/vignettes/checks/plausibleStartBeforeEnd.Rmd +++ b/vignettes/checks/plausibleStartBeforeEnd.Rmd @@ -29,8 +29,10 @@ The start date can be before the end date or equal to the end date. It is applie - *Numerator*: The number of records where date in **cdmFieldName** is after the date in **plausibleStartBeforeEndFieldName**. - *Denominator*: The total number of records with a non-null start and non-null end date value - *Related CDM Convention(s)*: -Not linked to a convention- -- *CDM Fields/Tables*: This check runs on all date and datetime fields that have a start and end date in the same table. 
It also runs on the `cdm_source` table, comparing `source_release_date` is before `cdm_release_date`. -- *Default Threshold Value*: 1% (except for vocabulary and cdm_source tables, where it is 0%) +- *CDM Fields/Tables*: This check runs on all start date/datetime fields with an end date/datetime in the same table. It also runs on the cdm_source table, checking that `source_release_date` is before `cdm_release_date`. +- *Default Threshold Value*: + - 0% for the observation_period, vocabulary (valid_start/end_date) and cdm_source tables. + - 1% for other tables with an end date. ## User Guidance @@ -52,7 +54,7 @@ The main reason for this check to fail is often that the source data is incorr Any violating records should either be removed or corrected. In most cases this can be done by adjusting the end date: - With a few exceptions, the end date is not mandatory and can be left empty. -- If the end date is mandatory (visit_occurrence and drug_exposure), the end date can be set to the start date if the event. Note tha +- If the end date is mandatory (notably visit_occurrence and drug_exposure), the end date can be set to the start date of the event. Make sure to document this, as it leads to loss of duration information. - If this check fails for the observation_period, it might signify a bigger underlying issue. Please investigate all records for this person in the CDM and source. - If neither the start nor the end date can be trusted, please remove the record from the CDM.
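
The numerator and denominator that these patches document for plausibleStartBeforeEnd can be illustrated with a small sketch. It is not part of the patch series and not the package's implementation: it assumes an in-memory SQLite database, a toy `visit_occurrence` table with made-up rows, and a hypothetical helper `start_before_end` that mirrors the violated rows query with the `@` parameters filled in by hand.

```python
import sqlite3

# Toy data (assumption): column names follow the OMOP visit_occurrence table,
# the rows themselves are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    visit_start_date TEXT,
    visit_end_date TEXT
);
INSERT INTO visit_occurrence VALUES
    (1, '2020-01-01', '2020-01-05'),  -- start before end: passes
    (2, '2020-03-10', '2020-03-10'),  -- start equal to end: passes
    (3, '2020-06-02', '2020-06-01'),  -- start after end: violates
    (4, '2020-07-01', NULL);          -- no end date: not in the denominator
""")


def start_before_end(conn, table, start_field, end_field):
    """Hypothetical helper: return (violating rows, denominator)."""
    # Identifiers are interpolated directly, so only pass trusted names.
    not_null = f"{start_field} IS NOT NULL AND {end_field} IS NOT NULL"
    denominator = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {not_null}"
    ).fetchone()[0]
    # ISO-8601 date strings sort chronologically, so '>' compares dates here.
    violated = conn.execute(
        f"SELECT * FROM {table} WHERE {not_null} AND {start_field} > {end_field}"
    ).fetchall()
    return violated, denominator


violated, denominator = start_before_end(
    conn, "visit_occurrence", "visit_start_date", "visit_end_date"
)
pct_violated = 100.0 * len(violated) / denominator
print(f"{len(violated)} of {denominator} records violate ({pct_violated:.1f}%)")
```

In this toy table one of the three records with both dates present violates the check, which would exceed the documented 1% default threshold. The record without an end date is excluded from both numerator and denominator, matching the non-null conditions in the violated rows query.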