Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: derive_var_dthcaus does not work as expected when DEATH stored in AE and DS #2154

Closed
millerg23 opened this issue Oct 5, 2023 · 5 comments · Fixed by #2162
Closed

Bug: derive_var_dthcaus does not work as expected when DEATH stored in AE and DS #2154

millerg23 opened this issue Oct 5, 2023 · 5 comments · Fixed by #2162
Assignees
Labels
bug Something isn't working hotfix Issue to be released in the next patch release programming

Comments

@millerg23
Copy link
Collaborator

millerg23 commented Oct 5, 2023

What happened?

User reported they had DEATH recorded both in AE and DS with same date. admiral was selecting DS, but according to documentation the AE source was passed in first so should have been selected.
I reviewed the unit tests for this function, and it is not well tested for this, no tests had death recorded in both AE and DS and AE death was earliest, or dates matched.

Session Information

No response

Reproducible Example

Test code as follows:

adsl <- tibble::tribble(
  ~STUDYID, ~USUBJID,
  "TEST01", "PAT01",
  "TEST01", "PAT02",
  "TEST01", "PAT03"
)

ae <- tibble::tribble(
  ~STUDYID, ~USUBJID, ~AESEQ, ~AEDECOD, ~AEOUT, ~AEDTHDTC,
  "TEST01", "PAT01", 12, "SUDDEN DEATH", "FATAL", "2021-04-04"
) %>%
  mutate(
    AEDTHDT = ymd(AEDTHDTC)
  )

ds <- tibble::tribble(
  ~STUDYID, ~USUBJID, ~DSSEQ, ~DSDECOD, ~DSTERM, ~DSSTDTC,
  "TEST01", "PAT01", 4, "DEATH", "DEATH DUE TO progression of disease", "2021-04-05",
  "TEST01", "PAT02", 1, "INFORMED CONSENT OBTAINED", "INFORMED CONSENT OBTAINED", "2021-04-02",
  "TEST01", "PAT02", 2, "RANDOMIZATION", "RANDOMIZATION", "2021-04-11",
  "TEST01", "PAT02", 3, "COMPLETED", "PROTOCOL COMPLETED", "2021-12-01",
  "TEST01", "PAT03", 1, "DEATH", "DEATH DUE TO progression of disease", "2021-04-07",
  "TEST01", "PAT03", 2, "RANDOMIZATION", "RANDOMIZATION", "2021-04-11",
  "TEST01", "PAT03", 3, "COMPLETED", "PROTOCOL COMPLETED", "2021-12-01"
)

# Derive `DTHCAUS` only - for on-study deaths only
src_ae <- dthcaus_source(
   dataset_name = "ae",
   filter = AEOUT == "FATAL",
   date = convert_dtc_to_dt(AEDTHDTC),
   mode = "first",
   dthcaus = AEDECOD
 )

src_ds <- dthcaus_source(
   dataset_name = "ds",
   filter = DSDECOD == "DEATH" & grepl("DEATH DUE TO", DSTERM),
   date = convert_dtc_to_dt(DSSTDTC),
   mode = "first",
   dthcaus = DSTERM
 )
chk1 <- adsl %>% 
  derive_var_dthcaus(src_ae, src_ds, source_datasets = list(ae = ae, ds = ds))

For PAT01 the DS record is erroneously selected.

When reviewing function on line 187 -188 we have the following:

tmp_source_nr <- get_new_tmp_var(dataset)
tmp_date <- get_new_tmp_var(dataset)

This is inside the loop

for (ii in seq_along(sources)) {

When the lines are moved to just before the loop, the code works as expected:

tmp_source_nr <- get_new_tmp_var(dataset)
tmp_date <- get_new_tmp_var(dataset)
for (ii in seq_along(sources)) {

We don't want a new temp var created for each loop, as then only the last source object has anything populated for the last temporary vars. This is why if DEATH in both AE and DS, only the last source object passed in is used for a subject, regardless of dates.

@millerg23
Copy link
Collaborator Author

Also, I think if info about death coming from both AE and DS, we should look at AE, regardless of dates.

@rossfarrugia
Copy link
Collaborator

agree @millerg23 AE always has more granular info of the cause of death than DS

@millerg23
Copy link
Collaborator Author

I wonder if we could have more control over the options, I think always ordering within domain makes sense, but maybe we can have an option of selecting first based on domain (order of source list), and also have option to take earliest across domains, it should then cater for most scenarios?
With current implementation we order:

filter_extreme(
      order = exprs(!!tmp_date, !!tmp_source_nr),
      by_vars = subject_keys,
      mode = "first"
    )

The new option wouls be to swtich these to order by source list then dates:

filter_extreme(
      order = exprs( !!tmp_source_nr, !!tmp_date),
      by_vars = subject_keys,
      mode = "first"
    )

@bms63 bms63 removed their assignment Oct 6, 2023
@bms63
Copy link
Collaborator

bms63 commented Oct 6, 2023

Anyone want to go bug hunting? @pharmaverse/admiral

@bundfussr
Copy link
Collaborator

I wonder if we could have more control over the options, I think always ordering within domain makes sense, but maybe we can have an option of selecting first based on domain (order of source list), and also have option to take earliest across domains, it should then cater for most scenarios?

I agree that we should implement this option but I would not implement it in derive_var_dthcaus() but in derive_vars_extreme_event() (#2138). It should be implemented such that the behavior is consistent with derive_extreme_event() (#2140).

@bundfussr bundfussr added the hotfix Issue to be released in the next patch release label Oct 10, 2023
@bundfussr bundfussr self-assigned this Oct 10, 2023
@bundfussr bundfussr moved this from Priority to In Progress in admiral (sdtm/adam, dev, ci, template, core) Oct 10, 2023
@bundfussr bundfussr linked a pull request Oct 10, 2023 that will close this issue
14 tasks
bundfussr added a commit that referenced this issue Oct 11, 2023
* #2154 fix_dthcaus: fix derive_var_dthcaus()

* #2154 fix_dthcaus: run templates check

* #2154 fix_dthcaus: update version number and NEWS
@bms63 bms63 closed this as completed in 9bdd50c Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hotfix Issue to be released in the next patch release programming
Development

Successfully merging a pull request may close this issue.

4 participants