Distributions of ICD-8 and ICD-10 codes in lpr_diag and d_inddto in lpr_adm #180

Open
Aastedet opened this issue Dec 20, 2024 · 0 comments

Collaborator

Aastedet commented Dec 20, 2024

As noted in #124, when create_fake_icd() is called inside map() it does not seem to resample the date, so every call produces codes from the same ICD version: either all ICD-10 or all ICD-8. Currently it only generates ICD-10 codes, which is fine for development.

It's hard to see how we can generate the two tables (lpr_adm and lpr_diag) independently while keeping the diagnosis codes in one coherent with the dates in the other. create_fake_icd() can already take the date of a record and pick the ICD version that matches the real data for that date. However, the date variable d_inddto is generated in lpr_adm and the diagnosis code c_diag is generated in lpr_diag, so the date isn't available inside lpr_diag to guide diagnosis code generation.
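To make the dependency concrete, here is a minimal Python sketch of a date-aware code generator. It is not the package's create_fake_icd(). The name `fake_icd`, the exact cutoff day, and both code formats are assumptions made only for illustration.

```python
import random
from datetime import date

# Assumed cutoff: the issue only says ICD-8 was phased out by 1994;
# the exact day used here is an assumption of this sketch.
ICD10_START = date(1994, 1, 1)


def fake_icd(record_date, rng):
    """Return a placeholder code whose ICD version matches record_date."""
    if record_date < ICD10_START:
        # Placeholder ICD-8-like code: five digits (illustrative format only).
        return f"{rng.randrange(100000):05d}"
    # Placeholder ICD-10-like code: a letter plus two digits (illustrative only).
    return f"{rng.choice('ABCDEFGHIJ')}{rng.randrange(100):02d}"


rng = random.Random(1)
old_code = fake_icd(date(1985, 6, 1), rng)  # falls in the assumed ICD-8 period
new_code = fake_icd(date(2005, 6, 1), rng)  # falls in the assumed ICD-10 period
```

The generator cannot choose a version without `record_date`, which is exactly the value that lives in lpr_adm rather than lpr_diag.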

I think the easiest solution is to draw a random sample of dates in lpr_adm and a random sample of ICD-8 and ICD-10 codes in lpr_diag, and accept that some ICD-8 codes in lpr_diag will be joined to dates after 1994 in lpr_adm, by which point ICD-8 had been phased out in the real data.
Alternatively, we would have to join the two tables after generation, resample either the dates or the diagnosis codes, and then split the merged table again before saving the two components in register_data.
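The join-and-resample alternative could look roughly like the Python sketch below. Everything in it is hypothetical: the `recnum` join key, the date range, the 1994-01-01 cutoff, the placeholder code formats, and the helper names are assumptions for illustration, not the package's implementation.

```python
import random
from datetime import date, timedelta

ICD10_START = date(1994, 1, 1)  # assumed cutoff day, see lead-in


def fake_icd(record_date, rng):
    """Placeholder code generator; both code formats are illustrative only."""
    if record_date < ICD10_START:
        return f"{rng.randrange(100000):05d}"
    return f"{rng.choice('ABCDEFGHIJ')}{rng.randrange(100):02d}"


def random_date(rng, start=date(1977, 1, 1), end=date(2020, 12, 31)):
    """Draw a uniformly random date in [start, end]; the range is arbitrary."""
    return start + timedelta(days=rng.randrange((end - start).days + 1))


rng = random.Random(42)
n = 1000

# Generate the two tables independently, linked only by an assumed record id.
lpr_adm = [{"recnum": i, "d_inddto": random_date(rng)} for i in range(n)]
lpr_diag = [{"recnum": i, "c_diag": None} for i in range(n)]

# Join: look up each diagnosis row's admission date by record id.
dates = {row["recnum"]: row["d_inddto"] for row in lpr_adm}

# Resample: draw each code with the ICD version that fits the joined date.
for row in lpr_diag:
    row["c_diag"] = fake_icd(dates[row["recnum"]], rng)

# Split: the date is only looked up, never stored in lpr_diag,
# so the two tables are still separate and can be saved as they are.
```

In this sketch only the diagnosis codes are resampled, so the date distribution in lpr_adm stays untouched. Resampling the dates instead would preserve the code distribution and shift the dates.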

@Aastedet Aastedet converted this from a draft issue Dec 20, 2024