Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diagnose matrices not aligned with data #77

Closed
rmnldwg opened this issue Mar 5, 2024 · 0 comments
Closed

diagnose matrices not aligned with data #77

rmnldwg opened this issue Mar 5, 2024 · 0 comments
Assignees
Labels
1.0 release Issues to fix before 1.0 release code quality Improvements w.r.t. readability of code & best practices of coding feature New feature or request
Milestone

Comments

@rmnldwg
Copy link
Owner

rmnldwg commented Mar 5, 2024

Because the diagnose matrices are computed separately for each T-stage, they are not aligned with the patient data stored in the model anymore.

A solution could be to store the diagnose matrices in the patient data DataFrame and filter that by T-stage when model.diagnose_matrices[t_stage] is called.

This has benefits for both the Bayesian network implementation and the mixture model. And if I didn't overlook anything, this should be possible without breaking changes.

@rmnldwg rmnldwg added feature New feature or request code quality Improvements w.r.t. readability of code & best practices of coding 1.0 release Issues to fix before 1.0 release labels Mar 5, 2024
@rmnldwg rmnldwg added this to the version 1.0.0 milestone Mar 5, 2024
@rmnldwg rmnldwg self-assigned this Mar 5, 2024
rmnldwg added a commit that referenced this issue Mar 5, 2024
The last change caused a dramatic slowdown (factor 500) of the data and
diagnose matrix access, because it needed to index them from a
`DataFrame`. Now, I implemented a basic caching scheme with a patient
data cache version that brought back the original speed.

Related: #77
rmnldwg added a commit that referenced this issue Mar 5, 2024
The data and diagnose matrix is now computed on demand in the getter for
the `patient_data` property. This makes sure that the user always sees
an up to date version of the data encoding & the diagnose probabilities.

Also, apparently `del dataframe[column]` is much slower than
`dataframe.drop(columns)`. I replaced the former with the latter and now
the tests are fast again.

Related: #77
rmnldwg added a commit that referenced this issue Mar 5, 2024
Since we now have access to the full diagnose matrix by default, there
is no need for the Bayesian network T-stage fix anymore.

Related: #77
@rmnldwg rmnldwg closed this as completed in dcd745f Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.0 release Issues to fix before 1.0 release code quality Improvements w.r.t. readability of code & best practices of coding feature New feature or request
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

1 participant