Combine Atlas Cohorts to a new cohort #398
Comments
@chrisknoll For combining cohorts in Atlas, can we do something like this in the new Circe-be? Take a fully qualified cohort and make it an input for a new cohort, i.e. we add a new criteriaQueries entry that references another cohort definition or the cohort itself. criteriaQueries have the following fields: person_id, event_id, start_date, end_date, target_concept_id
Can we use another cohort definition or cohort as a criterion for the initial event cohort?
Hi, @gowthamrao. But with the new CIRCE repo and the refactoring, we have the pieces of a cohort expression that we could arrange into a new form to support the functionality you are proposing: something like a 'compound cohort expression' which contains 1..n cohort expressions and a data structure that describes how the cohorts should interact (merge, intersect, exclude, etc). The SQL generation in the 'compound' case would be to materialize each individual cohort into a temp table, and then do the final merge/intersect to produce a final cohort result. Just brainstorming here, I don't have any actual plans for that sort of implementation, but it would be an interesting community contribution...
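To make the 'compound cohort expression' idea concrete, here is a minimal sketch in Python with SQLite. It is not CIRCE code: the function `combine_cohorts`, the operator names in `SET_OPS`, and the cohort names are all hypothetical, and it combines cohorts at the subject level only (a real implementation would also have to reconcile cohort start and end dates).

```python
import sqlite3

# Hypothetical operator names mapped to standard SQL set operators.
SET_OPS = {"merge": "UNION", "intersect": "INTERSECT", "exclude": "EXCEPT"}


def combine_cohorts(conn, cohorts, first, steps):
    """Materialize each cohort as a temp table, then combine them left to right.

    cohorts: {table_name: [subject_id, ...]}  (stand-ins for generated cohorts)
    first:   name of the cohort the combination starts from
    steps:   [(operator, cohort_name), ...] applied in order
    """
    cur = conn.cursor()
    for name, subjects in cohorts.items():
        # Table names are interpolated into SQL, so only allow plain identifiers.
        if not name.isidentifier():
            raise ValueError(f"unsafe table name: {name!r}")
        cur.execute(f"CREATE TEMP TABLE {name} (subject_id INTEGER)")
        cur.executemany(f"INSERT INTO {name} VALUES (?)", [(s,) for s in subjects])

    if first not in cohorts:
        raise KeyError(first)
    sql = f"SELECT subject_id FROM {first}"
    for op, name in steps:
        if name not in cohorts:
            raise KeyError(name)
        # SQLite groups compound SELECTs from left to right.
        sql += f" {SET_OPS[op]} SELECT subject_id FROM {name}"
    sql += " ORDER BY subject_id"
    return [row[0] for row in cur.execute(sql)]


conn = sqlite3.connect(":memory:")
result = combine_cohorts(
    conn,
    {"diabetes": [1, 2, 3, 4], "metformin": [2, 3, 4, 5], "insulin": [4]},
    first="diabetes",
    steps=[("intersect", "metformin"), ("exclude", "insulin")],
)
print(result)  # subjects in diabetes AND metformin, but NOT insulin
```

The member cohorts stay independent; only the final statement knows how they interact, which mirrors the 'materialize, then merge/intersect' flow described above.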
Part of the cohort criteria is the correlated query where you can say 'at least N occurrences of...'. You can also say 'at least N distinct occurrences of...'. This is where the 'target_concept_id' comes into play. For conditions, it would use the condition_concept_id field...for drug_exposure, it will be drug_concept_id, etc. It's simply used to indicate how to count distinct correlated events.
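As a rough illustration of the difference between 'N occurrences' and 'N distinct occurrences', the sketch below counts correlated events either by row or by distinct target_concept_id. The function names and the concept IDs are made up for the example; this is not how CIRCE generates its SQL, only the counting logic in miniature.

```python
def count_events(events, distinct=False):
    """Count correlated events, optionally by distinct target_concept_id."""
    if distinct:
        return len({e["target_concept_id"] for e in events})
    return len(events)


def meets_criteria(events, at_least, distinct=False):
    """True when the event count satisfies 'at least N (distinct) occurrences'."""
    return count_events(events, distinct) >= at_least


# Three drug exposures for one person; the concept IDs are illustrative only.
drug_events = [
    {"event_id": 1, "target_concept_id": 101},
    {"event_id": 2, "target_concept_id": 101},
    {"event_id": 3, "target_concept_id": 202},
]

print(meets_criteria(drug_events, at_least=3))                 # 3 rows
print(meets_criteria(drug_events, at_least=3, distinct=True))  # only 2 distinct concepts
```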
I see what you are saying. If we reference a cohort_definition_id, since cohort_definition_id is specific to a local DBMS environment (it depends on the sequence function of the DBMS), the cohort-constructor expression is no longer 'encapsulated'. i.e. if we migrate from one environment to another by copying the cohort-constructor JSON specifications, the inclusion of a local cohort_definition_id in the cohort-constructor JSON creates an identity problem, as the new DBMS may assign a different cohort_definition_id to the same cohort being referenced, or may not have the referenced cohort at all. We handle similar situations in two other places in Atlas (that I can think of):
So, we have two options -- This makes the solution simple -- the UI in Atlas could be a new section (we need a new name for it other than cohorts - maybe new -> 'local cohorts' vs current -> 'global cohorts'?) -- and we could clone the reporting and explore portions of the cohorts section. The new 'local' cohort would also be available for population-level estimation, PLP, FeatureExtraction, incidence rate, etc. Thoughts?
That's not a risk, that's by design. If you are working with a cohort definition and producing evidence for patient care, you do not want someone to change one of the dependent concept sets externally from the cohort definition and alter your study design. When putting together the design for the cohort builder, we had to balance convenience with 'gotchas', and the added complexity of maintaining the cohorts independently of the concept sets that might have been imported was the cost of keeping the cohort definition consistent.
What you're picking up on was where we drew the line at encapsulation vs. convenience. We recognized that making a copy of the cohort expression inside the FE/IR/PLP analysis would add complexity that we didn't want to manage. So the code that generates the analysis SQL just takes a set of cohort definition IDs that should be used in the analysis, we don't care where they came from. This also de-couples the origin of the cohort records: they could have been generated by anything (CIRCE just being one way to generate cohorts). A note on the Atlas UX: So this is a lot of text, but just wanted to give you the background context, without pushing you one way or another. I think if you have an idea for enhancing the Atlas UI by introducing an alternative form of defining cohorts, if you have the time then go for it! I can tell you about the decisions that were made that might help avoid some of the pitfalls. I would also say that I'd like to avoid rewriting CIRCE when I think it's possible to just use CIRCE as building blocks to construct a more complex feature. If some additional refactoring to CIRCE is required to make some of those blocks more accessible, that's fine, but I'm talking about more invasive changes like the expression now has to be able to self-identify (expressions don't have any information about their cohort ID, they just know how to query the records from a CDM). That's why I think building a new construct for 'compound cohort expression' that leverages the parts of CIRCE that query the DB would work. It doesn't solve any of the issues of maintainability that you raise above, that would involve just rethinking the 'raw' vs. 'templated' approach for the UX.
@chrisknoll @anthonysena with the new hydra functionality, are the cohort definitions exported as JSON into R? Circling back on this topic: I think it would be cool functionality to create a new cohort based on other cohorts in the same data source. In addition to creating such a cohort, making that cohort available for other analytic functions of Atlas like incidence rate, PLP, PLE would be great too. e.g. for large-scale analytics, where we can build a library of 100s of condition cohorts, 100s of procedure cohorts, and 100s of payer contract cohorts, we would be able, using intersect/union/difference, to analyze 100 × 100 × 100 cohort combinations without needing to build and maintain 1,000,000 cohorts!
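The arithmetic behind that claim can be checked with a few lines of Python. The library names below are placeholders, and the sketch assumes exactly 100 cohorts per library: 300 maintained definitions yield one million three-way combinations.

```python
from itertools import product

# Placeholder libraries of 100 cohort names each (names are illustrative).
conditions = [f"condition_{i}" for i in range(100)]
procedures = [f"procedure_{i}" for i in range(100)]
payers = [f"payer_contract_{i}" for i in range(100)]

# Definitions that actually have to be built and maintained.
n_maintained = len(conditions) + len(procedures) + len(payers)

# Three-way combinations reachable by combining one cohort from each library.
n_combinations = sum(1 for _ in product(conditions, procedures, payers))

print(n_maintained, n_combinations)
```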
@gowthamrao - with the work done in #953, the cohort definitions that are part of the PLP/PLE specifications are exported as JSON and kept in the exported R Package.
It would be great to have a feature where cohorts may be combined to create a new cohort.
Offer the following functionality
Approach