Cohort def optimizations #142

chrisknoll · 2016-11-08T06:16:45Z

This PR addresses two issues: #139 and #141.

The major enhancement is that cohort queries should run faster in an MPP environment, since we now partition on person_id when generating event_ids. SMP platforms with clustered indexes on person_id should also see an improvement.

The fix for #139 adds the event _id to the ORDER BY as a tiebreaker, which removes the non-deterministic ordering seen on MPP platforms when two events share the same date.

Added order by _date, _id to criteria queries to use the event ID as the tiebreaker. Fixes #139.
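To illustrate why the tiebreaker matters, here is a minimal Python sketch. It is not the SQL from this PR; the tuple layout, the sample rows, and the `first_event` helper are hypothetical. Ordering on date alone leaves the choice between same-day events up to whatever order the rows arrive in, while ordering on (date, id) always picks the same row.

```python
import random
from datetime import date

# Hypothetical events for one person: (person_id, event_id, start_date).
# Events 101 and 102 share a start date, so the date alone cannot rank them.
events = [
    (1, 102, date(2016, 1, 5)),
    (1, 101, date(2016, 1, 5)),
    (1, 103, date(2016, 2, 1)),
]


def first_event(rows):
    """Pick the earliest event, using event_id to break ties on start_date.

    Mirrors the idea of ORDER BY start_date, event_id.
    """
    return min(rows, key=lambda r: (r[2], r[1]))


# Simulate an engine returning rows in arbitrary order.
shuffled = events[:]
random.shuffle(shuffled)

# With the tiebreaker, the result does not depend on input order.
print(first_event(events) == first_event(shuffled))
```

Without the second sort key, `min` would return whichever of the two same-day events it encountered first, which is the kind of run-to-run variation reported in #139.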

Optimized #primary_events and #qualified_events into single query, eliminating one temp table creation. Partitioning on person_id for event_ids allowing MPP architectures to leverage hashing on person_id. Fixes #141.
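As a rough sketch of what partitioning on person_id means for the generated event_ids, the Python below emulates a window function of the form ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date, source_id). The exact SQL in this PR may differ; the `number_events` helper, the `source_id` name, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def number_events(rows):
    """Assign a per-person ordinal to each event.

    rows: iterable of (person_id, source_id, start_date) tuples (hypothetical layout).
    Returns (person_id, event_id, start_date, source_id) tuples, where event_id
    restarts at 1 for every person.
    """
    by_person = defaultdict(list)
    for person_id, source_id, start_date in rows:
        by_person[person_id].append((start_date, source_id))

    numbered = []
    for person_id in sorted(by_person):
        # Each person's events are ordered independently of every other person,
        # so the work can be split by person_id.
        ordered = sorted(by_person[person_id])
        for event_id, (start_date, source_id) in enumerate(ordered, start=1):
            numbered.append((person_id, event_id, start_date, source_id))
    return numbered


rows = [
    (2, 50, date(2016, 3, 1)),
    (1, 11, date(2016, 1, 5)),
    (1, 10, date(2016, 1, 5)),
    (2, 51, date(2016, 1, 1)),
]
for row in number_events(rows):
    print(row)
```

Under this scheme an event_id is only unique within a person, so any later join or count has to key on the pair (person_id, event_id) rather than on event_id alone.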

chrisknoll · 2016-11-08T14:00:15Z

Please hold off on approving this; I might have found an unexpected issue.

Switched from qualified_events to included_events for end date selection.

chrisknoll · 2016-11-08T16:18:27Z

Ok, this is ready for review now. Thank you.

ericaVoss · 2016-11-08T17:09:46Z

Nice job adding them everyone. :)

chrisknoll · 2016-11-17T22:21:55Z

@pbr6cornell, @anthonysena, @fdefalco: this has been open for 10 days without a review. Please review and either approve or notify me of concerns.

anthonysena

Chris, I do not see anything concerning with these changes and am inclined to approve. That said, it would be useful to do some testing on a few cohorts in our test environment to be sure that we're not missing anything here. Your thoughts?

chrisknoll · 2016-11-18T03:27:54Z

Yes, I ran internal tests using a version of Joel's cohort definition where the primary events were 'Any Visit', qualified with death events within 180 days after. You can see this definition here:
http://hix.jnj.com/atlas/#/cohortdefinition/3333

From a review perspective, I think it's just a matter of reviewing the files, making a local build, and generating a few cohorts of your own to check that everything looks square. I've done this myself, but I'm looking for a PR reviewer.

anthonysena · 2016-11-18T14:09:23Z

OK - I'll pull this down and test it locally. Please let me know if there are any other cohorts that are good candidates for testing and I'll review those as well.

chrisknoll · 2016-11-18T15:42:21Z

@anthonysena, thank you very much for taking this on. If you have any current cohort defs that you test with in hixbeta, you can copy them to a new definition (so as to keep the old results), then execute the copy under the new codebase and check that the numbers line up. That is what I did, but a second check would be good.

@generated

* add initial set of refactored class for summary service * Added SQL for summary service functionality. See issue OHDSI/issues/108 * imported SQL file from Achilles Export2JSON, add'l refactoring of CDMResultsService impl * added stub method for retrieval of drug and procedure reports to CDMResultsService * Demographic Criteria Implementation. * Removed BETWEEN usage in criteria A BETWEEN X and Y becomes A >= X and A <= Y. BETWEEN is not consistent between all db platforms. * Incidence Rate Analysis Implementation Implementation of Incidence Rate web services, tasklets and repositories and IR Analysis Report Generation. Added new dependency: apache commons collections v4.1. * Remove BETWEEN expression from numeric and date range criteria input types. * Add null check for correlated criteria end date. Existing cohorts will have this null, so a null check is required. * enhancements for person profile service and cohort comparison service * sql syntax error * Fix to allow more than 1 person per cohort. * Flyway deployment scripts for postgreSQL and Oracle. * adding outcome model retrieval and updates to comparative cohort analysis * Negative controls implementation modified: src/test/java/org/ohdsi/webapi/test/feasibility/StudyInfoTest.java * ConceptSet optimization and comparison utilities * ConceptSet utilties * Moving GenerationStatus enum * Delete concept sets per OHDSI/Atlas#87 * Fixing ir calc flyway script for postgreSql * sql cleanup * Flyway fixes for sql server * Fixed oracle migration scripts. * Added @generated to NegativeControlRecord. JPA will manage this entity's auto-number. * Fix IR Strata TAR. IR Strata was not calculating TAR in years. * resolve interface changes with r packages * logging * resolve missing placeholder * made several sql translation changes * moved mappers and model objects to report package in prep for refactoring. * added DTO to report package and stubbed out service and runner classes. 
* Use same method (getTreemap) for all (non-temporal) treemaps * Add generic row mapper, text hack impls * new drugService call: http://localhost:8080/WebAPI/OPTUM-PDW/drugexp/personId/33034143174. working as is, but planning to add some fields and allow drug_concept_id to be specified in addition to or instead of person_id * Meant to push this to fork and then make pull request. Reverting. Revert "new drugService call: http://localhost:8080/WebAPI/OPTUM-PDW/drugexp/personId/33034143174. working as is, but planning to add some fields and allow drug_concept_id to be specified in addition to or instead of person_id" This reverts commit 19fee59. * Reverting revert. Will push this to fork instead of ohdsi Revert "Meant to push this to fork and then make pull request. Reverting." This reverts commit 14e1f79. * made the same mistake twice. Revert "Reverting revert. Will push this to fork instead of ohdsi" This reverts commit 5b699ad. * Fix OHDSI#137 Updated criteria group query to left join to @eventTable to capture events that matched 0 of the criteria in the group. * Cohort def optimizations (OHDSI#142) * Make first occurrence of event deterministic. Added order by _date, _id to criteria queries to use the event ID as the tiebreaker. Fixes OHDSI#139. * Cohort Definition Query Optimization Optimized #primary_events and #qualified_events into single query, eliminating one temp table creation. Partitioning on person_id for event_ids allowing MPP architectures to leverage hashing on person_id. Fixes OHDSI#141. * Added proper partitioning on person and event for gain counts. * Switched from qualified_events to included_events for end date selection. * Fixes OHDSI#144. Removed ExceptionHandler from generate cohort step; it was masking the failed job completion status by returning 'COMPLETED' after failure. 
* Use same method (getTreemap) for all (non-temporal) treemaps * Add generic row mapper, text hack impls * Reformat cdmresults sql scripts * Use AS keyword in all cdm report query aliases * Remove text hack and camel case column names in cdm report queries * Replace all treemap endpoints with a single generic implementation * Rename all treemap CDM report files to treemap.sql * Enquote aliases in CDM report queries to preserve case * Add GenericRowMapper ctor with ObjectMapper for performance * CDMResultsAnalysisRunner.getTreemap returns ArrayNode * Fix condition CDM results queries (by concept) * include conceptId parameter * rename sqlAgeAtFirstDiagnosis to sqlAgeAtFirstOccurrence * modify sqlConditionsByType to check vocabulary_id, not domain_id * cast xCalendarYear to INT in sqlPrevalenceByGenderAgeYear * Fix queries associated with generic drilldown reports * Data types include conditionera, drug, drugera, measurement, observation, procedure, visit * Queries include AgeAtFirstOccurrence, PrevalenceByGenderAgeYear, ByType, ByMonth * Include parameter @conceptId * Cast xCalendarYear to INT (where applicable) * Remove obsoleted code from CDMResultsService (activates observation reports) * Standardize names of "by type" query files for cdm reports to sqlByType.sql Applies to condition, drug, measurement, observation, procedure * Add cdm_database_schema to CDMResultsAnalysisRunner standard columns (for measurement treemap) * Fix typo in sqlVisitDurationByType.sql * added fixes to row mapping in /person and dashboard * implemented /achillesheel, /datadensity started implementation of /death refactored CohortAttributeMapper --> CDMAttributeMapper created new report data objects drop unused/superfluous CDMResultsAnalysisRunner * Fixes for demographic criteria. (OHDSI#146) * Added correct fields to select to match UNION of other criteria queries. * Change IR Analysis strata queries to properly partition by person_id. 
* Updated sql to match SqlRender SqlRender does not support 'TOP' instruction, changed * HTTP verb normalization POST = create PUT = update Fixes OHDSI#147. * fixed /datadensity * IR Enhancments Calculate distribution of Time At Risk and Time to Outcome. Provide new export to zip function to export definition and raw data from analysis results. * Cohort Definition Correlated Criteria Extends cohort definition criteria to allow specification of a criteria group whose members must result in a true result in order for the criteria to be satisfied. * Allow censoring criteria in cohort definitions. * Gracefully handle empty criteria groups in cohort definitions. Fixes OHDSI#155 * cleaned up and changes /achillesheel --> /heel * Add cdm_database_schema to CDMResultsAnalysisRunner standard columns as per mark-velez/* #e862d53 * changed endpoint signature (/heel -->/achillesheel), fixed summary yob and yob in person report * fixed null totalRecords element in Data Density report * added summary to dashboard report in CDM results service * implemented death report

chrisknoll added 2 commits November 7, 2016 14:29

Make first occurrence of event deterministic.

385845e

Added order by _date, _id to criteria queries to use the event ID as the tiebreaker. Fixes #139.

Cohort Definition Query Optimization

c8984fb

Optimized #primary_events and #qualified_events into single query, eliminating one temp table creation. Partitioning on person_id for event_ids allowing MPP architectures to leverage hashing on person_id. Fixes #141.

Added proper partitioning on person and event for gain counts.

afced16

Switched from qualified_events to included_events for end date selection.

anthonysena reviewed Nov 17, 2016

anthonysena approved these changes Nov 19, 2016

chrisknoll merged commit 5c0916d into master Nov 19, 2016

chrisknoll deleted the cohort_def_optimizations branch November 20, 2016 02:38
