Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concept id 0 stratification report #72

Closed
fdefalco opened this issue Oct 28, 2015 · 7 comments
Closed

concept id 0 stratification report #72

fdefalco opened this issue Oct 28, 2015 · 7 comments

Comments

@fdefalco
Copy link
Contributor

provide a report to identify the list of source values that are mapped to concept id 0 with both total record count and unique patient count. the intent of this report will be to assist in determining mapping issues or areas to improve mapping.

@vojtechhuser
Copy link
Contributor

I have similar problem with 0s in IMEDs.

Would this report be part of Heel or some third component within Achilles?

@vojtechhuser
Copy link
Contributor

I created a new measure and new rule to address this.

The problem is integer restriction on the count_value. Percentage of unmapped rows can be for example 0.17%

see code below:


--insert into achilles_results(analysis_id,count_value)
with t1 as  (
      select analysis_id,sum(count_value) as all_cnt from achilles_results where analysis_id in (401,601,701,801,1801) group by analysis_id
    ),
t2 as (
            select analysis_id,count_value as concept_zero_cnt from achilles_results where analysis_id in (401,601,701,801,1801) and stratum_1 = 0
            )
--count of unmapped rows (analysis xxxx98)
select t2.analysis_id+97 as analysis_id,t2.Concept_zero_cnt as count_value  from t2
UNION
--percentage of unmapped rows (analysis xxx99)
select t1.analysis_id+98 as analysis_id,((1.0*concept_zero_cnt)/all_cnt)*100 as count_value from t1 left outer join t2 on t1.analysis_id = t2.analysis_id
;



--ruleid 27 error: significant portion of data is unmapped
--INSERT INTO @results_database_schema.ACHILLES_HEEL_results (
--INSERT INTO @results_database_schema.ACHILLES_HEEL_results (
--  analysis_id,    ACHILLES_HEEL_warning,  rule_id,    record_count    )
SELECT DISTINCT or1.analysis_id,
--  'ERROR: ' + cast(or1.analysis_id as VARCHAR) + '-' + oa1.analysis_name + '; count (n=' + cast(or1.count_value as VARCHAR) + ') should not be > 0' AS ACHILLES_HEEL_warning,
  'ERROR: more than 1 percent of unmapped rows (concept_0 rows)' as ACHILLES_HEEL_warning,
    27 as rule_id,
    or1.count_value
--FROM @results_database_schema.ACHILLES_results or1
FROM ACHILLES_results or1
--INNER JOIN @results_database_schema.ACHILLES_analysis oa1 ON or1.analysis_id = oa1.analysis_id
WHERE or1.analysis_id IN (499,699,799,899,1899) 
    AND or1.count_value > 1;



@chrisknoll
Copy link
Contributor

count_value stores counts of things, not %. If you are looking to create a report query where we want to show % of totals, then you use count_value of analysis ID of the unmapped rows as your numerator and then use the count_value of analysis_id of the rows in the table (that should be one of the analysis, counts of tables by domain), and that will give you a percentage.

@vojtechhuser
Copy link
Contributor

I thought that achilles heel is exactly that type of report that you describe in your last sentence. Is it not?

OK. I can just round the percentage to integers and that is not a problem.

NEW ISSUE:
I went after the source_values that are that appear in concept_0 rows. There are two options 1. just a rule or 2. new analysis and a rule. I would prefer option 2. Chriss - please see the code below and let me know what you are in favor of.

--rule_id 29  Count of unmapped distinct source_values is too high (warning)
--INSERT INTO @results_database_schema.ACHILLES_HEEL_results (analysis_id,  ACHILLES_HEEL_warning,  rule_id,    record_count    )
SELECT 425 as analysis_id,
  'WARNING: more than 100 distinct unmapped source_values' as ACHILLES_HEEL_warning,
    29 as rule_id,
    --add parametized schema prefix here
    (select count(distinct condition_source_value) from condition_occurrence where condition_concept_id = 0) as count_value
where 
    (select count(distinct condition_source_value) from condition_occurrence where condition_concept_id = 0) >100 
;   

This new analysis has the same "mirror image" (analogous measure) in Proc, Measurement, Observation, drug_exposure (e.g., 625,1825, etc) (a don't care what number we pick, (next up is x21 (so 421) so with 25 I skipped some possible IDs.

@vojtechhuser
Copy link
Contributor

this functionality is 99% implemented in the new huser_dev branch (with now a pull request to master)

@vojtechhuser
Copy link
Contributor

This is now in separate SQL file. A good approach is to probably merge this into the Heel SQL file.

@vojtechhuser
Copy link
Contributor

only total record count is implemented in analysis_id 1900 (the patient count would have to be new analysis)

@vojtechhuser vojtechhuser added this to the v1.3.0 milestone Jun 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants