-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/assessment normalizations #65
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an overall comment, these are minimal enough changes that I think it's fine to add to edu_wh
despite the fact that we will continue to add more assessment reporting/normalization features down the line. I think there are no really strong reasons to not add normalized score columns to the fact tables, even if we end up doing more normalization downstream in the future, since score normalization is typically the first ask
dedupe_results.score_result, | ||
coalesce(xwalk_score_value_thresholds.normalized_score_result::varchar, | ||
xwalk_score_values.normalized_score_result::varchar, | ||
score_result::varchar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked here whether or not the original score result should be defaulted to if there is no normalization happening for the score and leaned toward yes for the case when normalization is not necessary. What this could mean is that a score that should be normalized but isn't yet included in either normalization xwalk will make it's way into this column in an ugly format that might not match what is necessary for reporting, but in order for a column to be included here, it must be added to the xwalk_assessment_scores
column in the first place, so there is at least a manual step that needs to happen anyway. Someone might not know that this normalized column exists and the values in the normalized column should be an integer if it's a performance level (as a random example), but I think we can communicate this out and it avoids having to map values to themselves.
-- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error | ||
and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound | ||
and try_to_numeric(dedupe_results.score_result) <= xwalk_score_value_thresholds.upper_bound | ||
-- todo in future, may need to include subject & grade level in this join (with options to join across subjects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will definitely run into this at some point but can start without it - especially considering there will be additional assessment normalization features anyway
and xwalk_scores.normalized_score_name = xwalk_score_value_thresholds.normalized_score_name | ||
-- todo check these comparators -- what if there's a value between the upper and next lower? eg value is 20.4 and the cutoffs are 20 and 21 | ||
-- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error | ||
and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will default to int since no scale argument is given, I think that's fine but maybe we consider allowing for decimals (so try_to_decimal)? I assume you could still write out the values in the xwalk as integers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point, maybe we should be explicit about the data type of this column -- i'm still unsure about this Q I put in the PR "Is it right to overload a general "normalized_" column with various use cases, when some may need to be integers vs. characters, etc.?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good q. I think in a lot of cases though the point of a column like this is to normalize values to a similar set of values across all assessments in the table. I don't necessarily think that's always true but my guess is this column would be used for a single particular downstream purpose - like a BI user might use a normalized column where all PLs are integers when creating charts for proper ordering. But again, maybe there is another use case I'm not considering where this could have serious negative effects
{% set normalized_names_thresholds = dbt_utils.get_column_values(ref('xwalk_assessment_score_value_thresholds'), 'normalized_score_name') or [] %} | ||
{{ dbt_utils.pivot( | ||
'normalized_score_name', | ||
(normalized_names_values + normalized_names_thresholds) | unique, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idea here is that we only want normalized versions of scores that are included in either xwalk (because scores like scale_score
and sem
will rarely be normalized in this way, so would be overkill in my opinion)
Overview
Assessment score results in the ODS are often not in the format needed to present in downstream dashboards, or in analytical queries. This branch adds the ability to create "normalized" columns in
fct_student_assessment
,fct_student_objective_assessment
, where each implementation can customize how they map these values. For example, you may want to map a "Percentile" result on a 1-100 scale to be a "performance_level" on a 1-5 scale. Or, you may want to add ordered integer values e.g. "Low -> 1; Medium -> 2; High -> 3".Description of Changes
bld_ef3__student_assessment_long_results
andbld_ef3__student_objective assessment_long_results
, addnormalized_score_result
to model, to allow for normalization of resultsfct_student_assessment
andfct_student_objective_assessment
, create new columnsnormalize_{score_name}
wherever they have been configured. e.g. if you add rows toxwalk_assessment_score_values.csv
forperformance_level
, a new columnnormalized_performance_level
will be created with normalized score resultsDependent on:
These xwalks added to implementation repo:
xwallk_assessment_score_values
xwalk_assessment_score_value_thresholds
xwalk_objective_assessment_score_values
xwalk_objective_assessment_score_value_thresholds
Example PR: https://github.com/edanalytics/stadium_txdemo/pull/9
Questions:
edu_wh
models the correct place for this kind of normalization?TODOs: