Feature/assessment normalizations #65

Draft
wants to merge 5 commits into
base: main
from

Conversation

rlittle08
Collaborator

@rlittle08 rlittle08 commented Jul 12, 2023

Overview

Assessment score results in the ODS are often not in the format needed for downstream dashboards or analytical queries. This branch adds the ability to create "normalized" columns in fct_student_assessment and fct_student_objective_assessment, where each implementation can customize how these values are mapped. For example, you may want to map a "Percentile" result on a 1-100 scale to a "performance_level" on a 1-5 scale. Or you may want to add ordered integer values, e.g. "Low -> 1; Medium -> 2; High -> 3".
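To make the two kinds of mapping concrete, here is a minimal Python sketch of the idea (not the dbt SQL in this branch). The crosswalk contents, the cut points, and the function names `map_by_value` and `map_by_threshold` are all hypothetical and chosen only for illustration.

```python
# Hypothetical crosswalk rows; the real xwalk column layout may differ.
VALUE_XWALK = {"Low": 1, "Medium": 2, "High": 3}

# (lower_bound, upper_bound, normalized_score_result), bounds inclusive.
# Made-up cut points mapping a 1-100 percentile to a 1-5 performance level.
THRESHOLD_XWALK = [
    (1, 20, 1),
    (21, 40, 2),
    (41, 60, 3),
    (61, 80, 4),
    (81, 100, 5),
]


def map_by_value(score_result):
    """Exact-match lookup, e.g. 'Medium' -> 2. Returns None when unmapped."""
    return VALUE_XWALK.get(score_result)


def map_by_threshold(score_result):
    """Range lookup for numeric results. Returns None for non-numeric input
    or when no band contains the value."""
    try:
        value = float(score_result)
    except (TypeError, ValueError):
        return None
    for lower, upper, normalized in THRESHOLD_XWALK:
        if lower <= value <= upper:
            return normalized
    return None
```

With these made-up rows, `map_by_value("Medium")` gives 2 and `map_by_threshold("73")` gives 4, while a non-numeric result such as "Low" falls through the threshold lookup with None.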

Description of Changes

  • In bld_ef3__student_assessment_long_results and bld_ef3__student_objective_assessment_long_results, add normalized_score_result to the model, to allow for normalization of results
  • In fct_student_assessment and fct_student_objective_assessment, create new columns normalized_{score_name} wherever they have been configured, e.g. if you add rows to xwalk_assessment_score_values.csv for performance_level, a new column normalized_performance_level will be created with the normalized score results

Dependent on:

These xwalks added to implementation repo:

  • xwalk_assessment_score_values
  • xwalk_assessment_score_value_thresholds
  • xwalk_objective_assessment_score_values
  • xwalk_objective_assessment_score_value_thresholds

Example PR: https://github.com/edanalytics/stadium_txdemo/pull/9

Questions:

  • Is "normalized" the right wording for this kind of customization? I don't want to confuse "display values" with "re-scaled values", but there is overlap there
  • Is it right to overload a general "normalized_" column with various use cases, when some may need to be integers vs. characters, etc.?
  • Generally, are edu_wh models the correct place for this kind of normalization?
  • When should re-mapping of values live in student warehouse tables, vs. in dimensions or xwalks?

TODOs:

  • Work on a larger "assessments engine" feature that separates normalized info from assessment-specific info & efficiently serves those data to downstream purposes
  • In the future, we may need to allow these xwalks to join separately on subject, grade level, etc.

@rlittle08 rlittle08 requested a review from jalvord1 July 12, 2023 21:22
Contributor

@jalvord1 jalvord1 left a comment

As an overall comment, these are minimal enough changes that I think it's fine to add them to edu_wh, despite the fact that we will continue to add more assessment reporting/normalization features down the line. I don't see any strong reasons not to add normalized score columns to the fact tables, even if we end up doing more normalization downstream in the future, since score normalization is typically the first ask.

dedupe_results.score_result,
coalesce(xwalk_score_value_thresholds.normalized_score_result::varchar,
xwalk_score_values.normalized_score_result::varchar,
score_result::varchar
Contributor

We talked here about whether the original score result should be the default when no normalization is happening for the score, and leaned toward yes for the case when normalization is not necessary. What this could mean is that a score that should be normalized, but isn't yet included in either normalization xwalk, will make its way into this column in an ugly format that might not match what is necessary for reporting. But in order for a column to be included here, it must be added to xwalk_assessment_scores in the first place, so there is at least a manual step that needs to happen anyway. Someone might not know that this normalized column exists, or that the values in it should be integers if it's a performance level (as a random example), but I think we can communicate this out, and it avoids having to map values to themselves.
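The fallback order described here can be sketched in Python as below. This is only an illustration of the coalesce in the diff, not the model code; the function names and the example values are hypothetical.

```python
def coalesce(*candidates):
    """Return the first candidate that is not None, like SQL COALESCE."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


def as_text(value):
    """Cast to text, mirroring the ::varchar casts in the diff."""
    return None if value is None else str(value)


def normalized_score_result(score_result, threshold_match=None, value_match=None):
    """Prefer the threshold xwalk, then the value xwalk, then the raw result."""
    return coalesce(
        as_text(threshold_match),
        as_text(value_match),
        as_text(score_result),
    )
```

An unmapped result such as "Proficient" passes through unchanged, which is the "ugly format" risk mentioned above: the column can end up mixing mapped values like "3" with raw strings.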

-- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error
and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound
and try_to_numeric(dedupe_results.score_result) <= xwalk_score_value_thresholds.upper_bound
-- todo in future, may need to include subject & grade level in this join (with options to join across subjects)
Contributor

we will definitely run into this at some point but can start without it - especially considering there will be additional assessment normalization features anyway

and xwalk_scores.normalized_score_name = xwalk_score_value_thresholds.normalized_score_name
-- todo check these comparators -- what if there's a value between the upper and next lower? eg value is 20.4 and the cutoffs are 20 and 21
-- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error
and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound
Contributor

this will default to int since no scale argument is given, I think that's fine but maybe we consider allowing for decimals (so try_to_decimal)? I assume you could still write out the values in the xwalk as integers
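A small Python sketch of why the scale matters for the 20.4 case raised in the todo. It assumes Snowflake semantics as I understand them: with no scale argument the parsed value is rounded to zero decimal places, and try_to_decimal is a synonym of try_to_numeric, so an explicit scale argument would be needed rather than a different function name. The bands and the helper names `try_to_number` and `band_for` are hypothetical.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Hypothetical inclusive bands with integer cutoffs.
BANDS = [(10, 20, "1"), (21, 30, "2")]


def try_to_number(text, scale=0):
    """Parse text as a number rounded to `scale` decimal places.
    Returns None instead of raising when the text is not numeric."""
    try:
        value = Decimal(text)
    except (InvalidOperation, TypeError, ValueError):
        return None
    if not value.is_finite():
        return None
    # scale=0 -> quantize to 1, scale=1 -> quantize to 0.1, and so on.
    return value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


def band_for(value):
    """Return the band whose inclusive bounds contain value, else None."""
    if value is None:
        return None
    for lower, upper, level in BANDS:
        if lower <= value <= upper:
            return level
    return None
```

With scale 0, "20.4" is rounded to 20 and lands in the first band. With scale 1 it stays 20.4 and matches neither band, because the inclusive bounds leave a gap between 20 and 21. If decimals are allowed, the bounds in the xwalk would need to cover that gap, for example with half-open bands.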

Collaborator Author

that's a good point, maybe we should be explicit about the data type of this column -- i'm still unsure about this Q I put in the PR "Is it right to overload a general "normalized_" column with various use cases, when some may need to be integers vs. characters, etc.?"

Contributor

Yeah that's a good q. I think in a lot of cases though the point of a column like this is to normalize values to a similar set of values across all assessments in the table. I don't necessarily think that's always true but my guess is this column would be used for a single particular downstream purpose - like a BI user might use a normalized column where all PLs are integers when creating charts for proper ordering. But again, maybe there is another use case I'm not considering where this could have serious negative effects

{% set normalized_names_thresholds = dbt_utils.get_column_values(ref('xwalk_assessment_score_value_thresholds'), 'normalized_score_name') or [] %}
{{ dbt_utils.pivot(
'normalized_score_name',
(normalized_names_values + normalized_names_thresholds) | unique,
Contributor

idea here is that we only want normalized versions of scores that are included in either xwalk (because scores like scale_score and sem will rarely be normalized in this way, so would be overkill in my opinion)
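As a rough illustration of the intended outcome, the Python sketch below combines the names from both xwalks, de-duplicates them, and builds one normalized_{name} column per configured name. It is not how dbt_utils.pivot generates SQL; the row layout and the helper names are hypothetical.

```python
def unique_in_order(names):
    """De-duplicate while keeping first-seen order."""
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


def pivot_normalized(rows, names):
    """rows: (result_key, normalized_score_name, normalized_score_result).
    Returns {result_key: {"normalized_<name>": value or None}}."""
    wide = {}
    for key, name, result in rows:
        if name not in names:
            continue  # scores absent from both xwalks get no normalized column
        columns = wide.setdefault(key, {f"normalized_{n}": None for n in names})
        columns[f"normalized_{name}"] = result
    return wide


# Hypothetical names pulled from the two xwalks.
names_values = ["performance_level"]
names_thresholds = ["performance_level", "percentile_band"]
names = unique_in_order(names_values + names_thresholds)
```

A score such as scale_score that appears in neither xwalk never reaches `names`, so no normalized_scale_score column is created.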

@rlittle08 rlittle08 requested a review from ejoranlienea July 13, 2023 14:20
@rlittle08 rlittle08 marked this pull request as draft July 24, 2023 15:56
2 participants