MDS Data Redaction

DRAFT: A page about data redaction principles used in the Provider Reports and Metrics areas of MDS.

The MDS 1.1.0 release introduces two new features, Provider Reports and Metrics, that redact low aggregated counts of data.

For aggregated Reports and Metrics data, what “low count” values should be removed to protect rider privacy?

Data Redaction

It is common practice to remove small counts of individuals from aggregated datasets, e.g., in data published by census bureaus and health departments.

K-value and K-anonymity

The k-value is the threshold below which you do not share any data, e.g., 5, 7, or 10. “K” just means a variable you can set, like ‘x’ in algebra.

During the learning phase, lean towards lower risk and greater anonymization.
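As a minimal sketch of how a k-value might be applied, the snippet below redacts any aggregated count that falls below k. The function name `redact_count`, the zone names, the counts, and the use of `None` as the redaction marker are all assumptions for illustration, not part of MDS. It also assumes a count exactly equal to k is kept.

```python
from typing import Dict, Optional


def redact_count(count: int, k: int) -> Optional[int]:
    """Return the count if it is at least k, otherwise None (redacted).

    Assumption: counts strictly below k are withheld; a count equal to k is kept.
    """
    if count < k:
        return None
    return count


# Hypothetical aggregated trip counts per zone (illustrative values only).
trip_counts: Dict[str, int] = {"zone_a": 42, "zone_b": 7, "zone_c": 10}

# Apply the same k-value to every aggregated cell.
redacted = {zone: redact_count(n, k=10) for zone, n in trip_counts.items()}
```

With k set to 10, only `zone_b` is withheld in this example; lowering k to 5 would release it.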

Factors in Scenario Variability

Low k-values mean more information, but higher risk. High k-values mean less information, but lower risk. We have an idea of the risk, but it changes greatly based on the scenario.
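The trade-off can be illustrated with a small sketch that measures what share of aggregated cells would be withheld at several k-values. The helper `suppressed_share` and the hourly counts are made up for illustration and are not real MDS data.

```python
from typing import Dict, List


def suppressed_share(counts: List[int], k: int) -> float:
    """Fraction of aggregated cells whose count is below k (and so withheld)."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c < k) / len(counts)


# Hypothetical trip counts for ten hourly buckets (illustrative values only).
hourly_counts = [0, 3, 4, 6, 8, 9, 12, 15, 30, 55]

# A higher k withholds a larger share of the cells.
shares: Dict[int, float] = {k: suppressed_share(hourly_counts, k) for k in (5, 7, 10)}
```

In this made-up sample, raising k from 5 to 10 doubles the share of withheld cells, which is the information cost paid for the lower risk.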

Some factors that affect both risk exposure and requirements for more granular data:

Geography size (parking, no ride, equity zone, operating areas)
Population density (dense, sparse, residential, commercial)
Time frame (month, week, day, hour)
Data consumer/audience (internal, research, public)
Policy reason (enforcement, equity, operations)
Special groups data (all riders, low income)

MDS Risk Variability

Based on scenario variability and the dynamic nature of how Reports and Metrics work with MDS data, we recommend a lower-risk k-value of 10 during the learning (beta) period, until we get real-world feedback.

Providing Feedback

Discussion area

Questions

The k-value is currently set to 10.

Should it be a range instead? No basis for defining ranges has been established, and 10 is a low-risk value.

Should there be different values for different scenarios? It is not yet clear how to define a basis for every combination of scenarios.

Should we show 0 count values separately? Some other anonymity methods do this, but in our case 0 and <10 should not be distinguishable.
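The last point, that a zero count should look the same as any other low count, could be sketched as follows. The function `display_count` and the `"<k"` label format are hypothetical choices for illustration; the actual output format used by Reports and Metrics may differ.

```python
def display_count(count: int, k: int = 10) -> str:
    """Render a count for publication.

    Any count below k, including 0, is shown with the same "<k" label,
    so a reader cannot tell an empty cell from a small non-zero one.
    """
    if count < k:
        return f"<{k}"
    return str(count)


# Both an empty cell and a small cell produce the identical label.
zero_label = display_count(0)
small_label = display_count(7)
kept_label = display_count(10)
```

Here `zero_label` and `small_label` are both `"<10"`, while a count of 10 is shown as-is.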
