Correlatability of usage patterns #12
I'd like to step out a few layers and relate this to "identity" because, in effect, we are talking about how verifiable claims will be used, intentionally or otherwise, to correlate individuals' identities from one context to another. It turns out that "identity" as used in technical discussions means two things, sometimes conflated under the heading of PII, or "Personally Identifiable Information". Verifiable claims will be used for both, and each has different nuances in terms of correlation.
First, there are identifiers and attributes used explicitly for correlating individuals across contexts. Sometimes these identifiers are opaque, e.g., "anonymous" GUIDs used for tracking in third-party cookies. Sometimes they are grounded in real-world identity systems such as a legal name, driver's license number, etc.
Second, there are attributes used for customizing services, such as a zip code for movie listings or a name in a greeting on a page, separate from whether that name or zip code is used to correlate the individual with anything beyond the presentation of features.
Privacy issues occur when
The two biggest gotchas, in my experience, are
Sometimes the notion of "identity" and PII is taken to include any and all attributes related to an individual, regardless of how those attributes are used. This acknowledges the reality that even seemingly innocuous data can be used for deanonymizing and correlating individuals, but it turns "identity" into everything, which is almost useless as input to engineering a good system. Rather than discussing this issue in terms of the title language "when and how verifiable claims are used is important for correlation purposes", I would suggest that verifiable claims will be used to both
Implementers should understand, and communicate to their users, how their particular identity systems correlate individuals, e.g., from session to session and with third-party services, and how they actively prevent or minimize undesired and unexpected correlation. The shift I'm going for here is that, contrary to a general push in the crypto world, correlation isn't inherently bad. In fact, correlation is the direct result of identification, and "identity" is useful. While anti-correlation features of various technologies are great, that focus has obscured the fact that there are times when we want and need to be correctly correlated with our rights and privileges. In a privacy-respecting system, individuals would have maximum control over correlation, enabling intentional correlation where desired while preventing undesired correlations. The appropriate limits of that control are still up for debate, but every system using verifiable claims will necessitate design choices that impact how individuals are correlated across contexts.
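To make "control over correlation" concrete, here is a minimal sketch, assuming a holder-held master secret and hypothetical service names, of pairwise pseudonymous identifiers: the holder derives a distinct opaque identifier per relying party, so two services cannot correlate the holder by comparing identifiers, while the holder can still intentionally re-present the same identifier to one service across sessions. This is one illustration of the design space, not anything prescribed by the data model.

```python
# Sketch only: pairwise pseudonymous identifiers derived from a holder-held
# secret. The service names and secret below are hypothetical.
import hashlib
import hmac

def pairwise_id(master_secret: bytes, relying_party: str) -> str:
    """Derive a stable, opaque identifier scoped to one relying party."""
    return hmac.new(master_secret, relying_party.encode(), hashlib.sha256).hexdigest()

secret = b"holder-held master secret"            # never leaves the holder's wallet
print(pairwise_id(secret, "pharmacy.example"))   # same value every session here
print(pairwise_id(secret, "insurer.example"))    # unlinkable to the value above
```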
Privacy engineering can help sort through the different aspects of privacy in a use case. I’d like to relate this issue to the specific prescription use case that I maintain.
Thanks for the example, @agropper. It's a solid illustration of where Verifiable Claims can help with privacy. To help us move towards a more rigorous lexicon, I'd like to call this a "use domain" instead of a "use case." I'm hoping to establish specific semantics for use cases:
I'm still working through the best alternative language, but "use domain" or "domain of use" seems like a good way to describe this example, which includes several transactions as well as domain-specific non-functional requirements, such as both the correlatability and non-correlatability you outline. From what I read, I tease out a few different transactions:
There may also be transactions related to the credential that enables a doctor to prescribe, as well as transactions recording pharmacy interactions: requesting fulfillment of a prescription, fulfilling a prescription, etc., so we can understand the needs of the audit. As with many of these kinds of uses, the trick is defining a coherent boundary so we can focus on the new and interesting bits. For example, one could discuss how all of the entities in the domain provision their credentials: the monitoring agencies, the pharmacies, the pharmacists, the insurance companies (surprisingly missing from your example). Clearly, taking some of these entities (and their credentialing) as a given greatly simplifies the documentation. To try to tease out the correlatability:
Intended correlations:
Blocked correlations:
Do these transactions and correlations seem correct? Questions:
@agropper @jandrieu Should we move @agropper's use case description to the use cases repository? This repository is about the data model, and specifically the privacy/correlatability section. We may want to split this discussion into two aspects: 1) the use case itself (put it in the use cases repo issue tracker), and 2) how this use case impacts the correlatability subsection of the privacy considerations section.
Sounds good. I'll move the use domain over there for its own refinement. One thing that became clear to me in working through Adrian's use domain is the need for a strawman architecture for these kinds of use domains so that we can evaluate the privacy impact. For example, the main page for Tahoe-LAFS has a simple diagram distinguishing which parts of the architecture must be trusted implicitly and which rely instead on cryptographic trust. I'm reminded of Eben Moglen's testimony to Congress in 2010:
So, in order to understand how Verifiable Claims address privacy issues, I believe we will need to consider how they would operate within the context of various systems, each of which will have distinct trust boundaries and differing needs for information access. Once we understand that, we can evaluate what the data model needs to support those use cases and, in particular, how verifiable claims improve privacy when used correctly.
@jandrieu So, how do you suggest we proceed? In particular, is there anything you think might be productive to discuss in next week's call?
Thanks @jandrieu, you're structuring this in a useful way. The correlations and transactions seem correct. On your questions:
2. I agree with your framing. I don't know the legal answer to who does routine audits. I would allow for a separate registry no matter what. The pharmacy is handling controlled substances and is subject to audit by the DEA. I'm skeptical of storing anything other than timestamps in a public data store.
3. Good point. I think we need to do both. Keep in mind that some states will require the querying physician to have a relationship with the patient, and others will simply require that they be a licensed practitioner.
4. The insurance may need to be consulted for decision support and/or costs before the prescription is finalized by the physician. The pharmacy also needs insurance access, unless the patient pays cash, which is allowed by law. Once we create the "use domain" representation, we would do well to add insurance.
5. Maybe. The monitoring programs are run at the state level and can include the pharmacy. Some states also mandate that physicians check the registry before prescribing controlled substances, and we could imagine transactions that warn the physician or regulators if this is not done.
Created new issue in use case repo: w3c/vc-use-cases#38
Next step is to do a privacy analysis on Adrian Gropper's use case listed in this issue. The people who volunteered are: @jandrieu @agropper @jonnycrunch @msporny @amigus
Discussed in the 24 Jan 2017 telecon (link to minutes when available)
@jandrieu @agropper @jonnycrunch @msporny @amigus -- looking for an update on this issue.
Note this has moved to the use cases repository: w3c/vc-use-cases#38
This issue was originally about writing 2-3 paragraphs for the Data Model specification under the Privacy section related to Correlatability. I think we're going a bit overboard here; the analysis that @jandrieu and @agropper are doing is useful, but we probably don't need to wait on it to write a section for the specification. We just need a general idea of what sort of things you could put in a Verifiable Claim that would correlate you, and to what degree. What we need to resolve this issue is 2-3 paragraphs that describe why/how Verifiable Claims can lead to correlation based on usage patterns.
@stonematt I'm going to lead the effort to put the use-case @agropper defined through the NIST Privacy Risk Assessment Methodology based on NIST.IR.8062. I expect I'll need help from the other volunteers in the coming weeks.
Regarding the original issue of correlatability: I see the issue as maintaining control over the selective disclosure of verifiable claims that may lead to correlatability, and understanding/accepting that risk. The Rx use-case is interesting but much too complex because, face it, when we are sick and/or dying we will give up our souls for the chance to live longer. Perhaps this thread should continue with the more general discussion of correlatability, and more specifically the use of correlation or k-means clustering of certain attributes for potential re-identification. The classic example was the re-identification of Gov. Weld from 'de-identified' patient discharge information in Massachusetts.
In medicine, with regard to the potential re-identification of "de-identified" Personally Identifiable Information, we often use k-anonymity methods to preserve privacy, whereby certain attributes are translated to more general ranges. "Female from zip code 37203, age 29" is translated to "Female from area 372**, age 20-30", depending on the data set and the calculated k.
I didn't find NIST.IR.8062 very useful, as it only described the typical governmental mantra of "Monitor -- Assess -- Respond" and policy responses. The worksheet would be helpful to create the strawman discussion. Although the verbiage and concepts were helpful, the privacy risk equation wasn't terribly helpful. Rather, I think we should model the risk of re-identification given the attributes and have a cohesive methodology that we use for each scenario/use-case (for instance, the 29-year-old female in 37203 who is buying alcohol or filling an Rx). For context, the number of people living in zip code 37203 and in that age range is about 30k. So rather than focus on the characteristics of a secure system as suggested in the PRAM, I think we should focus on methods to calculate the risk of re-identification given the context of each selective disclosure of attributes.
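To make the generalization step above concrete, here is a minimal sketch in Python, with invented records and field names, of k-anonymity-style coarsening of quasi-identifiers and the naive worst-case re-identification risk of 1/k, where k is the size of the smallest equivalence class after generalization. It illustrates the shape of the calculation, not a vetted privacy tool.

```python
# Sketch only: k-anonymity-style generalization of quasi-identifiers and a
# naive worst-case re-identification risk of 1/k. Records are invented.
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: truncate the zip code, bucket age by decade."""
    decade = (record["age"] // 10) * 10
    return (record["sex"], record["zip"][:3] + "**", f"{decade}-{decade + 9}")

def worst_case_risk(records):
    """1 / size of the smallest equivalence class after generalization."""
    classes = Counter(generalize(r) for r in records)
    return 1.0 / min(classes.values())

population = [
    {"sex": "F", "zip": "37203", "age": 29},
    {"sex": "F", "zip": "37205", "age": 24},
    {"sex": "F", "zip": "37209", "age": 27},
    {"sex": "M", "zip": "37206", "age": 31},
]
print(generalize(population[0]))    # ('F', '372**', '20-29')
print(worst_case_risk(population))  # 1.0: the lone male record is unique
```

With roughly 30k people in that zip code and age range, as noted above, the class sizes change, but the arithmetic is the same: the risk for any disclosed attribute combination is bounded below by one over the number of people sharing it.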
I'm trying to make the Rx use-case simple enough to enable a bit of privacy analysis with respect to correlation. A new version at https://docs.google.com/document/d/1l4d1_gvMeljbhCbWhxKpaPOk3A9-05sD3vD1V2PnNog/edit is simpler and may lend itself to a focused discussion.
As far as de-identification, I find the topic is used mostly to justify a lack of transparency in how personal data is used. The combination of two trends, many more public data sets plus more accessible machine intelligence, makes re-identification risk calculation plausible for only the most trivial of cases.
Adrian
I just made a pull request with specific language. One challenge I had was distinguishing the privacy of the holder from that of the subject. It's easier to discuss and reason about when the holder is presumed to be the subject, but that presumption is knowably false when dealing with delegated or guardian holders, e.g., claims about a child. Unfortunately, it should probably never be presumed that the holder of a claim is guaranteed to be the subject of the claim. Maybe I'm missing something here, but I don't think we've teased out the issues of identity assurance between:
1. the holder of a claim (the digital holder, who has the JSON-LD or other serialization),
2. the presenter of a claim (the party actually asserting the claim to an inspector), and
3. the subject of the claim.
My understanding is that, for example, a parent could present a DID and assert that the DID applies to an individual who they claim is their child. In this case, all three of the above listed parties are different. I haven't seen any language addressing how we deal with the presenter's assertion that the subject of the claim is any particular individual, including themselves. I realize some of this is protocol related and potentially out of scope, but I found my own language challenging to reconcile with the ambiguous relationship between the holder, the presenter, and the subject.
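As one hypothetical illustration of the three roles, here is a sketch written as Python literals for readability; the field names, DID values, and the "presenter" property are all invented here and are not drawn from the data model.

```python
# Hypothetical structure (all field names and DIDs invented for illustration)
# in which the subject, holder, and presenter are three distinct parties.
claim = {
    "@context": "https://example.org/credentials/v1",  # placeholder context
    "type": ["Credential", "PrescriptionCredential"],
    "issuer": "did:example:physician-123",
    "claim": {
        "id": "did:example:child-456",       # subject: the child the Rx is for
        "prescription": "urn:example:rx-789",
    },
}

presentation = {
    "type": "Presentation",
    "credential": claim,
    "holder": "did:example:guardian-000",    # holder: has the serialization
    "presenter": "did:example:parent-111",   # presenter: asserts it to an inspector
    # Nothing here proves the presenter's asserted link between
    # did:example:child-456 and a particular physical child; that gap is the
    # identity-assurance problem raised above.
}
```

Whether a protocol even surfaces a distinct presenter is part of the out-of-scope question flagged here; the sketch only shows why conflating the three roles loses information.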
I'm confused by the terminology.
- Parent A presents a prescription for subject child B to the pharmacy.
- A has a DID and B has a DID.
- I don't see a third party.
- The parent needs to show a driver's license because the prescription is for a controlled substance.
- The custodial relationship between A and B needs to be verified.
Adrian
@agropper It would be really helpful to align terms, such as "patient", with equivalent (or new, if necessary) Verifiable Claims terms (e.g. "entity"). It would also be helpful to surface all the implicit relationships in the prescription use case (e.g. the verifying employee at the pharmacy is authorised by the pharmacy to certify a prescription as being valid).
Agreed. The format of the document is not ideal for a privacy analysis because the relationships are hard to follow. I tried to highlight the issues in my comment on #18.
A more general format for how to deal with this eludes me, but I suspect we need a couple of examples for privacy, just like we have a couple of examples for how to code a claim. Simply listing 18 separate issues in sections 5 and 6 doesn't do it for me.
Adrian
@jandrieu did a PR for this issue and it was accepted into the spec. Closing the issue.
When and how verifiable claims are used is important for correlation purposes. Implementers should be aware of these usage patterns and be able to warn their customers about the correlation they enable.