Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple significant security vulnerabilities in the design of data integrity #272

Closed
tplooker opened this issue Jun 21, 2024 · 149 comments
Closed
Assignees
Labels
CR1 This item was processed during the first Candidate Recommendation phase. normative This issue or PR will trigger the need for another Candidate Recommendation Snapshot pr exists A pull request exists to address this issue.

Comments

@tplooker
Copy link

tplooker commented Jun 21, 2024

The following issue outlines two significant security vulnerabilities in data integrity.

For convenience in reviewing the below content here is a google slides version outlining the same information.

At a high level summary both vulnerabilities exploit the "Transform Data" phase in data integrity in different ways, a process that is unique to cryptographic representation formats that involve processes such as canonicalisation/normalisation.

In effect both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without the signature being invalidated. For example as the attached presentation shows with the worked examples, an attacker could swap their first and middle name and employment and over18 status without invalidating the issuers signature.

The first vulnerability is called the unprotected term redefinition vulnerability. In general this vulnerability exploits a design issue with JSON-LD where the term protection feature offered by the @protected keyword doesn't cover terms that are defined using the @vocab and @base keywords. This means any terms defined using @vocab and @base are vulnerable to term redefinition.

The second vulnerability exploits the fact that a document signed with data integrity has critical portions of the document which are unsigned, namely the @context element of the JSON-LD document. The fact that the @context element is unsigned in data integrity combined with the fact that it plays a critical part in the proof generation and proof verification procedure, is a critical flaw leaving data integrity documents open to many forms of manipulation that are not detectable through validating the issuers signature.

Please see the attached presentation for resolutions to this issue we have explored.

In my opinion the only solution I see that will provide the most adequate protection against these forms of attacks is to fundamentally change the design of data integrity to integrity protect the @context element. I recognise this would be a significant change in design, however I do not see an alternative that would prevent variants of this attack continuing to appear over time.

I'm also happy to present this analysis to the WG if required.

(edited to put backticks around @words)

@msporny msporny added the CR1 This item was processed during the first Candidate Recommendation phase. label Jun 21, 2024
@dlongley
Copy link
Contributor

I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs.

Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

We might want to add some examples to the specification that show that the information in documents can be expressed in one context and transformed into another. This could include showing an incoming document that is expressed using one or more contexts that the consumer does not understand, which can then be transformed using the JSON-LD API to a context that is trusted and understood. This would also help highlight the power of protection mechanisms that enable this kind of transformation.

For example, a VC that includes terms that are commonly consumed across many countries and some that are region specific. By using the JSON-LD API, a consumer that only understands the global-only terms can apply such a context to ensure that the terms they understood will appear as desired and other region-specific terms are expressed as full URLs, even when they do not understand or trust the regional context. All of this can happen without losing the ability to check the authenticity of the document.

We can also highlight that simpler consumers continue to be free to outright reject documents that are not already presented in the context that they trust and understand, no matter their authenticity.

@tplooker
Copy link
Author

tplooker commented Jun 21, 2024

I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs.

The fundamental point of digital signatures is to reduce the information that needs to be trusted prior to verification. Most modern technologies e.g SD-JWT, mDocs, JWT and COSE and JOSE at large do this successfully meaning a relying party only needs to trust a public key prior to attempting to verify the signature of an otherwise untrusted payload. If the signature check fails, the payload can be safely discarded without undue expense.

The problem with data integrity is that this assumption is not the same. In essence the relying party doesn't just need the public key of the issuer/signer, but also all possible JSON-LD context entries that issuer may or may not use, if any of these are corrupted, manipulated or untrusted ones injected, the attacks highlighted in this issue become possible. Whether it is even possible to share these contexts appropriately at scale is another question, but these attacks demonstrate at a minimum that an entirely unique class of vulnerabilities exist because of this design choice.

Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

The point im making is not about whether one should understand the context of a message it has received or not, its about when it should attempt to establish this context. Doing this prior to validating the signature is dangerous and leads to these vulnerabilities.

For instance a JSON-LD document can be signed with a plain old JWS signature (like in JOSE COSE), once the signature is validated one can then process it as JSON-LD to understand the full context, if they so wish. The benefit of this approach is that if the JSON-LD context have been manipulated (e.g the context of the message), the relying party will have safely discarded the message before even reaching this point because the signature check will have failed. Data integrity on the other hand requires this context validation to happen as a part of signature verification thus leading to these issues.

@selfissued
Copy link
Contributor

Another take on this is that Data Integrity signing methods that sign the canonicalized RDF derived from JSON-LD, rather than the JSON-LD itself, enable multiple different JSON-LD inputs to canonicalize to the same RDF. The JSON-LD itself isn't secured - only RDF values derived from it. If only the derived RDF values were used by code, it might not be a problem, but in practice, code uses the unsecured JSON-LD values - hence the vulnerabilities.

@ottonomy
Copy link

In the example where the firstName and middleName plaintext properties are swapped, what should the verifier's behavior be? I don't think it should just be to verify the credential, whatever type it might be and then look at the plaintext properties within it that used @vocab-based IRIs. If I were writing this verifier, I would also ensure the @context matched my expectations, otherwise I wouldn't be sure that the properties of credentialSubject I was looking for actually meant the things that I expected them to mean.

If they were trying to depend on a credential of a certain type that expressed a holder's first name and middle name, it would not be a good idea to miss a check like this. Don't accept properties that aren't well-@protected in expected contexts. This is an additional cost that comes with processing JSON-LD documents like VCDM credentials, but it's not a step that should be skipped, because you're right that skipping it might open an implementer up to certain vulnerabilities.

Approaches that work:

  1. Verify the @context matches your expectations, such as including only known context URLs to contexts appropriate for the credential type that use @protected and only use explicitly defined terms.
  2. OR, use the JSON-LD tools to compact the credential into the context you expect before relying on plaintext property names.

Communities developing and using new credential type specifications benefit from defining a good @context with appropriately @protected terms. @vocab is ok for experimentation but not so great for production use cases. We don't really have a huge number of credential types yet, but hopefully as the list grows, the example contexts established for each of the good ones makes for an easy-to-follow pattern.

@OR13
Copy link
Contributor

OR13 commented Jun 22, 2024

schema.org and google knowledge graph both use @vocab.

https://developers.google.com/knowledge-graph

The problem is not JSON-LD keywords in contexts, the problem is insecure processing of attacker controlled data.

If you want to secure RDF, or JSON-LD, it is better to sign bytes and use media types.

You can sign and verify application/n-quads and application/ld+json, in ways that are faster and safer.

W3C is responsible for making the web safer, more accessible and more sustainable.

Data integrity proofs are less safe, harder to understand, and require more CPU cycles and memory to produce and consume.

They also create a culture problem for RDF and JSON-LD by attaching a valuable property which many people care deeply about (semantic precision and shared global vocabularies), with a security approach that is known to be problematic, and difficult to execute safely.

These flaws cannot be corrected, and they don't need to be, because better alternatives already exist.

W3C, please consider not publishing this document as a technical recommendation.

@msporny
Copy link
Member

msporny commented Jun 22, 2024

2024-05-08 MATTR Responsible Disclosure Analysis

On May 8th 2024, MATTR provided a responsible security disclosure to the Editor's of the W3C Data Integrity specifications. A private discussion ensued, with this analysis of the disclosure provided shortly after the disclosure and a public release date agreed to (after everyone was done with the conferences they were attending through May and June). The original response, without modification, is being included below (so language that speaks to "VC Data Model" could be interpreted as "VC Data Integrity" as the original intent was to file this issue against the VC Data Model specification).

The disclosure suggested two separate flaws in the Data Integrity specification:

  • "The unprotected term redefinition vulnerability"
  • "The @context substitution vulnerability"

The Editors of the W3C Data Integrity specification have performed an analysis of the responsible security disclosure and provide the following preliminary finding:

Both attacks are fundamentally the same attack, and the attack only appears successful because the attack model provided by MATTR presumes that verifiers will allow fields to be read from documents that use unrecognized @context values. Two documents with different @context values are different documents. All processors (whether utilizing JSON-LD processing or not) should treat the inbound documents as distinct; the software provided by MATTR failed to do that. Secure software, by design, does not treat unknown identifiers as equivalent.

That said, given that a credential technology company such as MATTR has gone so far as to report this as a vulnerability, further explanatory text could be added to the VC Data Model specification that normatively state that all processors should limit processing to known and trusted context identifiers and values, such that developers do not make the same mistake of treating documents with differing @context values as identical prior to verification.

The rest of this document contains a more detailed preliminary analysis of the responsible disclosure. We thank MATTR for the time and attention put into describing their concerns via a responsible security disclosure. The thorough explanation made analysis of the concerns a fairly straightforward process. If we have made a mistake in our analysis, we invite MATTR and others to identify the flaws in our analysis such that we may revise our findings.

Detailed Analysis

A JSON-LD consumer cannot presume to understand the meaning of fields in a JSON-LD document that uses a context that the consumer does not understand. The cases presented suggest the consumer is determining the meaning of fields based on their natural language names, but this is not how JSON-LD works, rather each field is mapped to an unambiguous URL using the JSON-LD context. This context MUST be understood by the consumer; it cannot be ignored.

A verifier of a Verifiable Credential MUST ensure that the context used matches an exact well-known @context value or MUST compact the document using the JSON-LD API to a well-known @context value before further processing the data.

Suggested Mitigation 1
Add a paragraph to the Data Integrity specification that mentions this and links to the same section in the Verifiable Credentials specification to help readers who are not familiar with JSON-LD, or did not read the JSON-LD specification, to understand that `@context` cannot be ignored when trying to understand the meaning of each JSON key. Additional analogies could be drawn to versioning to help developers unfamiliar with JSON-LD, e.g., "The `@context` field is similar to a `version` for a JSON document. You must understand the version field of a document before you read its other fields."

The former can be done by using JSON schema to require a specific JSON-LD shape and specific context values. This can be done prior to passing a document to a data integrity implementation. If contexts are provided by reference, a document loader can be used that resolves each one as "already dereferenced" by returning the content based on installed context values instead of retrieving them from the Web. Alternatively, well-known cryptographic hashes for each context can be used and compared against documents retrieved by the document loader over the Web. For this approach, all other JSON-LD documents MUST be rejected if they do not abide by these rules. See Type-Specific Credential Processing for more details on this:

https://www.w3.org/TR/vc-data-model-2.0/#type-specific-credential-processing.

This former approach is less powerful than using the JSON-LD Compaction API because it requires more domain-specific knowledge to profile down. However, it is still in support of decentralized extensibility through use of the JSON-LD @context field as a decentralized registry, instead of relying on a centralized registry. Decentralized approaches are expected to involve a spectrum of interoperability and feature use precisely because they do not require a one-size fits all approach.

Applying these rules to each case presented, for case 1:

A verifier that does not use the JSON-LD API and does not recognize the context URL, https://my-example-context.com/, will reject the document.

A verifier that does not use the JSON-LD API and does recognize the context URL, https://my-example-context.com/, will not conflate the natural language used for the JSON keys with their semantics. Instead, the verifier will appropriately use the semantics (that happens to be the opposite of the natural language used in the JSON keys) that the issuer intended, even though the JSON keys have changed.

A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context, and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended.

For case 2:

A verifier that does not use the JSON-LD API and does not recognize the attacker-provided context URL, https://my-malicious-modified-context.com/, will reject the document.

A verifier that does not use the JSON-LD API and does recognize the attacker-provided context URL, https://my-malicious-modified-context.com/, will not conflate the natural language used for the JSON keys with their semantics. Instead, the verifier will appropriately use the semantics (that happens to be the opposite of the natural language used in the JSON keys) that the issuer intended, even though the JSON keys have changed.

A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context (and optionally, https://my-original-context.com), and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended.

Note: While the disclosure suggests that the JSON-LD @protected feature is critical to this vulnerability, whether it is used, or whether a Data Integrity proof is used to secure the Verifiable Credential, is orthogonal to ensuring that the entire @context value is understood by the verifier. For clarity, this requirement stands even if an envelope-based securing mechanism focused on syntax protection were used to ensure authenticity of a document. Misunderstanding the semantics of an authentic message by ignoring its context is always a mistake and can lead to unexpected outcomes.

Comparison to JSON Schema

The scenarios described are identical in processing systems such as JSON Schema where document identifiers are used to express that two documents are different. A JSON document with differing $schema values would be treated as differing documents even if the contained data appeared identical.

Original document

{"$schema": "https://example.com/original-meaning",
 "firstName": "John"}

New or modified document

{"$schema": "https://example.com/new-meaning",
 "firstName": "John"}

Any document processor whether utilizing JSON Schema processing or not would rightly treat these two documents as distinct values and would seek to understand their values equivalence (or lack of it) prior to processing their contents. Even consuming a document that is recognized as authentic would be problematic if the $schema values were not understood.

Original meaning/schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/original-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name by which a person is generally called: 'given name'",
      "type": "string"
    }
  }
}

New meaning/schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/new-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name spoken first in Japan: typically a surname",
      "type": "string"
    }
  }
}

Demonstration of Proper Implementation

The attack demonstration code provided adds the unknown modified/malicious contexts to the application code's trusted document loader. A valid application should not do this and removing these lines will cause the attack demonstrations to no longer pass:

documentLoader.addStatic("https://my-example-context.com/", modifiedContext)

https://gist.github.com/tplooker/95ab5af54a141b69b55d0c2af0bc156a#file-protected-term-redefinition-attack-js-L38

To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/93c0ba17b25e500d72c1ad131fe7e869

documentLoader.addStatic("https://my-malicious-modified-context.com/", modifiedContext)

https://gist.github.com/tplooker/4864ffa2403ace5637b619620ce0c556#file-context-substitution-attack-js-L48

To see "Proof failed" when this line is commented out and the failure result is logged, see:

https://gist.github.com/dlongley/4fb032c422b77085ba550708b3615efe

Conclusion

While the mitigation for the misimplementation identified above is fairly straightforward, the more concerning thing, given that MATTR is knowledgeable in this area, is that they put together software that resulted in this sort of implementation failure. It demonstrates a gap between the text in the specification and the care that needs to be taken when building software to verify Verifiable Credentials. Additional text to the specification is needed, but may not result in preventing this sort of misimplementation in the future. As a result, the VCWG should probably add normative implementation text that will test for this form of mis-implementation via the test suite, such as injecting malicious contexts into certain VCs to ensure that verifiers detect and reject general malicious context usage.

Suggested Mitigation 2
Add tests to the Data Integrity test suites that are designed to cause verifiers to abort when an unknown context is detected to exercise type-specific credential processing.

@OR13
Copy link
Contributor

OR13 commented Jun 22, 2024

If you consider the contexts part of source code, then this sort of attack requires source code access or misconfiguration.

Validation of the attacker controlled content prior to running the data integrity suite, might provide mitigation, but at further implementation complexity cost.

Which increases the probability of misconfiguration.

A better solution is to verify the content before performing any JSON-LD (or other application specific) processing.

After verifying, schema checks or additional business validation can be performed as needed with assurance that the information the issuer intended to secure has been authenticated.

At a high level, this is what you want:

  1. minimal validation of hints
  2. key discovery & resolution
  3. verification
  4. validation
  5. deeper application processing

Most data integrity suites I have seen do this instead:

  1. deep application processing (JSON-LD / JSON Schema)
  2. canonicalization (RDF)
  3. verification
  4. validation
  5. deeper application processing

The proposed mitigations highlight, that these security issues are the result of a fundamental disagreement regarding authentication and integrity of data.

Adding additional application processing prior to verification, gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, and schema version mismatching, etc...

Any application processing that occurs prior to verification is a design flaw, doubling down on a design flaw is not an effective mitigation strategy.

@filip26
Copy link

filip26 commented Jun 22, 2024

@OR13

adding additional application processing prior to verification, gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, and schema version mismatching, etc...

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

which is loop and simple string comparison. I don't see a reason for any of the exploits you have listed here except an implementer's incompetence.

Please can you elaborate how those exploits could be performed and provide a calculation, an estimation, how much this adds to complexity?

Thank you!

@tplooker
Copy link
Author

tplooker commented Jun 22, 2024

@filip26, setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent".

Your specific pseduo-code is in-sufficient for at least the following reasons:

  1. What is actually input into the cryptographic verification procedure for data integrity aren't URL's, its the contents behind those URL's. So to put it plainly, data integrity cannot by design ensure that the contents of the @context entries used to actually verify a credential are those that the issuer used, because they are not integrity protected by the issuers signature.
  2. You have disregarded inline contexts, the @context array is not simply guaranteed to be an array of strings it may also include objects or "inline contexts".
  3. Your check appears to imply ACCEPTED_CONTEXTS is a flat list of contexts acceptable for any issuer, this means if contexts from different issuers collide in un-expected ways and a malicious party knows this, they can manipulate pre-trusted @context values by the relying party without even having to inject or modify an existing @context. If I'm mistaken and you meant that ACCEPTED_CONTEXTS is an array of issuer specific accepted contexts, then please explain how this is accomplished in an interoperable manner and or how it would scale.

(edited to put backticks around @words)

@filip26
Copy link

filip26 commented Jun 22, 2024

@tplooker setting aside you are putting words in my mouth that I have not said which is quite rude and disrespectful ...

add 1. you are wrong, by ensuring data is processed with a context you accept (the URLs) you know what is behind those URLs, and how much you trust those URLs, and perhaps you have a static copy of the contexts. If you follow untrusted URLs then it's an implemters fault. Use a browser analogy.
add 2. yeah, I've simplified that, an inline context is a bad practice
add 3. your trust the URLs or not and based on the trust you proceed or not

@PatStLouis
Copy link
Contributor

I was browsing through past issues related to this. This specific issue was raised to suggest adding @vocab in the base vcdm 2.0 context. It's my understanding that the authors of the Data Integrity spec were opposed to this. This is now being pointed as a direct security concern.

@tplooker given these new findings, would you revise your support since this was a bad recommendation introducing a security concern according to your disclosure?

@OR13
Copy link
Contributor

OR13 commented Jun 22, 2024

The URL for a context doesn't actually matter... In fact some document loaders will follow redirects when resolving contexts over a network (technically another misconfiguration).

Depending on the claims you sign, you may only detect a mismatch in the signature, when you attempt to sign a document that actually uses the differing part of the context.

Contexts are just like any other part of source code... Every single line of source code is a potential problem.

You often don't control what 3rd parties will consider the bytes of a context to be... It's a feature, that's been turned into a defect by where it was placed.

"It verified for me, must be a problem in your document loader."

"I thought I would be able to fix it in only a few hours, but it took me 2 days and delayed our release"

"I finally figured out how data integrity proofs work, thanks for letting me spend all week on them"

I've paired with devs and shown them how to step through data integrity proofs, dumping intermediate hex values and comparing against a "known good implementation", only later to learn the implementation had a bug...

Misconfiguration is common in complex systems.

I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order.

Authentication of json -> json web signatures
Specification of json structure -> json schemas
Integrity protection of files -> hashes
Semantic mapping for json -> JSON-LD

The essence of a recommendation, is that you believe there isn't a better alternative.

@filip26
Copy link

filip26 commented Jun 22, 2024

@OR13 I'm sorry but don't see it. You mention two issues: misconfiguration and bugs. Well, we have tests, certification, etc. Those issues are endemic to any software applications but we don't call all the software vulnerable because of just a possibility that there might be a bug but after we find a bug.

Misconfiguration is common in complex systems.

I would really like to see the complexity estimated. I guess we are seeing a very different picture.

I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order.

Please let's be factual, what experts, what was recommended, etc. In EU when press article starts with a title "American scientists have ... " everyone stops reading it (they add the American to make it credible ;)

@OR13
Copy link
Contributor

OR13 commented Jun 22, 2024

@PatStLouis you raise an excellent point regarding default vocabularies.

It's never too late to change what's in a context (joke).

This working group cannot prevent anyone else from adding a context that includes a vocab.

You are reporting an architectural flaw, that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context.

If json compatibility isn't a requirement, the working group can drop the vc-jose-cose spec and remove the vocab from the default context... This might even improve adoption of data integrity while clarifying that RDF is the claims format that W3C secures.

I've argued this point previously.

@tplooker
Copy link
Author

tplooker commented Jun 23, 2024

I was browsing through past issues related to this. w3c/vc-data-model#953 was raised to suggest adding @vocab in the base vcdm 2.0 context. It's my understanding that the authors of the Data Integrity spec were opposed to this. This is now being pointed as a direct security concern.

@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. @vocab is a broadly useful feature, that has not changed through disclosure of this vulnerability, what has become apparent is that JSON-LD is broken with regard to how this feature works. Simply removing @vocab from the vocabulary doesn't fix this issue it would be a band aid, what needs to be fixed is 1) JSON-LD with regard to how @vocab works with @protected and 2) more generally the @context entry needs to be integrity protected to prevent manipulation.

(edited to put backticks around @words)

@tplooker
Copy link
Author

Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting @vocab in the base vocabulary was a "bad recommendation". In actual reality it was also necessitated to fix an even worse issue with data integrity as documented here digitalbazaar/jsonld.js#199 which lay around since 2017 un-patched, until we started a contribution for a fix in 2021, when we discovered it digitalbazaar/jsonld.js#452. Personally I believe removing @vocab from the core context will likely re-introduce this issue for JSON-LD processors that aren't handling these relative IRI's correctly.

Furthermore, if others in the WG knew about this issue, specifically that @vocab didn't work with @protected and chose not to disclose it when discussing this proposal, then that is even more troublesome.

@msporny
Copy link
Member

msporny commented Jun 23, 2024

@tplooker wrote:

setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent".

@tplooker wrote:

Furthermore, if others in the WG knew about this issue, specifically that @vocab didn't work with @protected and chose not to disclose it when discussing this proposal, then that is even more troublesome.

Please stop insinuating that people are acting in bad faith.

Now might be a good time to remind everyone in this thread thread that W3C operates under a Code of Ethics and Professional Conduct that outlines unacceptable behaviour. Everyone engaging in this thread is expected to heed that advice in order to have a productive discussion that can bring this issue to a close.

(edited to put backticks around @words)

@veikkoeeva
Copy link

veikkoeeva commented Jun 23, 2024

From an implementer perspective maybe adding an example that "should fail" could be a good thing. Something like at #272 (comment) .

As an implementation "case experience", I implemented in .NET something that produces a proof like at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6 the university crendetial and then also verifies it. It felt a bit tedious to find out what to canonicalize, hash and and sign to get a similar result. The code is or less private code still, but now that https://github.com/dotnetrdf/dotnetrdf/releases/tag/v3.2.0 and the canonicalization is publicly released, I might make something more public too. I still feel I need to go through this thread with more thought so I completely understand the issue at hand.

@msporny
Copy link
Member

msporny commented Jun 23, 2024

@veikkoeeva wrote:

From an implementer perspective maybe adding an example that "should fail" could be a good thing.

Yes, that is already the plan for the test suite in order to make sure that no conformant implementations can get through without ensuring that they refuse to generate a proof for something that drops terms, and/or, depending on the outcome of this thread, use @vocab to expand a term.

That's a fairly easy thing that this WG could do to ensure that this sort of implementation mistake isn't made by implementers. Again, we'll need to see how this thread resolves to see what actions we can take with spec language and test suites to further clarify the protections that we expect implementations to perform by default.

@PatStLouis
Copy link
Contributor

PatStLouis commented Jun 23, 2024

@OR13

It's never too late to change what's in a context (joke).

I'm not suggesting a change, my goal is to understand why this recommendation was suggested in the first place and removing it is now listed as a remediation step to a security concern raised from the very same parties who suggested it.

This working group cannot prevent anyone else from adding a context that includes a vocab.

Correct, @vocab is a great feature for some use cases. I enjoy the feature for learning about jsonld, development and prototyping until I publish a proper context. I wouldn't use it in a production system (or at least I haven't found a use case that requires it).

Many protocols have features that can be unsecured depending how you use them. This doesn't make the protocol inherently flawed.

You are reporting an architectural flaw, that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context.

Apologies if you misunderstood my statement, but my intention was not to report an architectural flaw.

@tplooker

@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. @vocab is a broadly useful feature, that has not changed through disclosure of this vulnerability, what has become apparent is that JSON-LD is broken with regard to how this feature works. Simply removing @vocab from the vocabulary doesn't fix this issue it would be a band aid, what needs to be fixed is 1) JSON-LD with regard to how @vocab works with @protected and 2) more generally the @context entry needs to be integrity protected to prevent manipulation.

Yes @vocab is a useful feature, but should it always be present? Nothing is stopping someone from using it, it's a feature (but shouldn't be a default behaviour). I would argue the same that the decision of adding an @vocab in the base context of the vcdm 2.0 is a band aid solution in itself, derived from a need for easier development process.

The Data Integrity spec provides hashes for their entries that verifiers can leverage while caching the content. AFAIK this is already a thing.

Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting @vocab in the base vocabulary was a "bad recommendation". In actual reality it was also necessitated to fix an even worse issue with data integrity as documented here digitalbazaar/jsonld.js#199 which lay around since 2017 un-patched, until we started a contribution for a fix in 2021, when we discovered it digitalbazaar/jsonld.js#452. Personally I believe removing @vocab from the core context will likely re-introduce this issue for JSON-LD processors that aren't handling these relative IRI's correctly.

Thank you for pointing out to these issues, I enjoy looking back at historical data from before my time in the space. As pointed earlier, some of the parties that made that recommendation are now recommending removing it as a remediation to a security concerned that they raised.

The use cases listed for this recommendation was for development purposes as described in #953. Furthermore, the private claims section of the jwt RFC reads as follows:

Private Claim Names are subject to collision and should be used with caution.

Enabling this by default does not sound like a good recommendation to me.

It's easy to setup a context file, it takes 5 minutes and a github account. If you are doing development, you can just include an @vocab object in your base context for the short term, why the recommendation to make it part of the VCDM 2.0 context?

Regardless this was already discussed by the group and the decision has been made.

The OWASP defines a class of vulnerabilities as Security Misconfigurations. This is where I would see this landing in. While valid, it's ultimately the implementers responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is unsecured because I can gain unauthorized access to my server, that doesn't align since the security flaw is not in the protocol in itself be in my security configuration. Yes it's a vulnerability, no it shouldn't be addressed by the underlying protocol.

For concluding I find this disclosure valuable as I got to learn a bit more about json-ld and gives a great resource to demonstrate implementers how to properly conduct verification of credentials + issuers how to properly design a VC.

(edited to put backticks around @words)

@awoie
Copy link

awoie commented Jun 24, 2024

The OWASP defines a class of vulnerabilities as Security Misconfigurations. This is where I would see this landing in. While valid, it's ultimately the implementers responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is unsecured because I can gain unauthorized access to my server, that doesn't align since the security flaw is not in the protocol in itself be in my security configuration. Yes it's a vulnerability, no it shouldn't be addressed by the underlying protocol.

I would actually classify those attacks as "Data Integrity Signature Wrapping" (DISW) attacks. They share many similarities with XML Signature Wrapping Attacks (XSW) that occurred in the past. Also, note that it is possible to use XML Signatures securely if appropriate mitigations are implemented correctly. The same holds true for DI. The question is where we would add requirements for those additional mitigations for Data Integrity Proofs (DI).

The VCDM uses relatedResource to protect the integrity of specific external resources, such as @context values referenced by URIs. While DI is primarily used with the W3C VCDM 2.0, other JSON-LD data models might be secured by DI in the future, such as ZCaps, so we cannot just rely on mitigations defined in the W3C VCDM 2.0. For this reason, I believe this mechanism for protecting the integrity of the @context definitions is actually the responsibility of DI itself, since those @context definitions are part of the canonicalization and signature creation/verification. It would mitigate DISW attacks by making @context definition integrity checking part of the signature verification process. In that case, a similar mechanism to relatedResource has to be defined in the DI specification, and making it mandatory would help verifiers and issuers avoid skipping certain checks when issuing and verifying DIs.

@awoie
Copy link

awoie commented Jun 24, 2024

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in the ACCEPTED_CONTEXTS array. Consider the following simplified VC examples:

Example: VC using a base data model for all driving licenses

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
  ],
  "type": [ "VerifiableCredential", "DrivingLicense" ]
  "credentialSubject": {
    "canDrive": true
   }
}

Example: VC issued by DMV of Foo

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://foo.com/ns/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense", "FooLicense" ]
  "credentialSubject": {
    "canDrive": true,
    "foo": true
   }
}

Example: VC issued by DMV of Bar

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://bar.com/ns/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense", "BarLicense" ]
  "credentialSubject": {
    "canDrive": true,
    "bar": true
   }
}

When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar. Verifiers in the realm of Foo are primarily interested in the base properties of the DrivingLicense and occasionally in the specific properties of the FooLicense. The same situation applies to the realm of Bar, but with a focus on their respective properties.

Adopting the ACCEPTED_CONTEXTS approach would require Foo, Bar, and all other realms to continually distribute and update their individual context definitions. This approach just does not scale very well and it sacrifices the open-world data model since all @contex URLs and/or definitions have to be statically configured.

@veikkoeeva
Copy link

veikkoeeva commented Jun 24, 2024

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in the ACCEPTED_CONTEXTS array. Consider the following simplified VC examples:
[...]

Great examples! Thanks!

Some context on why I think why a test "should not happen" plus a less mentioned issue of having good examples.

Related to #272 (comment): I'm not completely alien to this sort of work and indeed, when I implemented the "first pass sketch" of the code, I bit struggled with implications of this sort since I'm not so familiar with JSON-LD. So, I thought to "get back to with better time" and just not release anything before things are clearer (plus the library change not being public, though there's something already in the tests about this).

Some part of that was if I have a document like at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6, how to pick apart the pieces for canonicalization, turning into bytes, hashing, signing and so on. For this "sketch" I was quite happy to have the same results with the keys as the example document, but I know I paid only passing thought for these sort of things. Partially because there's been related discussion earlier.

I mention this about this example piece, since I think since good examples are perhaps more important than has been implied here. I naturally also think that a test of what not should happen are important -- and maybe add some notes of the sort to an example or two too. They're already something I've (we) been codifying to some tests. It's also a great way to document things.

@filip26
Copy link

filip26 commented Jun 24, 2024

@awoie a verifier should not guess what's inside a context nor to try to anticipate if there is some agreement between context providers.

When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar.

If a verfier recognizes both https://foo.com/ns/dl/v1 and https://bar.com/ns/dl/v1 then there is no issue. It simply means that a verifier accepts both DMV departments' vocabulary, no matter that there are shared parts. A situation in which a verifier accepts something just because it uses well known terms is a risk, not otherwise.

An ability to understand well know terms, e.g. defined by schema.org is a great feature but not in VCs eco-system where we don't want to guess but be sure.

This approach just does not scale very well and it sacrifices the open-world data model since all @context

It scales the same way as www does. None prevents you using other contexts, well known terms, etc and include all in your context.

If there is a need, a good reason, to share parts between some parties, then the easiest, transparent, and scalable solution is this:

 "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://dmv-vocab/ns/dl/v1"
    "https://foo.com/ns/dl/v1"
  ],
 "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://dmv-vocab/ns/dl/v1"
    "https://bar.com/ns/dl/v1"
  ],

@awoie
Copy link

awoie commented Jun 24, 2024

@filip26 wrote:

If a verfier recognizes both https://foo.com/ns/dl/v1 and https://bar.com/ns/dl/v1 then there is no issue. It simply means that a verifier accepts both DMV departments' vocabulary, no matter that there are shared parts. A situation in which a verifier accepts something just because it uses well known terms is a risk, not otherwise.

I didn't say it is not a solution. My point was that it is a solution which does not scale. A verifier from Foo might have never seen a @context from Bar but it shouldn't matter because they agreed on a common vocab defined by https://www.w3id.org/dl/v1. Forcing all verifiers or issuers from different realms to continuously reach out to each other to keep @context URLs and definitions up-to-date and well-known does not scale for a lot of use cases.

@filip26 wrote:

It scales the same way as www does. None prevents you using other contexts, well known terms, etc and include all in your context.

No, it doesn't because the assumption of the ACCEPTED_CONTEXTS is to statically configure them. The web is not static and not all actors monitor each other continuously.

@filip26
Copy link

filip26 commented Jun 24, 2024

@awoie

No, it doesn't because the assumption of the ACCEPTED_CONTEXTS is to statically configure them. The web is not static and not all actors monitor each other continuously.

It's up to an implementer how to allow to configure a verifier,. A static configuration has nothing to do with scalability. But I guess that you have meant that a verifier would not be able to accept a context which is not know - that's exactly what we want, and it does not mean that VCs do not scale, that there cannot be infinite number of different VC types, issuers, verifiers, etc.

@awoie
Copy link

awoie commented Jun 24, 2024

@filip26 wrote:

It's up to an implementer how to allow to configure a verifier,. A static configuration has nothing to do with scalability. But I guess that you have meant that a verifier would not be able to accept a context which is not know -

My point on scalability refers to an increase in operational costs, not necessarily performance. Performance might be another point but I cannot comment on that.

@filip26 wrote:

that's exactly what we want, and it does not mean that VCs do not scale, that there cannot be infinite number of different VC types, issues, verifiers, etc.

If this is what we want, this sacrifices the open-world data model the VCDM promise as mentioned here.

@filip26
Copy link

filip26 commented Jun 24, 2024

@awoie I'm sorry, I don't think we are on the same page and I'll let others to explain that it does not affect scalability of VCs eco-system nor open-world data model.

@PatStLouis
Copy link
Contributor

@awoie
I like you extensibility example a lot since its similar to the context in which I'm evaluating the impact of this.

My question is; if a verifier has no prior knowledge of foo or bar, why would they consider the extended data provided by those entities and how would this data lead to an exploit in their system?
The base information contained in the dl context is by design sufficient for verification needs by third parties.

Verifiers will know what information they want to verify, they are not blindly verifying abstract data.

As for the classification of this disclosure, while I can't really argue with your labeling, this is not a formal classification.

If we take 2 examples of vulnerability disclosed around XML Signature Wrapping Attacks:

Both of these affect a specific software and lead to 2 distinct CWE:

  • Improper Verification of Cryptographic Signature
  • Incorrect Authorization

They are not addressed by a change to XML, but a security mitigation in the affected software. This is an important distinction to make and loops back to a Security Misconfiguration.

It's hard for me to understand what exactly this disclosure tries to underline as the vulnerability

  • Json-ld?
  • Verifiable Credentials?
  • Data integrity?
  • A specific cryptosuite?
  • A specific implementation of a cryptosuite?

In seems the target of the vulnerability is being shifted around depending on the questions asked/comments made.

@tplooker
Copy link
Author

tplooker commented Jul 8, 2024

@decentralgabe wrote:

  1. Introduce Context Integrity Protection (aligns with 4; this is not present, and there are noted tradeoffs)

Just want to clearly express my support for this one, I believe this is the most important of the proposals to implement which actually mitigates the security vulnerabilities raised by this issue.

@msporny wrote:

Strong -1, for all the reasons documented in this thread (it, debatably, doesn't properly address the issue, it makes certain use cases some of us are counting on impossible to achieve, it creates privacy harms, we'd be doing it at the last minute without much implementation feedback, etc.)

Perhaps I'm missing something but I've not seen evidence of this not addressing the issue, with regard to usecases if you mean the ability to re compact a document against a new context, then this can be done when the @context is integrity protected, it just happens post signature validation, which is a much safer time to do it, rather then blending it into the signature validation procedure. I also don't see what possible privacy harms come of this.

@TallTed
Copy link
Member

TallTed commented Jul 10, 2024

All —

Please review your comments above, and edit them to codefence all instances of @keyword, including those in content within your comment which was quoted from someone else.

All you need to do is wrap those @keyword instances in single backticks, like `@keyword`.

Failure to do this has looped several GitHub users (e.g., @context, @vocab, @protected, @base, all of whom were tagged in the initial comment that opened this issue; I haven't reviewed the whole thread to make a list of all such) into this conversation without their permission nor request.

This is often shrugged off, as, what's the big deal about a few irrelevant notifications? Well, in this thread, it's more like 133 (and counting) irrelevant notifications. In most spheres where I travel, that would be classed as spam and/or harassment.

Please, let's be good gitizens.

@iherman
Copy link
Member

iherman commented Jul 10, 2024

The issue was discussed in a meeting on 2024-07-10

  • no resolutions were taken
View the transcript

2.1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)

See github issue vc-data-integrity#272.

Brent Zundel: See Gabe's comment within the thread.

Brent Zundel: concerns about security implications of data integrity signatures.
… extensive conversation on the issue.
… Original poster of issue as signaled acceptance of the proposal to address the issue.

Phillip Long: +1 for Gabe to walk us through it.

Brent Zundel: pending a P.R to address the issue.
… gabe could you walk us though the proposal.

Gabe Cohen: spent a long time trying to understand this issue.
… First proposal is adjustment to @vocab. This seems uncontroversial. P.R already open.
… Second thing is we need some text about @context validation, we need to be very clear about how to handle untrusted contexts.
… There is some discussion around a proposal to add signatures to context to make them tamper evident.

Dave Longley: +1 to changes to @vocab, +1 to some clarifying text around validation requirements, +1 to some tests if we can make them make sense -- validation is application-level and maybe really for a "best practices doc" but fine if we can make it work.

Dave Longley: -1 to locking contexts for all the reasons listed in the issue already.

Gabe Cohen: The outcome of the proposal would be 2 or 3 issues to address these points and associated PRs.

Manu Sporny: Agree that 0,1,3,4 parts of the proposal are good ideas.

Dave Longley: +1 to Manu.

Manu Sporny: 0 is easy, 1 will take some work.
… need to be careful not to pass into application space.

Anil John: +1 to 0,1,3,4 ... Have a question regarding 2. Will put myself on the queue to ask.

Dave Longley: +1 to Manu, important not to go into the application layer.

Manu Sporny: will come back to 2, that one is controversial.
… 3 is a variation of 1 should be fin.
… For test suites, we can improve them to check this. Need to tell implementers to check for bad contexts.

Dave Longley: +1 that we're telling implementers to essentially create a special application-level rule, but it could work.

Manu Sporny: Telling issuers and verifiers to implement business logic to pass the test suites.
… +1 to almost all of the things.
… we should not mix context integrity with signatures.
… it limits some use cases the people are relying on.
… Forces disclosure of all of the contexts.
… Forced to leak data if mix cryptographic hash with the context.
… There are other ways to prevent this attack, would be a strong -1 for mandated context integrity protection.

Dave Longley: +1 to not locking contexts for all those reasons and because it doesn't fix the problem :).

Brent Zundel: For number 2, the integrity of contexts. Is it sufficient to recommend people to use the related resource property.
… use related resource for context integrity for people who want it.

Joe Andrieu: When talking about context integrity, we are talking about some hash. Not that the context is signed over.

Gabe Cohen: for 2 I intended to mean a signature, for 1 I mean a hash.

Manu Sporny: I think they are the same. We aren't talking about signing over the property and the value. We are talking about including the hash of every context in the @context property.

Joe Andrieu: We do not currently sign over the @context property.

Anil John: Is option 2 not a secure meta data distribution problem.
… What is the problem we are solving here.
… Seems to be multiple options regarding trust registries, having pointers from DID documents.

Dave Longley: Let someone else speak to aniltj.
… Believe option 2 does not solve the problem raised in that issue. That issue is about a validation problem.
… where terms are read from a context that is not known.

Anil John: Is not (2) a secure metadata (@context files being such a thing) distribution problem?

Dave Longley: even if contexts never changed and were integrity protected, you can still make these mistakes if you don't know the context.

Gabe Cohen: responding to aniltj, yes secure meta data is part of it.
… folks advocating for 2 are concerned that the issuer signed over the context and its properties so they remain untampered for any verifier.
… building on brent, wondering if we could have a new data integrity suite to includes signing over context.
… still a discussion to be had about whether it addresses the concerns.

David Chadwick: commenting on the idea of using the related resource property, that is signed over. Noting to stop issuer adding relatedResource property to any URI.

Dave Longley: +1 David is correct.

Manu Sporny: DavidC is correct.

David Chadwick: Cannot stop an issuer from using this.

Brent Zundel: question is do we want to encourage people to use this as a means of addressing this issue.

Dave Longley: -1 it does not address the issue for the reasons I stated, this is a validation issue, not a security issue.

Greg Bernstein: part of this issue from a security point of view comes down to understanding who controls what inputs.
… When we talk about this context value, we dont secure the value of the context, we secure all the quads.
… do we want to secure the value of the context?

Manu Sporny: yes, +1 to Greg -- it's the verifier that controls the input wrt. @context.

Dave Longley: +1 to GregB, with the rdfc cryptosuites, the underlying information is secured, not the specific expression, allowing alternative expressions.

Greg Bernstein: or do we accept that the context is under the control of the verifier. E.g. the verifier idicates the context values they accepts.
… Verifier says they wont take a context that they dont understand.
… Don't protect the context value, because that isn't part of the inputs that need to be protected. The verifier controls the context.

Anil John: +1 to Greg regarding the Verifier doing due diligence on the @context's @did's it finds acceptable. In fact, that is what we are doing in practice.

Manu Sporny: brent said could we just use relatedResource. We could, but how normative would we want to get with that.
… is it mandated that if you have a relatedResource with a context, the verifier must through an error if the hashes dont match.
… We would end up getting to a lot of the same issues.
… GregB is right, the verifier determines the contexts and the hashes of contexts that they accept.
… security model is different from the way jose cose do things.
… confusion in this thread. JCS does sign over the context values.
… but that in itself is not enough, verifier has to choose the contexts they accept. This is a validation layer. An application issue.
… if we try to solve this at the crypto layer, you make use cases not possible and it doesn't solve the problem.

Dave Longley: you can't know how to interpret JSON from just reading its key names (a JSON key of "firstName" may as well say "asdf").

Manu Sporny: Things to consider. This can be viewed as a meta data distribution problem.
… We tell people do not ever load contexts over the network.
… once a spec is published, verifiers and issuers can lock to very specific contexts if they want to do that. This is expected.

Dave Longley: you need to know context to properly interpret JSON (whether it's explicitly given like in JSON-LD via @context or you get it out of band somehow).

Manu Sporny: never have to go to the internet. You should not be doing that. Should have a local copy.
… When you get data in, you should check every context and know that you understand these.
… This addresses the attack.

Brent Zundel: Need to start transition to what the next steps are.
… if relatedResource as a property doesn't fully address this, we did come up with a mechanism for providing tamper evident information. A VC.
… What if you are concerned about this, issuers can provide a VC of the context file.
… Next, is concern about the context, the same as concern about the public key. Is this key related to this issuer.

Joe Andrieu: conflating issues between the integrity of the property. Not convinced data integrity does not secure the value of the context.
… If I can manipulate the context and still have the VC validate that feels like a huge problem with data integrity.

Gabe Cohen: Joe - please review the presentation which has examples to prove that if you manipulate the context verification can still succeed https://docs.google.com/presentation/d/1MxLMIjubCVykDux8fBWcisXLu9WeE0LxZzU_xlANUMc/edit.

Dmitri Zagidulin: and integrity of the file is addressed by the digest mechanism.

Anil John: not a tech provider or tool user. I am an user of this technology.
… We rely on the recommendations of this group.
… As a verifier, what we are doing is manually verifying and reviewing the context to ensure it is acceptable.
… Ensure it is coming from an entity whose attestations we want to use in our business processes.
… Just signing the context itself, doesn't solve this problem for me. Just gives confidence that the context has not changes and is coming from an specific entity.

Dave Longley: +1 a signature on a context tells you nothing about what it is/what it means, context must be processed (at runtime or understood before that and coded against).

Anil John: Struggling to reconcile what is important and relevant to what we are trying to do.

Greg Bernstein: The VC-DI-EdDSA and VC-DI-ECDSA specs allow two different canonicalization approaches. One (JCS) signs over the "JSON", the other (RDFC) signs over "quads" which include complete "term definitions" (URIs) expanded from context.

Ted Thibodeau Jr.: First, data integrity signs over quads which is the result of applying the context to the json being signed over.

Manu Sporny: yes, exactly, TallTed is exactly right.

Ivan Herman: +1 to ted.

Benjamin Young: +1 good summary, Ted.

Joe Andrieu: thanks, Ted.

Dmitri Zagidulin: @decentralgabe - I missed the previous discussion of -- doesn't relatedResources securing the context address this concern?

Ted Thibodeau Jr.: This is open world, VCs can go anywhere at any time, I've never verified a VC from issuer X... I need to get context file that's relevant for their VCs -- this is not the /default/ context, this is the Verifier X context, in addition -- I have to retrieve it and make use of it and have assurance that context file that I retrieve is the same as the context file that was used when generating the VC.

Dave Longley: to respond to TallTed, I think it is the question around secure meta data delivery.
… Able to get some context from some source and have confidence it is the context you intended.
… on the queue to discuss whether to context value is secured.
… the original string of the context is not secured. The context property is fully processed to generate the nquads from jsonld.

Joe Andrieu: so the property is secured. It probably shouldn't be dropped.

Dave Longley: These quads are secured.
… If you were to manipulate the context in any meaningful way the sig would fail.

Manu Sporny: happy to raise PRs for 0,1,3,4. Everything except context integrity protection.
… Think I have enough from the group for this.
… will take a week or two.
… On the queue to respond to brent, we provide an algorithm for how to retrieve the public key but someone could still say you didn't sign over the public key.
… This would be the same security disclosure we are discussing here.
… You have to as a verifier vet certain things. The context. The public key. Other types of meta data.
… This is a good analogy.

Ivan Herman: related to manu around the PRs to come.
… manu said, you have to publish the hash of the contexts they create.
… Are we sure we really make it clear that authors of new contexts have to publicly disclose the hash.
… if this isn't there then it should be.

Gabe Cohen: question about the use of JCS.
… This could be a way out if JCS is signing over the context value.
… How is this communicated.

Greg Bernstein: Its in the cryptosuite name...

Gabe Cohen: What about nested contexts. Contexts that reference other contexts. How do we secure those.
… For 2 it may be worth continuing this in a new issue.
… Some people not present who might still need convincing.

Ted Thibodeau Jr.: it seems that along with disclosure of context hashes, the context hash should (maybe must) be included in the VC, somehow.

Dmitri Zagidulin: Main question is doesn't the relatedResource mechanism address a lot of these concerns.

Greg Bernstein: See for example: ecdsa-jcs-2019 and ecdsa-rdfc-2019 in the VC-DI-ECDSA spec.

Brent Zundel: relatedResource exists inside of the data model. Would work only when data integrity is signing VC data models that include related resource.

Gabe Cohen: thanks, makes sense.

Manu Sporny: ivan asked if we should add language around providing hashes of finalized contexts.
… Think it is fine to say they should. Concerns around using MUST.
… There are other reasons why you may not need to publish a has.
… schema.org is well known. They don't need to publish hashes for there contexts.
… I might as a verifier, retrieve something and lock to a specific hash that I retrieved.

Joe Andrieu: schema.org is a canonical example of semantic drift.

Manu Sporny: We need to think about the use cases deeply before saying MUST around providing context hashes.
… The text we have in the spec is enough to address the attack raised in the issue.

Dave Longley: note: if a context changes in a way that de-syncs it from the originally signed information, the signature will fail -- because the original context contents are "baked into" the originally signed information.

Manu Sporny: A known bad context was included in the verifier as a good context. The end.
… it was misconfigured software.
… We already have that language in there.

Dave Longley: so if a dynamic context is used -- signatures on existing VCs will start to fail -- and that would need to be ok with people taking that approach.

Manu Sporny: We can clean it up and clarify.
… many of these solutions are problematic. The relatedResource and including the hashes in the signature is not solving the problem.

Dave Longley: fundamentally: you can't read a term defined by a context without understanding that context.

Joe Andrieu: I don't agree, I think today people defining contexts can create a URL for a context that has a hash in it.
… We don't have to change the VCDM to support that.

Phillip Long: +1 to Joe's use hashes in @context files if you want them secured.

Brent Zundel: we have a path forward. Thanks for that. look forward to seeing those PRs.

Dave Longley: +1 to Joe's suggestion as a layered way to do things people may want.


@dmitrizagidulin
Copy link
Contributor

dmitrizagidulin commented Jul 10, 2024

As an implementer of wallets, verifiers and issuers, and both DI and JWS libraries, I want to add my thoughts to this crucial discussion.

  1. @contexts SHOULD be digest hash protected, for both DI and JWS protected VCs. (I understand there may be some exotic use cases where the issuer might not want to sign over the context URLs and hashes, but I've also experienced first hand the needless confusion and wasted effort that results in NOT locking down contexts with hashes.) But I think those of us who write tutorials and implementation guides should STRONGLY recommend that the relatedResources + digest mechanism be used in VC design.
    • Incidentally, digest hash protection of any resource, including @contexts, should definitely be a general purpose mechanism (not even a Data Integrity specific one, just a general JSON mechanism). But I understand the practicalities of WG deadlines; fitting in to the VC DM is better than nothing (even though it would be even better to have it in a standalone mini-spec).
    • And yes, the relatedResource mechanism does not protect contexts that are included in other contexts (like, for example, the v1 examples context, https://www.w3.org/2018/credentials/examples/v1 -- it included the ODRL context inside it). And so again, tutorials and VC implementation guides should STRONGLY recommend not embedding contexts like that.
    • Btw, the digestMultibase mechanism is strictly better than digestSRI, because it has two critical features: 1) it allows optional JCS canonicalization of JSON resources, whereas digestSRI does not. and 2) It can be base64url encoded, and thus included in URLs, whereas digestSRI is locked to (non-URL) base64 encoding.
  2. The core VC DM 2.0 context should still be marked as @protected: true, BUT the @vocab training wheels should be moved to the https://www.w3.org/ns/credentials/examples/v2 context (or rather, it's already there in the example context, so it should be removed from core VC DM 2.0 context).
    • Asking developers to use https://www.w3.org/ns/credentials/examples/v2 (and its vocab) strikes the right balance of newbie-friendly, but also serves as a reminder that when it comes time for prod, the examples context should be removed (and undefined terms dealt with).

@iherman
Copy link
Member

iherman commented Jul 11, 2024

Incidentally, digest hash protection of any resource, including @contexts, should definitely be a general purpose mechanism (not even a Data Integrity specific one, just a general JSON mechanism). But I understand the practicalities of WG deadlines; fitting in to the VC DM is better than nothing (even though it would be even better to have it in a standalone mini-spec).

FWIW, the issue of adding SRI to a @context was on the agenda of the JSON-LD Working Group, but has been postponed for a future release, see w3c/json-ld-syntax#108. I do not know what the plans are on actively reopening this issue in the JSON-LD WG (which is in maintenance mode now), but a future version of VC may well rely on a more general mechanism of a new version of JSON-LD. Which suggests that we should not define a new feature for VC but rely on, for now, the usage of relatedResource.

Cc @BigBlueHat

@tplooker
Copy link
Author

@contexts SHOULD be digest hash protected, for both DI and JWS protected VCs.

@dmitrizagidulin could you clarify what you mean by "hash protected" here and how this protection would work? For example do you mean protected by the issuers signature like what has been suggested by @decentralgabe, myself and others? Or instead leaving the protection reliant on an out of band step similar to the current specification guidance and hence optional in practise when it comes to validating a DI proof.

@dmitrizagidulin
Copy link
Contributor

dmitrizagidulin commented Jul 11, 2024

@tplooker

@contexts SHOULD be digest hash protected, for both DI and JWS protected VCs.

@dmitrizagidulin could you clarify what you mean by "hash protected" here and how this protection would work? For example do you mean protected by the issuers signature like what has been suggested by @decentralgabe, myself and others? Or instead leaving the protection reliant on an out of band step similar to the current specification guidance and hence optional in practise when it comes to validating a DI proof.

At present, I think we should go with (and recommend to devs) the 'digest hashes signed over by issuer' like you and Gabe suggested. Specifically, the relatedResources + digest mechanism that's in the current VC DM 2.0, since those properties do get signed over (by both Data Integrity and JWS mechanisms).

Would I prefer adding the digest hashes to the URLs, inline? (So that links would look like this:

{
  "@context": [ "https://www.w3.org/ns/credentials/v2#digest=12345"],
  // ...
  "evidence": [{
    "id": "https://example.com/files/my-dissertation.pdf#digest=4567"
  }]
}

Yes, I'd prefer that mechanism, in the ideal world (if we had time before CR to publish it as a standalone spec, AND if there was political will in the community to add support for this to JSON-LD libraries (or at least on the HTTP client level that those libraries use) in various programming languages). It's a somewhat better mechanism because a) It could be used in embedded contexts (where you add a link to another context inside a context, like examples/v1 did). and b) It's a useful general purpose JSON mechanism.

BUT, given the current landscape, it's more realistic to use relatedResource & sign over it as usual, so that's fine. (and by "that's fine" I mean -- doing that is STILL drastically better than not using any sort of digest).

@dmitrizagidulin
Copy link
Contributor

FWIW, the issue of adding SRI to a @context was on the agenda of the JSON-LD Working Group, but has been postponed for a future release, see w3c/json-ld-syntax#108. I do not know what the plans are on actively reopening this issue in the JSON-LD WG (which is in maintenance mode now), but a future version of VC may well rely on a more general mechanism of a new version of JSON-LD. Which suggests that we should not define a new feature for VC but rely on, for now, the usage of relatedResource.

@iherman Oh, that's really cool! Thanks for the link. I really hope the future json-ld WG does get around to standardizing that mechanism!

@kimdhamilton
Copy link
Contributor

I think I like what Dmitri's suggesting, but to clarify:

  • Yes to the term "digest hash protected" specifically
  • Steer away from anything too prescriptive about how that happens

Having to use relatedResource would be redundant for someone achieving this another way: content addressable storage, DID Linked resources, etc, and would cause unnecessary cruft.

@msporny
Copy link
Member

msporny commented Jul 16, 2024

Noting actions to raise PRs and issues taken on the last telecon.

  1. Adjustments to @vocab

PRs have been raised for this item: w3c/vc-data-model#1520, w3c/vc-data-model#1524, and w3c/vc-data-model#1524

  1. Enhance Context Validation (aligns with 4; this may already be present?)

Now being tracked in w3c/vc-data-model#1529

A PR has been raised for this item: w3c/vc-data-model#1535

  1. Introduce Context Integrity Protection (aligns with 4; this is not present, and there are noted tradeoffs)

Two PRs have been raised for this item: w3c/vc-data-model#1537 w3c/vc-data-model#1567

  1. Clarify the Processing Model

Now being tracked in #288

A PR has been raised for this item: #291

  1. Improve Test Suites

Now being tracked in w3c/vc-data-model-2.0-test-suite#88 and w3c/vc-data-model-2.0-test-suite#88 with the expectation of PRs for discussion within a few weeks.

@David-Chadwick
Copy link

David-Chadwick commented Jul 21, 2024

@dlongley The bit about display which you picked up on is peripheral to the main text. We can easily omit it so that the proposed text becomes

"DI signature verifiers MUST trust all the @context definitions in the verifiable credential. Globally there are expected to be thousands of @context definitions, and no implementation can feasibly trust all of these definitions. A few definitions will be widely trusted, such as the one published in the VC DM, or in another ISO or IETF standard or by a national or international authority. The remainder are likely to be only be trusted by a small subset of implementations. Consequently all implementations MUST discard all the non-trusted @context definitions before processing the verifiable credential"

@iherman
Copy link
Member

iherman commented Aug 4, 2024

The issue was discussed in a meeting on 2024-07-31

  • no resolutions were taken
View the transcript

2.1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)

See github issue vc-data-integrity#272.

Brent Zundel: Data Integrity: we have one aspect of one key issue left.
… an issue raised citing possible vulnerabilities, a solution has been presented, all aspects except one of that solution have been agreed upon.
… the folks that raised the concerns believe that it would be most appropriate to add a mechanism for demonstrating context integrity.
… other folks are concerned about the overhead of that and that it may not solve the problems.
… that is the essence of what we want to talk about right now - how would folks feel about this? what should we do/not do?

Manu Sporny: #272 (comment).

Manu Sporny: 272 has 143 comments now, been discussed extensively, Gabe made a proposal many weeks ago that had broad support for a number of the items, can follow your nose to what Gabe suggested and how those things went into PRs.
… made adjustments to @vocab mechanism, took it out of core context, put in warnings about using it and how it might work in your benefit or against you.
… proposal 0 that Gabe did, fully implemented.
… the other thing that we ended up implementing was enhancing context validation.
… didn't spell out how to do it before.
… arguably the main issue with the security issue raise, that they weren't checking their context values.
… we now have a normative context validation algorithm in the Data Integrity specification.
… eventually that algorithm should be pushed to the JSON-LD spec.
… and the VCDM says that you should understand context values before doing any business logic.
… we can write tests for this algorithm in the test suite to test against implementations.
… that was the second proposal, the other thing was clarifying the processing model (what's the order of steps etc.), that is in as a PR.
… the only thing that we now need to focus on is: do we need to put mandatory context integrity protection into the digital signature in Data Integrity.
… I don't think it addresses the issue at hand, the problem with the security disclosure is that an important check was not being done.
… we have a mechanism to do context integrity protection w/ related resource.
… tying this any deeper into the crypto layer is problematic as it prevents certain use cases.
… not possible to go to CBOR-LD in certain cases, not possible for entities receiving a message to use their own context.
… in RDFC use case it doesn't make sense b.c. you are already signing the quads.
… there is an argument in JCS case, but could use resource integrity there.

Dave Longley: +1 to allowing issuers (if they desire) to use related resource, +1 to never forcing a particular context on holders and verifiers, they should be free to translate/recompact. +1 that the forcing wouldn't fix the issue anyway.

Manu Sporny: all doing this tells you is that you are using the same content hash that the issuer used when they created it, this is insufficient to protect against the kind of attack in 272.
… what's being proposed 1. doesn't address the core issue 2. prevents important use cases 3. is redundant w.r.t an existing feature.

Dave Longley: -1 to a recommendation that would preference it.

Brent Zundel: would it be possible to add a recommendation that folks do resource integrity? would that be a means of satisfying the folks who have raised this issue?
… curious what the drawbacks of that would be.
… it's something that we have defined, is straightforward, and maybe would do no harm.

Dave Longley: +1 to DavidC.

David Chadwick: my suggestion would be weaker, just to add a note saying that if the issuer wishes they can integrity protect the context with related resource.

Manu Sporny: +1 to DavidC's proposal.

Gabe Cohen: +1 to DavidC's proposal.

Brent Zundel: would anyone be opposed to adding this note? anyone want to argue that the context integrity stuff absolutely must happen?
… we have a path forward, do we have a volunteer to raise this PR.

Manu Sporny: I can raise this PR.

Brent Zundel: we raise the PR, we go into the issue and list the PRs that have been raised, we as a working group believe this addresses the issue, then close the issue - is that right?

Manu Sporny: sounds good, one thing we can always do is strengthen requirements, if you are working in an ecosystem that needs this, in their spec they can mandate that verifiers must check this, at the base layer we are saying that you can use this feature, but in another spec using this you can make it a must.

Dave Longley: +1 to profiles being allowed that can require this or anything else (not encouraging anyone to profile in any particular way though).

Manu Sporny: it'.
… it's mostly at application layer, in a particular ecosystem verifiers must check this etc.

@iherman
Copy link
Member

iherman commented Sep 6, 2024

The issue was discussed in a meeting on 2024-09-05

  • no resolutions were taken
View the transcript

1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)

See github issue vc-data-integrity#272.

Brent Zundel: Welcome everyone to the special topic call - topic today is Data Integrity issue 272.
… Concerns about security vulnerabilities in design of data integrity, well discussed issue, lots of good proposals and solutions, to my understanding, everything was agreed on (adjustments to vocabulary, enhancements to context validation, clarifying data model, lots of changes made as response to this issue).
… There is a particular ask that will likely be the primary topic of conversation today, but turning over to Tobias to present the issue. Tobias, the time is yours to present and then we'll go to discussion.

Tobias Looker: Thanks for holding the special topic call. I won't spend too much time going back through the issue... 144 comments, I think we're all well versed into conversation that has gone in attempt to resolve the issue.
… To summarize, from my point of view, there has been a lot of conversation and proposals raised, the biggest one that remains to ensure this cannot occur, is around integrity protection around @context issues. I've tried to highlight that I do believe that alone is the most important proposal to discuss in more detail.
… Even with adjustments to @vocab, and non-normative guidance, leaving the context values open to manipulation is still a significant issue in my opinion. While I appreciate text around relatedResource feature, I'm not convinced it goes large enough. It's largely an optional feature. I could not see test coverage for that feature. I guess I'm wary of the likelihood that implementations would support such an optional feature. Protection wouldn't exist in most implementations.
… I think we should go stronger, require protection of @context values. I know there were discussions around trade-offs, or doing context transformations. I don't want to limit context transformations. I debate whether or not they're useful in data signing and verification. I think it's useful AFTER things have been verified.
… Verification should be done on data used by issuer.

Brent Zundel: I'm happy to presume that everyone is up to speed on this discussion.
… The proposal on the table is to strengthen the protection on @context in Data Integrity, let's have a discussion.

Dmitri Zagidulin: I'm curious if @tplooker would be ok with changing the MAY to a SHOULD. or if MUST is required.

Benjamin Young: > To extend integrity protection to a related resource, an issuer of a verifiable credential MAY include the relatedResource property:.

Benjamin Young: https://w3c.github.io/vc-data-model/#integrity-of-related-resources:~:text=To%20extend%20integrity%20protection%20to%20a%20related%20resource%2C%20an%20issuer%20of%20a%20verifiable%20credential%20MAY%20include%20the%20relatedResource%20property%3A.

Tobias Looker: SHOULD is better but I believe MUST is required.

Dave Longley: I think SHOULD is strong enough, we have a number of use cases and things we've discussed on that issue. Reasons to allow different contexts to be used that are different from issuer wants to use. IWe've talked about design around flexibility on those cases. Individual uses on the spec in different areas, different verifiers, can profile the spec and require related resource to be used if it suits that ecosystem. At base layer of spec, we should have ability of other uses to work.

Brent Zundel: An assertion was made by Tobias that, specifically, the set of use cases that would require additional context switching would still be possible and should be done after issuer context as provided has been part of verification process. Can you talk about those lines?
… What use cases would be prohibited on that model. What would be drawbacks of requiring that verifier checks before doing context manipulation.

Dave Longley: This doesn't actually solve the problem, the recipient has to understand what context is, even if it is from a context that is trusted, terms definitions, doesn't fundamentally solve the problem.
… There are use cases where selective disclosure, omit context issuers use -- where a verifier might accept international license, but license might have issued by different countries and you just want to selectively disclose only international properties and international contexts.
… Take a VC and express as CBOR-LD and transmit to a verifier that can accept it, doing that might require context from issuer used. You should not have to go back to issuer to ask for their permission to request format for same information. Key to data integrity design, where holders can be independent in what they do. They don't have to keep going back to issuer to request different formats. None of those things should be happening. Privacy and liberty considerations. Fundamentally, you can't do all of those thing is if you're forced to use issuer-provided context.

Tobias Looker: What I was trying to say is -- from a selective disclosure perspective, it still works, but some of the context values don't expand to terms. I don't think SD is prohibited. The same with CBOR-LD. One of the overarching points is issuers fundamentally comfortable w/ data transformations after data is signed that might misconstrue or mis-convey information that was originally signed. Most document issuers of major credentials don't want that level of manipulation to occur. They don't want holders to manipulate credentials to misconstrue credential, come to rely on information. It's issuers reputation on the line at end of day, just like w/ physical documents today, we don't let holders allow changes. It's issuers reputation on the line, they should have a say on how much document can be modified after issuance. Leaving document to be manipulated leads to issues. Tradeoff is difficult for issuers to be able to adopt the technology if it is allowed to persist.

Dave Longley: That we're having a philosophical debate on extend of three party model and who gets to have control indicates that this is what profiles are for. Spec has flexibility, that is good, I don't think we should be too prescriptive. We should allow separate ecosytems, let the ecosystems have the choices they'd like to make.

Anil John: Speaking on behalf of a high value credential issuer, resonate with some of the things Tobias noted, issuers have a perspective on credentials that we issue. I also think that there has to be a balance. We as an issuer are interested in providing ability to verify information that we provide and also allow credentials to be used via CBOR-LD, rendering mechanisms, and the like. We as an issuer won't be silent on it, spec as it is written, if we feel strongly about certain aspects of it, we can profile the standard itself to make sure other credentials we issue follow a path that conforms to our belief.
… The question I ask in general, if we as an issuer wanted to mandate the requirement of using the feature in order to sign the context itself, that's simply a matter of us articulating that in our profile, are we prohibited form doing that?

Dave Longley: +1 you can say you must use related resource.

Brent Zundel: Short answer is no, you're not prohibited from doing that.

Tobias Looker: That is a good point, the text today is optional.
… even if you do that, relying parties can ignore it.

Dmitri Zagidulin: I thought the relatedResources mechanism was required, not optional?

Dmitri Zagidulin: @BigBlueHat - in the sense that, IF the relatedResources property is present, THEN it must be validated.

Dmitri Zagidulin: (and if that's not the case, my proposal is to constraint relatedResources that way.

Tobias Looker: If I'm an issuer, and VCs today under this assumption, these @context values are left not integrity protected, I have to trust that relying parties trust document loader, how they've held context in storage, how they've allowed those not to be manipulated, if any of that fails, if this is not mandatory, all bets are off. That is what is difficult to square here. Document issuers are going to look at that, and huge amount of new trust in RP implementations that my documents are being verified suitably.

Manu Sporny: to respond to one comment.
… the comment about things can be ignored.
… the spec today can be profiled and be made more strict--that is true.
… you can in a profile state those things as MUSTs written into a profile.
… I'm pushing back on the notion that it's optional and people can ignore it.
… people always need to trust the issuer and the verifier.
… in this model there's a certain amount of trust in the verifier that they will do a good job.
… and the verifier has an obligation to do the correct amount of verification to get the result they require.
… not doing that, means the verifier is to blame for any fraud.
… they are required by the market to follow the expectations of the issuer.
… it's not purely the issuer's reputation at stake.
… the issuer can point to the verifier and show where the verifier failed--just as they do today.
… if you accept a fake Driver's License, that's on you.
… the issue here is that we provide mechanisms that can content integrity protect the issuer if the issuer really strongly requires that.
… any vertical can profile it to mandate that.
… we already have profile examples that are similar.
… if a verifier chooses to ignore that profile, there are plenty of negative results.
… we don't have to mandate it at the VCDM layer.
… but can in a profile.

Tobias Looker: +1 brent that was the point I was trying to make.

Tobias Looker: Sorry dmitriz :).

Dmitri Zagidulin: The conversation on the call so far, Anil mentioned that while requiring context verification might be too far, he'd like to see ability for issuer to say "verifiers MUST do this". At the moment, relatedResource - we don't have normative language if related resource is present, they must be enforced, if they fail, the VC verification fails. Adding the normative language, making relatedResource normative and ensuring issuers enforce context integrity.

Brent Zundel: I got on the queue to suggest something along similar lines. We have two conformance classes, conforming producer and conforming consumer. If I'm understanding use cases property, we do not want to prohibit the producer to do things that require context integrity protection, what if we do what Dmitri just said -- it is a MAY to include context integrity protection using relatedResoruce, but if that is used to protect context. Conforming consumer must validate that context integrity protection. Is that crazy? Does that work?

Anil John: I am curious to hear answer to your question. I wanted to correct one thing that Dmitri said -- verification and issuance side, my point is that I, as an issuer, can control my issuance infrastructure. Profiling allows me to say whether or not to use relatedResource to sign context information or not. If I'm acting as a verifier, this is where I live in the real world, at end of the road, just because I'm issuing a high-value credential, significant realtime identity verification and usage, doesn't mean that credential isn't being used as a flash pass.

Dave Longley: +1 to anil ... issuers do not control what verifiers do ... so we need to avoid an illusion of control here.

Anil John: Ultimately, verifiers don't have to follow our guidance. They might have a different risk tolerance. We issue multifactor smartcards, 2nd factor pins, but there are many cases where people look at photo and verify as valid.
… The belief that I, as an issuer suffers when a verifier doesn't do a proper verification, is false. We point the finger at the verifier as the problem.
… There is no guarantee that someone will follow our guidance other than us. I'd like to hear reaction to brent's proposal.

Benjamin Young: +1 to aniltj's framing.

Dave Longley: -1 the effective is not the same at all.

Michael Jones: What puzzles me is that we're talking about this as if it's a garden variety optional feature that people can do it or not, given that it's a security feature, and it will be abused ... should we even give people that option.
… Given it's a security feature, we should say it's mandatory on both sides and if you don't do it, you fail the conformance tests.

Tobias Looker: A similar point, we have to see that this is not an integrity protected portion of the VC, at best, not like unprotected headers, effects entire part of processing of VC. It effects everything that's not integrity protected.
… reputation issue to verifiers, do agree with Manu -- more plausible deniability you create for verifier "that's not clear, that's not optional, guidance wasn't clear" the industry around fake DLs is evidence of that, document security features of DLs are too difficult to verify. RPs get away with poorly validating those documents because it wasn't reasonable, too hard to do. That's what I fear we're doing here, opening up issues that create issues w/ ... maybe reputational damage doesn't happen at issuer, undermines system as a whole.
… I did want to come back to brent's point, that would at least be an improvement on the text today. If there was a way to require that relatedResources context entries are used are integrity protected must be checked/verified by RPs downstream, that signal would be useful.
… You will have RP that doesn't support feature and they will simply ignore. Unless we have normative text that says you should fail verification that you don't support validating related resources in credential, then you have a downgrade attack.

Dave Longley: I largely don't agree with the analogy, a lot of this is coming from , we're talking about even if you do it and don't understand form and terms, your application is invalid. It fundamentally does not matter, if you receive a context you don't understand, you need to reject.

Dmitri Zagidulin: understanding the context and verifying its integrity are completely separate though.

Tobias Looker: +1 dmitriz.

Dave Longley: All of that being said, I don't think I'd have a problem saying that if relatedResoruce appears and context is expressed in that valid that is need to be checked. I will note that will eliminate use cases where issuers don't want to force a check like that, I don't know of many that will have that requirement. If context URL, you check that value.

Brent Zundel: Seems like proposal on the table is that it will not be required, MAY or SHOULD, some issuer MAY use context integrity protection, however, if that has been done, verifier MUST fail verification if that fails.

Manu Sporny: I'm trying to think through that as applied to selective disclosure and the BBS use cases--unlinkability.
… for selective disclosure you would eliminate the ability to do the international drivers license vs. a local drivers license.
… if it's a MAY, then perhaps we avoid that problem.
… we would need matching language that if you do that you must make the disclosure of the related resources--at least the context files (as there could resources).

Tobias Looker: I don't think you would, the context would just be transmitted to the verifier and un-used in the JSON-LD expansion.

Dmitri Zagidulin: that complexity is the domain of the Selective Disclosure spec.

Manu Sporny: there's a great deal of new complications if we do this.
… and I still don't think an attack has been demonstrated for this situation.
… things like RDF canonicalization do protect the terms.
… if you get any context in that you don't understand, you're supposed to stop processing.
… so we're left with only contexts that one knows about.
… so I don't know of any attacks that are left at that piont.

Dmitri Zagidulin: "dont know what the context is" is way ambiguous phrasing.
Dmitri Zagidulin: verifiers can know what the context is, but still use the wrong digest context.

Dave Longley: dmitriz: signature won't verify.
Dave Longley: dmitriz: that's a separate issue.

Manu Sporny: JOSE/COSE doesn't provide those protections.
… so I don't think it's necessary.
… it has not been demonstrated that this is a workable attack based on the text as written today.
… I think profiling this is sufficient.
… even adding a "verifiers must..." text doesn't change anything.
… based on the text in the spec today, there has yet to be a proven attack based on this request.
… but I could hold my nose.
… but frankly that's going to happen in production anyway.

Dave Longley: fundamentally: you cannot read the properties of a context that you don't understand, you must reject or transform to a context you do understand, before using the properties.
Dave Longley: so resource integrity is irrelevant.

Dmitri Zagidulin: to me, the issue with relatedResources is now unrelated to @context integrity. Instead, it's a spec clarity issue. The whole point of adding a 'relatedResource' is that the contents MUST be verified.

Brent Zundel: I'm all for requiring people to do stuff.

Dmitri Zagidulin: @dlongley - "understand"ing a context is at worst a meaningless term, and at best is overloaded.

Manu Sporny: yeah...funny, but that's not what I was saying.

Dave Longley: dmitriz: agreed, it's hard to get clear language.

Michael Jones: I will object to the proposal because it fails to secure the context, if you don't secure it, context substitution doesn't occur, it needs to be compulsory on both sides. Knowing contexts, doesn't prevent attack, it can happen w/ other contexts that you can accept.

Dave Longley: -1 to mike jones, that's inaccurate, and a misunderstanding of data integrity.

Manu Sporny: That's not how it works selfissued.

Dave Longley: +1 to JoeAndrieu.

Joe Andrieu: I agree with Manu, I dont' think a successful attack has been proven. I'm not understanding where this attack shows up.

Benjamin Young: +1 to JoeAndrieu.

Joe Andrieu: I think we're also conflating verification and validation.

Dave Longley: +1 to JoeAndrieu.

Joe Andrieu: People are talking about securing context, and string and context is secured. Historical notion might be what is being confused, context that is returned as a resource should be different, but should be different any time... context doesn't have hash to do it, trying to cobble one after the fact might be useful, but doesn't have credibility for verification.

Dmitri Zagidulin: At this point, the relatedResource language issue is unrelated to context integrity, now it's a clarity of the spec issue. The whole point is if issuer puts it there, validation has to fail if checks don't apply. If that's not clear from spec, we need to fix spec language. If we do, we gain spec backed ability to mandate signature be checked.

Dmitri Zagidulin: @BigBlueHat - digest validation does not require fetching. you can cache as usual.

Dave Longley: -1 the signature will fail.

Tobias Looker: I'll try and reply a practical attack when you follow all the guidance, malicious party, RP validates a government identity credential and I know schema and ontology, corrupt RP in some way to manipulate credential presentations. Problem is known context values are not issuer specific, implementations have one document load, contexts I trust, not related to issuers, so if I can get a context value trusted for a different VC that somehow changes or substitutes ontology I can use trusted context to.

Manu Sporny: -1 nope, signature will fail.

Dave Longley: -1 not true, the signature will fail.

Joe Andrieu: -1 the context property is secured.

Dmitri Zagidulin: @tplooker - what does "trusted" mean here? And yes, signature will fail.

Tobias Looker: These could be conflicting, general purpose bucket of context values, as RP, if I know what context values you trust, I might be able to manipulate those in a way that allows me to do the attack.

Manu Sporny: tplooker that is not how a secure implementation is going to work.
… you should never do what you described.
… one typically validates the incoming JSON with JSON Schema or similar.
… we've also discussed use case specific verifiers.
… so the attack you're describing just doesn't happen.
… you check an inbound schema for the JSON.
… so the attack you just highlighted does not happen in a good implementation.
… responding to dmitriz the relatedResource feature was never meant to be a verification component.
… you only verify the ones for your use case.
… let's say you listed every single image related to your credential.
… but the verifier only wants one of those.
… if the spec required every relatedResource to be verified, then you broke that use case.
… so big -1 to making relatedResource enforced in that way.
… that was not the intention of that term.
… and there would be big minus ones for that position.
… it's a question of layering.

Dmitri Zagidulin: manu -1 to that interpretation. there's no flag that an issuer can set, in the relatedResource image, to mark which ones are required.
Dmitri Zagidulin: manu would you be open to adding a 'required' flag, to each relatedResource object?

Dave Longley: short on time, I will just type: to respond to tobias's example, the signature would fail in that case, you can't do that attack (where you use some other trusted context to confuse the verifier) ... and it further highlights point that the verifier needs to understand the contexts it is using to read documents.

Tobias Looker: Just to replay what I was trying to say, context values are shared across all credentials an RP verify's, generally that means if I have conflicting context values that redefine terms in different ways across issuers I can perform context manipulation within the bounds of the trust context values of an RP to maliciously modify a credential.

Joe Andrieu: At any point we have to dereference, we're creating a security concern... relatedResources, don't dereference if you need to... we can't mandate it.

Dave Longley: to just say -- it's fine to dereference a context -- so long as you're transforming to one that you don't have to dereference before you read the document :).

Tobias Looker: I mean I agree with JoeAndrieu but @context values are HTTPS URLS so lets be practical people can and will dereference these.

Anil John: I believe that I can get what I need out of this by profiling the current text.

Dmitri Zagidulin: At the moment we don't have capability to fulfill Anil's requirement, issuer flag what resources must be secured. My proposal is true/false to related resource for signal from issuer.

Dave Longley: tobias, please see my point ^ you do transformation in that case.

Joe Andrieu: tplooker yes, that should give a good response for a human. but verifiers should not do so in production.

Brent Zundel: Thank you, this was a great conversation, appreciate the civility, consideration of different viewpoints. This conversation will go into the issue. We continue to explore the possibilities that are on the table. Hpefully we can move this forward in a way that is amenable to folks and find some consensus.
… Thanks to all for the special topic call, we'll chat next week.


@iherman
Copy link
Member

iherman commented Sep 27, 2024

The issue was discussed in a meeting on 2024-09-26

List of resolutions:

  • Resolution No. 2: add the following text to VCDM 2.0 to address DI Issue 272: If a verifier makes use of a resource listed in a relatedResource property, it MUST check the content integrity of the resource as specified in a relatedResource entry.
View the transcript

3.2. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)

See github issue vc-data-integrity#272.

Manu Sporny: Currently there are 145 comments regarding security vulnerabilities.
… The matter of concern is about security the context: using content integrity on the context.

Wesley Smith: wes-smith has joined #vcwg.

Manu Sporny: We have made adjustments to the core specification to @vocab, no longer using it by default.
… We have enhanced the content of the validation process in the data model and the data integrity specifications.
… We have enhanced the document process and have committed to adding the attack initially described to the test suites.
… There has been a partial implementation/discussion around content integrity protection over the context.
… There have been a number of discussions where if the verifier knows what context they're process, that there is no attack.
… One of the proposals on the table is that if an issuer secures the context, a verifier must verify the context. That would enable issuers that are concerned about this to force verifiers to do what they want.
… Securing the context means adding a cryptographic hash of the context to the context URL in the VC.

Brent Zundel: You said something during the break, Manu, that the changes that have already been made mean that when an issuer proves a context, the verifier needs to already be aware of the context prior to receiving the content.

Kevin Dean: An immediate concern with verifier receiving content... there may be contexts that are extensions of those that are already known. I could receive something that builds on top of existing context, relies on underlying context when I'm processing.
… Seems highly restrictive for verifier to know what they have to receive.

Dave Longley: A context URL is either well-known or opaque. A well-known URL will appear as a constant in source code. You've written your application against the terms defined in the context. Otherwise, you have an opaque URL. Your application cannot process against that as it's not known to the application.
… If your application wants to process opaque contexts, the document must be transformed to one that the application understands.
… It's OK to read the terms you understand and ignore the ones you don't.

Manu Sporny: An attack has not been demonstrated. There's no code that demonstrates an attack that the specification defines for context management.

Dmitri Zagidulin: I don't take issue with there being no attack demonstrated. It would not go amiss if we required cryptographic verification by the verifier of the terms in a related context. We have this resource, we have cryptographic protection, so why not make it required by the verifier even if there is no demonstrated attack?

Dave Longley: If an opaque URL shows up, you must transform that document to something you can understand before you can consume it. On top of this, the cryptosuites protect the document, so verification must take place first. Only after that does the context define how the document must be processed.

Denken Chen: +1 to dmitriz.

Dmitri Zagidulin: Adding that requirement to verifiers has additional benefits. It prevents developer confusion if the context changed. I agree that hash checking isn't required but adds value to the verification.

Dave Longley: dmitriz, what you're talking about to me, is "differently identified contexts, which happen to be content identified" which, to me, is a separate consideration entirely.

Juan Caballero: this could perhaps be called an "ergonomic" or "developer experience" argument, since "oh wait i'm storing linked documents wrong" is a more helpful error message than "signature failed".

Juan Caballero: even if, functionally, they're the same outcomes.

Brent Zundel: The solution we are discussing and that has been discussed in the past is the proposal that if an issuer elects to protect the integrity of the context through the related resource property, we should add a requirement that a verifier must verify the integrity of the context.
… The change that is being proposed, "If that's done, the verifier must check".
… Dave has pointed out that in this particular this, that may be moot.
… How do folks feel about requiring that the verifier validate the integrity if the issuer specifies it? This would elevate SHOULD to MUST for resource integrity protection.
… Whether or not it directly addresses the issue doesn't matter. On the one hand, it's a general thing that makes sense to folks. On the other hand, it makes the people that raised the issue to be happy.

Benjamin Young: I think we need to separate the "why" because this is not enhancing the security and is adding processing cost.
… If we restrain it to context files, forcing people to process related resources may be OK, but I don't believe that the expense of verifying all related resources (e.g., a PDF that is not actually used) can be expensive.
… It does not enhance security. We should be clear on why we're doing it and at what cost.
… Right now, I don't think the origin of the idea is not addressed by the idea itself.

Manu Sporny: +1 to what Benjamin said. As long as it's optional for issuers for contexts, that seems like it would be acceptable to verifiers.
… Again, a successful attack has not been demonstrated.
… We are creating the expectation that can trip up developers where contexts are required to be validated but other related resources are not.

Ivan Herman: The only thing I could add to that is that I think we should do it, and we should draw attention to the fact that issuers should treat this feature with care because of the high cost to verifiers.

Dave Longley: Something that would make a lot of sense to me to propose that if your application reads the related resource field, it must check the hash.
… If you are going to read the related resource and there is a hash, you must check it.

Dmitri Zagidulin: @dlongley - that's really elegant. it fits with what Benjamin/bigbluehat was saying (that we don't need to /automatically/ check the URLs), but if you DO dereference/fetch.. yeah.

Benjamin Young: It's possible now. You can do this now. Your implementer can use related resource. We added language saying that you can do that.
… My point is that making it a MUST or stronger than it is now is necessary.

Brent Zundel: The part of it that was clarified for me is that, if you are planning to dereference the URL, you MUST check it. I think that's a very sane thing to add to the spec.

Juan Caballero: +1 to brent.

Michael Jones: As I see it, related resource is a security feature. It makes no sense, if it is present, to not require it as a security feature.

Brent Zundel: I think we're at the point where we can entertain language for the proposal.

Manu Sporny: Proposed language.

Brent Zundel: Proposal is to add this language to the specification to address issue 272, raise as PR and close 272.

Michael Jones: Not OK with the beginning part. I think the way it should be worded is that, if something is listed in the related resources list, its integrity must be checked.

Brent Zundel: What if it says, "If it makes use of a resource..."?
… If there are resources there that a verifier is never going to touch, is it necessary to verify anyway?

Manu Sporny: e.g., What happens if I like to a 5GB video? I don't need to check it unless I actually access it.

Dave Longley: +1 to manu, it's a bigger security problem to force downloads.

Dave Longley: it's also a privacy problem.

Dmitri Zagidulin: +1.

Manu Sporny: I'm concerned about VCs that link to lots of other documents. That could create a DoS attack on verifiers.

Dmitri Zagidulin: Agreed with Manu. Another nuance is that the other reason to not say that if it's there it must be verified. If it's cached, it should not need to verify it again as it already has a local copy.

Michael Jones: I get the 5GB video example. That's not the motivating case. What we're talking about are the contexts that are going to be a few kB. To partially address the issue, we should say that if the related resource is a context issue, it must be checked.

Benjamin Young: We could use language like "activate". For example, the subresource spec for HTML doesn't process them until they're used.
… Context files can be quite large, e.g., schema.org.
… Maybe there's better language as you might have it on hand.

Kevin Dean: Back to the question of there being multiple context files that are not known to the verifier.

Juan Caballero: +1 to bigbluehat - aligning with SRI language or other WG precedent sounds good.

Dave Longley: +1 to Kevin.

Kevin Dean: It's not that you're going to process them because you don't understand them, those context files are irrelevant for the verifier, no need to download them. I don't see any issue with the idea that if you're not going to use it, you're not going to check it.
… Caching might be considered verified if its already verified and cached. If there is content integrity on something you're going to use, you don't need toverify it. I don't think we need to call out contexts as a type of file, because there are some that you will not use.

Gabe Cohen: I'm struggling to understand the use case where it's there but not going to be used.

Dave Longley: to answer Gabe, in the three party model, you don't know what verifiers will use and what they won't.

Dave Longley: and different verifiers can make different choices.

Dmitri Zagidulin: @gabe - no, it's more like, the VC is gonna be useful to many parties. SOME parties will actually use the video. and some won't. doesn't make it less legitimate.

Dave Longley: +1 to dmitriz.

Gabe Cohen: if some will use it then it should be verifiable.

Michael Jones: I'm fine with if you're verifying something you already have cached you don't need to verify again. If something is listed in a context, are you free to not use the context? I thought that there are context things that have non-local effects so you have to look at all contexts.

Benjamin Young: To the question about JSON-LD, in broader JSON-LD land, there are moments and times where you need to get all context files, cached or not.
… If you don't use terms from another context file, you don't need to retrieve it.
… There's no way to know that unless you signed with data integrity. If I have a pile of context files and you give me a signature over the set of the quad, if I run things through a separate process and get the same hash from the same graph, then I don't need the context files.
… As long as it maps to the same URLs, I can get the same signature.
… As long as your quads are signed, which is what data integrity is about, you're fine.
… I get that what data integrity gives you is the signed quads, but the issue description gives the attacker the ability to change the meaning.
… It doesn't change the semantic meaning. The quads you would get out would not succeed.

Dave Longley: and you must never read a mapping you don't understand -- securing the contexts does not "fix it".

Dave Longley: (because that's not the problem... the problem is reading terms from a context you don't understand, which must never be done).

Dave Longley: (you must translate to context you do understand firsT).

Michael Jones: I have tried to bring all of this to Tobias to weigh in again.

Brent Zundel: I feel we are on the cusp of a decision.
… Mike, you're the only one to object to the proposed wording.

Michael Jones: What is the current proposed resolution wording?

Denken Chen: I would like to propose that libraries be updated to check all the information in the VC. We are at the early stage of VC adoption. If we don't enforce that in standard libraries... I propose discussing the boundary of safety boundary. To me, vc standard should be self-consistent in terms of chekcimg integrity of all information within the vc itself. That’s a great security improvement.

Kevin Dean: In answer to question about why have resources you don't reference... gernreric credential could have image, you're not interested in image, jut want name and address.

Dave Longley: +1 KevinDean.

Gabe Cohen: My question is, should we support issuing credentials that some verifiers may not be able to verify? Should we allow verifiers to not verify related resources? I will not block resolution.

Brent Zundel: Current proposal is as above.

Shigeya Suzuki: +1 decentra_.

Brent Zundel: Other language includes "if the verifier makes use of the resource...".

Denken Chen: If we don’t, libraries developers out there might not check it at all.

Benjamin Young: Revision proposed.

Dmitri Zagidulin: @decentra_ - re issuing credentials that some verifiers may not be able to verify -- that happens ALL the time though. for example, if a verifier just doesn't support the signature algorithm. can't verify.

Denken Chen: …that’s for the safety of the standard, for the rest of the world.

Shigeya Suzuki: @dmitriz I don't think that is the normal case.

Ivan Herman: +1.

Greg Bernstein: +1.

Dmitri Zagidulin: +1.

Paul Dietrich: "a related resource property" or "the related resource property"?

Brent Zundel: If there's language that would change your -1 vote, let us know.

Dmitri Zagidulin: it should be 'a', yeah.

Kevin Dean: +1 to "a" instead of "the".

Paul Dietrich: +1.

Jay Kishigami: +1.

Denken Chen: +1.

Proposed resolution: add the following text to VCDM 2.0 to address DI Issue 272: If a verifier makes use of a resource listed in a relatedResource property, it MUST check the content integrity of the resource as specified in a relatedResource entry. (Brent Zundel)

Manu Sporny: +1.

Gabe Cohen: +1.

Wesley Smith: +1.

Ivan Herman: +1.

Brent Zundel: +1.

Greg Bernstein: +1.

Benjamin Young: +1.

Kevin Dean: +1.

Dave Longley: +1.

Denken Chen: +1.

Jay Kishigami: +1.

Dmitri Zagidulin: +1.

Paul Dietrich: +1.

Shigeya Suzuki: +1.

Will Abramson: +1.

Juan Caballero: +1.

Michael Jones: 0.

Juan Caballero: i kinda liked "activate" but that can always be a followup PR if anyone wants to bikeshed.

Resolution #2: add the following text to VCDM 2.0 to address DI Issue 272: If a verifier makes use of a resource listed in a relatedResource property, it MUST check the content integrity of the resource as specified in a relatedResource entry.

Manu Sporny: I will raise a PR for this.

Brent Zundel: It's lunchtime.

Manu Sporny: There are three proposals related to data integrity to look at tomorrow.

Paul Dietrich: Brent, can you confirm the time for tomorrows meeting?


@msporny
Copy link
Member

msporny commented Oct 1, 2024

Based on the discussion at W3C TPAC, the final PR to address this issue has been raised here: w3c/vc-data-model#1567

This issue will be closed once that PR has been merged.

@msporny msporny added the pr exists A pull request exists to address this issue. label Oct 1, 2024
@tplooker
Copy link
Author

tplooker commented Oct 2, 2024

I was unable to attend the TPAC call on this topic, however for the record I don't believe #1567 solves the issue, because it keeps this critical security feature as optional for an issuer to implement. It is better then having nothing at all, but it is not a proper fix.

@msporny
Copy link
Member

msporny commented Oct 14, 2024

@tplooker wrote:

It is better then having nothing at all, but it is not a proper fix.

All of the PRs related to specifications that this group is working on listed in #272 (comment) have been merged. Per the discussion at W3C TPAC, as well as during the last WG call, the WG has applied every change request related to this issue that it has been able to achieve consensus on and does not believe there are any further PRs, based on the exhaustive discussion in this issue and during WG meetings, that would achieve consensus. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CR1 This item was processed during the first Candidate Recommendation phase. normative This issue or PR will trigger the need for another Candidate Recommendation Snapshot pr exists A pull request exists to address this issue.
Projects
None yet
Development

No branches or pull requests