
fix: Fix JSON canonicalisation #247

Merged · 2 commits · Apr 14, 2022

Conversation

toby-jn
Contributor

@toby-jn toby-jn commented Mar 29, 2022

Fixes #246

Release Notes:

  • Fixes error in JSON canonicalisation that stops RSA/ECDSA keys from being marshalled correctly.

Types of changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

I don't believe this breaks any compatibility; a valid client should still canonicalise the message before hashing.

Description of the changes being introduced by the pull request:

The JSON canonicalisation format (specified here: https://wiki.laptop.org/go/Canonical_JSON) does not escape control characters in string values; however, the JSON standard (https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf) requires that all control characters (defined as U+0000 to U+001F) MUST be escaped in strings.

As a result, newline characters in strings are not escaped, which produces invalid JSON, and go-tuf will fail to marshal metadata containing these sequences. This is particularly problematic for RSA and ECDSA keys, where the specification states they are encoded in PEM format, which inevitably contains newline characters.

This change restricts repo to using the canonical JSON format only when calculating the signature of a message. The serialised message uses the standard library's encoding/json, which correctly escapes all control characters and produces a valid JSON encoding.

The changes to the tests are required because they use the github.com/theupdateframework/go-tuf/pkg/keys.Signer.SignMessage method directly. This method accepts a []byte instead of a json.RawMessage, which implies that it expects an already-canonicalised message and will not perform canonicalisation itself.

Please verify and check that the pull request fulfills the following requirements:

  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@trishankatdatadog
Member

Hrm, is this also a python-tuf issue? @lukpueh

@toby-jn
Contributor Author

toby-jn commented Mar 29, 2022

The client tests were failing due to the testdata being out of date. After regenerating, they are now passing; however, this has raised an interesting issue with the spec: for the snapshot metadata, it's not entirely clear whether the hashes metadata applies to the serialised metadata file or to the canonicalised version.

As I interpret the spec, it's the serialised form, which is what's implemented here. This does seem a strange decision, though, as it means the metadata must be consistently serialised, yet it's conceivable that an intermediate mirror or proxy could reformat the JSON. This is not directly related to this change, but since the change alters the serialised format, it highlights the issue.

Ignore this comment, snapshot refers to file hashes, not metadata.

Contributor

@asraa asraa left a comment


+10000!!! I have had this issue for a very long time! THANK YOU!

FYI we may want repo.Payload(rolename) method to export the canonicalized bytes that should be signed, and maybe some private function that takes data.Signed -> canonicalized bytes that repo.Payload can call. @znewman01

@znewman01
Contributor

FYI we may want repo.Payload(rolename) method to export the canonicalized bytes that should be signed, and maybe some private function that takes data.Signed -> canonicalized bytes that repo.Payload can call. @znewman01

+1. Two points:

  1. Do we want to canonicalize on the way out (via tuf payload), on the way in (tuf sign-payload), or both? It looks like we're doing both as of this change, which I think could cause problems in the event of a version mismatch. IMO it's reasonable to assume the exact bytes we pass to sign.Sign are the bytes we ultimately want to sign. Specifically: if you create the payload to sign on box A, then move to box B, which canonicalizes slightly differently for signing, you'll alter the payload while signing. This will appear to succeed on box B, but box A will not be able to verify the signature. Maybe a warning when the input to sign.Sign is in an invalid format, rather than forcing canonicalization, makes more sense? I could be persuaded differently here.

  2. This change makes the "idiomatic" usage in repo_test.go really error-prone, since a caller has to (1) get the data.Signed, (2) canonicalize it, and (3) sign it. IMO we should make the caller's workflow r.Payload -> key.SignMessage -> r.AddOrUpdate (then implement repo.Sign in those terms). I don't know that we need to address it in this PR, but probably before the next release.

Also @asraa review on #214 plz? :)

@rdimitrov
Contributor

fyi @jku

@toby-jn
Contributor Author

toby-jn commented Apr 2, 2022

Do we want to canonicalize on the way out (via tuf payload), on the way in (tuf sign-payload), or both?

I think I'd suggest neither, in fact: keeping all serialised representations un-canonicalised means they're valid JSON, and makes it possible to modify the payload before running sign-payload without having to mess around with canonicalisation.

Fundamentally, the only things that ever need to consider canonicalisation are sign.Sign and verifier.Verify, and all these really do is canonicalise the data, pass it to the specific key implementation (which operates on raw []byte), and then attach the signature to the result.

Following this, the CLI sign-payload command is really just the CLI analogue of sign.Sign, and so it should accept un-canonicalised data.

that canonicalizes slightly differently for the signing

The canonicalised format is the specification to ensure this doesn't happen. If a client canonicalises differently, then it's not following the specification and would be unable to verify signatures correctly.

@toby-jn
Contributor Author

toby-jn commented Apr 2, 2022

This change makes the "idiomatic" usage in repo_test.go really error-prone.

Yep certainly quite messy. I did look at how to do it better, and would perhaps question what these tests are trying to do vs using sign.Sign (they seem to just re-implement a lot of that logic in the test). With the changes in #214 to split things out separately into payload and sign, I think it makes it a bit easier to improve these tests.

@znewman01
Contributor

Thanks for thinking hard about these questions! I think we're going to wind up with a better interface for users because of it.

Do we want to canonicalize on the way out (via tuf payload), on the way in (tuf sign-payload), or both?

I think I'd suggest neither, in fact: keeping all serialised representations un-canonicalised means they're valid JSON, and makes it possible to modify the payload before running sign-payload without having to mess around with canonicalisation.

The way I'm thinking about it, that counts as "on the way in, but not the way out".

I'd actually lean towards "the way out but not necessarily the way in" to enable a use case where non-TUF tools (or a different TUF implementation) can be used to sign a blob. This workflow isn't available at the moment, but you could imagine an add-signatures command knowing about some standardized signature formats. (I know this description is a little at odds with #214 as it exists currently.)

I guess the question is: "what is a payload?" If it's (1) a blob you can sign as-is, then we probably do want it to canonicalize. If it's just (2) a representation of the metadata that you need to sign, then I agree with your interpretation.

Following this, the cli sign-payload command is really just the cli analogue to sign.Sign, and so it should accept un-canonicalised data.

Maybe this is a case where the CLI and API shouldn't correspond one-to-one. The way I've been thinking about it:

  • tuf payload: grab the metadata and output in a canonicalized form
  • tuf sign-payload: just sign the bytes. In theory, should be able to be replaced by a magic OpenSSL CLI incantation (though we need better import-key functionality first).
  • tuf add-signatures: take a signature/signatures, and attach them to a given metadata file as-is.

This doesn't allow any flows where the metadata is modified before signing; perhaps we really want:

  • tuf export-metadata [--canonical]
  • tuf sign-blob <blob>
  • tuf import-metadata
  • tuf import-signatures

But it seems like we're really running up against the fact that the go-tuf CLI is more evolved than designed here :)

The canonicalised format is the specification to ensure this doesn't happen. If a client canonicalises differently, then it's not following the specification and would be unable to verify signatures correctly.

Strictly speaking, the specification defers format details to the POUF of the implementation. I can imagine wanting to sign payloads in very offline environments that don't have the full go-tuf implementation available (and consequently don't know how to canonicalize), but do have a different TUF implementation, or just have the capacity to sign binary blobs in standard formats. Such an environment isn't a client (so there's no need to verify signatures), nor is it a repo (so there's no expectation of a full TUF implementation).

Maybe there's no actual demand for this use case, though.

@toby-jn
Contributor Author

toby-jn commented Apr 2, 2022

I guess the question is: "what is a payload?" If it's (1) a blob you can sign as-is, then we probably do want it to canonicalise. If it's just (2) a representation of the metadata that you need to sign, then I agree with your interpretation.

Because the signature needs to be in a custom format for the add-signatures command to work, I think the external tool would need to be somewhat aware of TUF for this to work well. Doing something with openssl would inevitably require a post-processing step to transform the raw ASN.1 signature into a data.Signature, so I'm not sure completely external signing tools are viable; more likely you'd need a tuf-openssl-signer that drives openssl.

In this design, having that tool do the canonicalisation seems optimal, as it means invalid signatures can't be produced by the user supplying incorrectly canonicalised JSON.

Strictly speaking, the specification defers format details to the POUF of the implementation.

I'd misinterpreted this when reading the spec; I'd thought of it more like an HTTP Accept header, where a client could request the metadata in different formats and all signing information would remain intact regardless of the serialisation... but yes, it seems this is not the case, which is a bit disappointing and makes the choice of canonical JSON a bit annoying, as it doesn't produce a valid JSON serialisation.

I feel like the only way to really make the spec work coherently is to mandate that all clients are using the same canonicalisation.

@toby-jn toby-jn closed this Apr 2, 2022
@toby-jn toby-jn reopened this Apr 2, 2022
@znewman01
Contributor

Because the signature needs to be in a custom format for the add-signatures command to work, I think the external tool would need to be somewhat aware of TUF for this to work well. Doing something with openssl would inevitably require a post-processing step to transform the raw ASN.1 signature into a data.Signature, so I'm not sure completely external signing tools are viable; more likely you'd need a tuf-openssl-signer that drives openssl.

In this design, having that tool do the canonicalisation seems optimal, as it means invalid signatures can't be produced by the user supplying incorrectly canonicalised JSON.

If you're right that we want a tuf-openssl-signer driver, I agree: it should do canonicalization.

I'm not 100% convinced: I think there's a small enough number of tools that could plausibly be used to sign (and their signatures are usually in standard formats) that we can special case the import-signature command (and possibly the payload command) for each one. I just really think it's nice to preserve the option of signing things from environments without TUF installed (though I may be imagining the need for this).

I don't want to block this PR on solving the whole problem, though. So I guess the question is: does moving canonicalization to sign.Sign paint us into a corner?

I think the answer is: only if we think that we might want to use the go-tuf client to sign payloads from other implementations. I'm okay with giving up on that hope, and then longer-run having a coherent strategy around where canonicalization should happen. WDYT?

@znewman01
Contributor

I've filed an issue to figure out a coherent canonicalization strategy: #261

For now, this PR fixes a real issue and doesn't seem to commit us to any decisions we're not comfortable with so I'm okay with merging it.

asraa
asraa previously approved these changes Apr 5, 2022
Contributor

@asraa asraa left a comment


I agree, let's merge this. I will put out a fix for CI, since it seems broken on go 1.17 and golangci

@trishankatdatadog
Member

I agree, let's merge this. I will put out a fix for CI, since it seems broken on go 1.17 and golangci

Who else should review this?

@asraa
Contributor

asraa commented Apr 13, 2022

@mnm678 @ethan-lowman-dd could one of you PTAL?

@asraa
Contributor

asraa commented Apr 13, 2022

Could you please rebase?

@toby-jn
Contributor Author

toby-jn commented Apr 13, 2022

Rebased onto master.

@asraa asraa changed the title Fix JSON canonicalisation fix: Fix JSON canonicalisation Apr 13, 2022
@ethan-lowman-dd
Contributor

Rebasing again should fix the failed conventional commit check.

@toby-jn
Contributor Author

toby-jn commented Apr 13, 2022

Bit confused by the failing security scan; it's not related to my changes. Should I just ignore it?

@asraa
Contributor

asraa commented Apr 14, 2022

Bit confused by the failing security scan; it's not related to my changes. Should I just ignore it?

Yeah, I'll dismiss! Thanks

Successfully merging this pull request may close these issues.

Canonicalised JSON format produces invalid JSON