Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

key format inconsistency #251

Closed
shibumi opened this issue Jul 6, 2020 · 19 comments
Closed

key format inconsistency #251

shibumi opened this issue Jul 6, 2020 · 19 comments
Milestone

Comments

@shibumi
Copy link

shibumi commented Jul 6, 2020

Description of issue or feature request:

Based on this PR (#250) we had a discussion around a default or custom key format. The PR showed up a few inconsistencies regarding public key generation in in-toto-keygen. We generate a key id for private keys, but no key id will be written for public keys. This lead to the question if we want to support key ids in our key format.

Current behavior:

Right now TUF and in-toto behave differently regarding key formats:

  • Neither in-toto nor TUF specs mention the keyid field when defining key object formats.
  • Neither in-toto nor TUF specs show the keyid field in metadata examples.
  • The TUF reference implementation does not include the keyid field in public key objects in root or targets metadata on disk.
  • The in-toto reference implementation does include the keyid field in public key objects in layout metadata on disk.
  • The in-toto reference implementation uses schema checks on key public key objects (held in memory) that require the keyid, and checks that allow keyid:
    GPG_PUBKEY_SCHEMA -- requires keyid
    ANYKEY_SCHEMA -- requires keyid
    PUBLIC_KEY_SCHEMA -- does not require keyid, but allows it because unrecognized keys are allowed
    KEY_SCHEMA -- does not require keyid, but allows it because unrecognized keys are allowed.
  • The TUF reference implementation uses in repository tool and keydb among others the ANYKEY_SCHEMA, which does require keyids.
  • Both TUF and in-toto reference implementations seem to write ed25519 and ecdsa public keys to disk without keyid.

Also we seem to differ between different stages (in-memory, on-disk, etc):

  • in memory -- with keyid
  • on disk inside layout (in-toto) or root/targets (tuf) metadata — with keyid in in-toto, w/o keyid in tuf
  • on disk as custom pub key serialization format — w/o keyid

This comments from @lukpueh might be important for this PR, too: (#250 (comment)):

If we make a potentially breaking change, we might want to consider switching to a standard format altogether (we do use PEM for RSA keys on disk), or at least revise the included fields, e.g. why do ed25519/ecdsa private keys on disk include a keyid field and public keys don't, or why do both not include the keyid_hash_algorithms field, etc.... (see theupdateframework/tuf#848 for discussion about the latter field).

Right now, securesystemslib and its dependents do know how to handle their keys. We can load public keys from disk (without keyid), keep them in memory (with keyid) and add them to in-toto layout metadata (with keyid), or in tuf root/targets metadata (without keyid).

It would be good to make it very clear, when we use which format why. The various securesystemslib.formats.*KEY_SCHEMAs, are not particularly helpful for that. Some of them require a keyid and other don't which (unexpectedly) makes it optional (see #183 (comment)).

Moreover, the way we currently use the "serialization" function format_keyval_to_metadata looks like a lot of unnecessary work, e.g.:

# simplified and slightly exaggerated for emphasis
a = key["a"]
b = key["b"]
c = key["c"]
metadata = format_keyval_to_metadata(a, b, c)
# metadata is {"a": "...", "b": "...", c: "...", d: "..."}

While working on the Go implementation Lukas found an issue in the PKCS key loading.
In the Go implementation we are able to load a PKCS1 private key, however while extracting the public key out of the PKCS1 key we are storing the public key as PKCS8. Therefore we end up with two different PEM representations in our in-memory key format. The securesystemslib seems to do it that way, too. We should really have a look at this.

Should we drop PKCS1 support completely for private keys?

Expected behavior:

to be discussed

@shibumi
Copy link
Author

shibumi commented Jul 6, 2020

@lukpueh can you have a look if this issue includes all of your comments? I tried to include your 2 summaries from slack

@JustinCappos
Copy link

@mnm678 This is obviously one for you to watch / participate in as well...

@shibumi
Copy link
Author

shibumi commented Jul 9, 2020

Note: I have modified the issue and added an observation from our work on the golang implementation.
If we extract the public key from a PKCS1 private key, the public key will be in PKCS8 format. Thus we have two different PEM representation types in our in-memory key data. This behavior is consistent with the securesystemslib and we should definitely have a look on this. We could drop PKCS1 support for private keys, however I am not sure if this would bring up some complications in current use cases. This is interesting for @trishankatdatadog or @SantiagoTorres too, I guess.

@lukpueh
Copy link
Member

lukpueh commented Jul 10, 2020

Thanks for adding the note, @shibumi!

I think there is no harm in consolidating the RSA in-memory format to PKCS8, private keys are never in tuf/in-toto metadata, so there should be no metadata backwards compatibility problem. That said, I don't think it is a big deal, it just looks a bit odd that securesystemslib private keys have the private portion in a PKCS1 PEM block and the public portion in a PKCS8 PEM block. I doubt that the mismatch is intentional.

So I'd say it is not a priority to change this, but I would at least document it somewhere.

FYI all, we originally discussed this in in-toto/in-toto-golang#56

@shibumi
Copy link
Author

shibumi commented Jul 13, 2020

Hi everybody,
I made another observation today. Right now the securesystemslib and the in-toto python implementation store the private and public key pairs differently depending on the key type.

We store RSA keys in-memory as PEM blocks as specified in RFC7468,
but ed25519 are being stored in-memory as hex encoded byte strings.

In my opinion, this adds a huge technical debt and layer for making mistakes and if we agree on loading keys as specified in RFC7468](https://tools.ietf.org/html/rfc7468) we could also agree on storing them in-memory for future versions on the long term.

I am not arguing that we should change this now, but on the long term we should definitely work on this, in my opinion.

@shibumi
Copy link
Author

shibumi commented Jul 22, 2020

A nice example for such technical debt is this awesome PR from @trishankatdatadog : theupdateframework/python-tuf@1a8a0e7...f5eb59b#diff-6289f254358d3e8e498537ca81acd6fdR247

The __get_tuf_public_key function makes it obvious. We need to handle every key type on its own way. Regarding the vault support PR, this means, that we decode base64 the ed25519 key, then hex encode it again for our in-toto in-memory key format.
The RSA keys need special treatment, too. Looks like ecdsa is the only key type we can directly import from the vault public keys, (I guess, this is because the securesystemslib has not yet full ecdsa support? Therefore we are free on how to deal with it?)

This is definitely something to improve for the future :)

@trishankatdatadog
Copy link
Contributor

A nice example for such technical debt is this awesome PR from @trishankatdatadog : theupdateframework/tuf@1a8a0e7...f5eb59bdiff-6289f254358d3e8e498537ca81acd6fdR247

The __get_tuf_public_key function makes it obvious. We need to handle every key type on its own way. Regarding the vault support PR, this means, that we decode base64 the ed25519 key, then hex encode it again for our in-toto in-memory key format.
The RSA keys need special treatment, too. Looks like ecdsa is the only key type we can directly import from the vault public keys, (I guess, this is because the securesystemslib has not yet full ecdsa support? Therefore we are free on how to deal with it?)

This is definitely something to improve for the future :)

Well, the complication is mostly in filling in the right fields (esp. "scheme") in the key dictionary SSLib expects, which is not the hard part. Vault stores RSA/ECDSA pubkeys as PEM, so we just copy those over, but Ed25519 is base64-encoded, so we just have to decode that.

The real messiness, IMHO, is requiring "keytype", "scheme", and "keyid" baked into key formats on disk, which doesn't always make sense (especially for RSA). Also, duplicate fields (why both "keytype" and "scheme"?)

@lukpueh
Copy link
Member

lukpueh commented Jul 28, 2020

The real messiness, IMHO, is requiring "keytype", "scheme", and "keyid" baked into key formats on disk

Agreed! I think this is what this issue here is partly about. Btw. we don't bake in keytype/scheme/keyid for RSA on disk, we support standard PEM format there. We only have a custom format for ed25519 and ecdsa.

Also, duplicate fields (why both "keytype" and "scheme"?)

Please see #239

@shibumi
Copy link
Author

shibumi commented Jul 28, 2020

. We only have a custom format for ed25519 and ecdsa.

Small note on this: in-toto-golang is doing this differently now. We store ed25519 as PEM on disk there. For ecdsa and in-toto-golang we have not a decision yet, but I am pretty sure we do the same for it, because it would make sense.

@lukpueh
Copy link
Member

lukpueh commented Aug 5, 2020

Erratum:
I mistakingly claimed above that securesystemslib uses PKCS8 for rsa public keys, and just realized that PKCS8 (RFC5208) defines syntax for private keys only. (The misconception that PKCS8 also defines public key syntax seems to exist elsewhere on the internet too). In reality we use the X.509 SubjectPublicKeyInfo (RFC5280) PEM format for rsa public keys.

However, I still stand by my claim that it looks a bit odd that we use PKCS1 (RFC8017) for private key files but not for public key files (yes, PKCS1 does define syntax for both). I also still think we don't need to change anything about it.

@shibumi
Copy link
Author

shibumi commented Aug 5, 2020

Another issue that came up from @trishankatdatadog : #262 (comment)

We need to design a more sensible key distribution format going forward instead of 
using a single scheme string to encode hashing+signing+marshaling algorithms+etc.

I just thought, I would add this here as comment, because I think it's another issue we should have in mind, if we ever want to change our in-memory key format.

@mnm678
Copy link
Contributor

mnm678 commented Aug 5, 2020

Another thought for future key changes from theupdateframework/tuf#1091 (comment) is to include in the key format a representation that will be the same for all uses of a key (different signing or hashing algorithms) such as the modulus and exponent for RSA. This would be useful for ensuring a unique threshold of keys is being applied to a metadata file. (We decided this issue can be put off for now, but if anyone is looking at the key format in the future this might be something to consider)

@shibumi
Copy link
Author

shibumi commented Aug 29, 2020

I think I have found another potential problem or the problem is me not understanding how this works:

If we call in-toto run on a directory with the directory path and the key as parameters, RecordArtifacts will hash all files in this directory, but with which hash function? Right now we just hash everything in in-toto-golang with sha256, but I am a little bit confused right now on how we could make this hash function independent. What I, as a user, would expect is: the hash function will be chosen by the used key. So if I use ed25519, we should also hash all files with sha512, but what we do right now is the exact opposite. Am I wrong and are these two completely unrelated tasks (signing the link files and recording the files via hashing their content)?

Let us say these tasks should be unrelated to each other, via which mechanism do we pass the right hash function to the RecordArtifacts call? I see two options here:

  1. We use the scheme keyword in our key objects. This would mean, that recording artifacts with rsassa-pss-sha256 would hash all files with sha256. I think we should also generate the keyID based on this hash (this would deprecate the keyid_hash_algorithms field. I couldn't find any hint to this field in our in-toto specification anyway). This also means, that these tasks are related to each other. Key Scheme == hashing algorithm for the recordArtifacts step.

  2. We need a different mechanism for making this independent from each other, either via passing the wanted hash function to the recordArtifacts function as additional parameter or via some config file. While investigating the in-toto python cli tools, I couldn't find a way to pass a specific hash function to in-toto-run either. Looking into the code gives me this:

def _hash_artifact(filepath, hash_algorithms=None, normalize_line_endings=False):
  if not hash_algorithms:
    hash_algorithms = ['sha256']
    ....

I guess, this means, that in-toto supports only sha256 right now, too. The in-toto lib does not make use of the hash_algorithms parameter yet, is this correct? Do we track multi-hash support somewhere?

@lukpueh
Copy link
Member

lukpueh commented Aug 31, 2020

@shibumi, thanks for your question and the detailed account of how you tried to find an answer!

The short answer:
in-toto uses cryptographic hashes for (1) signature generation, (2) keyids and (3) artifact recording. None of these are related.

The longer answer:

  1. The scheme in keys, only specifies the hash algorithm to create a digest of the signed part of a metadata object and feed it into the corresponding signature generation function. It is completely oblivious to the contents of signed.
  2. The keyid in keys may be generated using any of the algorithms in keyid_hash_algorithms. The feature was added in TUF (and adopted in in-toto) for flexibility but never made it into neither spec and is likely to be dropped (see Documenting or creating a TAP to describe keyid_hash_algorithms theupdateframework/python-tuf#848 and Add TAP for keyid flexibility theupdateframework/taps#112).
  3. (not really a securesystemslib topic) Neither the in-toto-run cli, nor the runlib.in_toto_run api currently support configuring the hash algorithm for recording artifacts (even though internally we have the infrastructure to use different algorithms, as you saw in runlib._hash_artifact). The in-toto spec does not mandate an implementation to support more than one algorithm, so I guess we are fine here. But maybe still worth a ticket (given that it was worth a ticket in in-toto-golang :) ...).
    What might be more interesting is a discussion about whether match rules should be able to match hash objects, where not all hash algorithms are listed in both objects (e.g. {"sha256": "abc"} == {"sha256": "abc", "sha512": "123"} ?), and if so, whether there should be any prioritization of algorithms. cc @SantiagoTorres

@shibumi
Copy link
Author

shibumi commented Aug 31, 2020

@lukpueh thanks for your clarification

Reading your long answer I am able to compile these points for in-toto-golang:

`1. The scheme in keys, only specifies the hash algorithm to create a digest of the signed part of a metadata object and feed it into the corresponding signature generation function. It is completely oblivious to the contents of signed.

Looks like I have implemented this correctly

The keyid in keys may be generated using any of the algorithms in keyid_hash_algorithms. The feature was added in TUF (and adopted in in-toto) for flexibility but never made it into neither spec and is likely to be dropped (see theupdateframework/python-tuf#848 and theupdateframework/taps#112).

That's what is missing right now in the go implementation. How do we want to proceed here? What way will TUF go? CC: @trishankatdatadog

I feel uncomfortable with calling a hardcoded sha256 function and if we set a specific function it should be in the specification, I guess. The alternative would be to make this dynamic and/or fallback on a specific hash function if necessary.

(not really a securesystemslib topic) Neither the in-toto-run cli, nor the runlib.in_toto_run api currently support configuring the hash algorithm for recording artifacts (even though internally we have the infrastructure to use different algorithms, as you saw in runlib._hash_artifact). The in-toto spec does not mandate an implementation to support more than one algorithm, so I guess we are fine here.

ah okay interesting, I implemented it this way, that the dev is able to choose a specific hash function.

The match rules for hash algorithms is definitely a different topic we should talk about :D

@lukpueh
Copy link
Member

lukpueh commented Aug 31, 2020

The keyid in keys may be generated using any of the algorithms in keyid_hash_algorithms. The feature was added in TUF (and adopted in in-toto) for flexibility but never made it into neither spec and is likely to be dropped (see theupdateframework/tuf#848 and theupdateframework/taps#112).

That's what is missing right now in the go implementation. How do we want to proceed here? What way will TUF go? CC: @trishankatdatadog

We plan to remove the keyid_hash_algorithms field and also to allow the user to assign custom keyids to keys.

I feel uncomfortable with calling a hardcoded sha256 function and if we set a specific function it should be in the specification, I guess. The alternative would be to make this dynamic and/or fallback on a specific hash function if necessary.

IMO it's perfectly fine to keep using sha256 keyids as default, as long as we don't expect them anywhere.

@lukpueh
Copy link
Member

lukpueh commented Sep 11, 2020

Thanks again for creating this issue, @shibumi, and for all your insights, everyone else! I think we should try to identify and create a bit more bite-sized sub-issues, coordinating with discussions in secure-systems-lab/dsse#1 and other issues under the 1.0.0 milestone.

@lukpueh lukpueh added this to the 1.0.0 milestone Sep 11, 2020
lukpueh added a commit to lukpueh/securesystemslib that referenced this issue Oct 14, 2020
Add convenience dispatcher for other private key import interface
functions, to import any of the supported private key types from
file (rsa, ecdsa, ed25519).

This transfers a similar function, currently implemented in
in-toto.util, in order to close in-toto/in-toto#80.

Caveat:
- The key type must be specified via argument (or defaults to RSA).
  In the future we might want to let the parser infer the
  key type, as we do in the related in-toto-golang implementation. See
  https://github.com/in-toto/in-toto-golang/blob/5fba7c22a062a30b6e373f33362d647eabf15822/in_toto/keylib.go#L281-L361

- Currently, the function does not support a signing scheme
  parameter and thus assigns the default value from
  import_rsa_privatekey_from_file to the returned key. For the
  other keep types, the scheme is encoded in the on-disk format.
  In the future we might want to consolidate this as part of secure-systems-lab#251.
  For now the primary goal is to have a simple interface that is
  enough to close in-toto/in-toto#80.
lukpueh added a commit to lukpueh/securesystemslib that referenced this issue Oct 15, 2020
Add convenience dispatcher for other private key import interface
functions, to import any of the supported private key types from
file (rsa, ecdsa, ed25519).

This transfers a similar function, currently implemented in
in-toto.util, in order to close in-toto/in-toto#80.

Caveat:
- The key type must be specified via argument (or defaults to RSA).
  In the future we might want to let the parser infer the
  key type, as we do in the related in-toto-golang implementation. See
  https://github.com/in-toto/in-toto-golang/blob/5fba7c22a062a30b6e373f33362d647eabf15822/in_toto/keylib.go#L281-L361

- Currently, the function does not support a signing scheme
  parameter and thus assigns the default value from
  import_rsa_privatekey_from_file to the returned key. For the
  other keep types, the scheme is encoded in the on-disk format.
  In the future we might want to consolidate this as part of secure-systems-lab#251.
  For now the primary goal is to have a simple interface that is
  enough to close in-toto/in-toto#80.
lukpueh added a commit to lukpueh/securesystemslib that referenced this issue Oct 16, 2020
Add convenience dispatcher for other private key import interface
functions, to import any of the supported private key types from
file (rsa, ecdsa, ed25519).

This transfers a similar function, currently implemented in
in-toto.util, in order to close in-toto/in-toto#80.

Caveat:
- The key type must be specified via argument (or defaults to RSA).
  In the future we might want to let the parser infer the
  key type, as we do in the related in-toto-golang implementation. See
  https://github.com/in-toto/in-toto-golang/blob/5fba7c22a062a30b6e373f33362d647eabf15822/in_toto/keylib.go#L281-L361

- Currently, the function does not support a signing scheme
  parameter and thus assigns the default value from
  import_rsa_privatekey_from_file to the returned key. For the
  other keep types, the scheme is encoded in the on-disk format.
  In the future we might want to consolidate this as part of secure-systems-lab#251.
  For now the primary goal is to have a simple interface that is
  enough to close in-toto/in-toto#80.
@lukpueh
Copy link
Member

lukpueh commented Nov 24, 2020

Cross-referencing question about key formats in theupdateframework/go-tuf#127.

@lukpueh
Copy link
Member

lukpueh commented Dec 9, 2020

I have divided this issue into 3 separate chunks:

Let's close here and have further more focussed discussions there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants