
api/metadata input validation: hashes #1441

Closed
sechkova opened this issue Jun 9, 2021 · 8 comments · Fixed by #1451
Comments

sechkova commented Jun 9, 2021

Description of issue or feature request:
Implement input validation for the TargetFile and MetaFile hashes attribute.

Current behavior:
The new api/metadata code does not perform any input validation on hashes.
formats.py defines a HASHDICT_SCHEMA that is not used in the new code.

Expected behavior:
Define allowed values for hashes.
Implement the verification in metadata.py.

sechkova added the backlog label Jun 9, 2021
sechkova added this to the weeks24-25 milestone Jun 9, 2021
sechkova self-assigned this Jun 9, 2021
sechkova commented Jun 9, 2021

'Hashes' is a dictionary of the form:


{
    '<HASH ALGO 1>': '<TARGET FILE HASH 1>',
    '<HASH ALGO 2>': '<TARGET FILE HASH 2>',
    ...
}

Both keys and values could benefit from validation.

  • Valid keys are hash algorithms supported by sslib
  • Valid hash values could be constrained by a regex, or maybe just requiring 'str' is enough

Another possible option is to allow any values, which would raise errors later during the meta/target file hash verification step.

What must be strictly disallowed is an empty dictionary, which may lead to skipping the mandatory hash verification check.
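For illustration, a rough sketch of these checks at construction time (the helper name and error messages are hypothetical, not the final implementation):

# Hypothetical validation helper: rejects the empty dict outright and,
# per the stricter option above, requires str keys and values.
from typing import Dict

def validate_hashes_input(hashes: Dict[str, str]) -> None:
    if not hashes:
        # an empty dict would let the mandatory hash check be skipped
        raise ValueError("hashes must contain at least one entry")
    for algo, value in hashes.items():
        if not isinstance(algo, str) or not isinstance(value, str):
            raise ValueError(f"hash entries must be str: {algo!r}: {value!r}")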

jku commented Jun 9, 2021

algorithms (dict keys) options:

  1. if we know for sure that we will fail to verify some algorithm, then we probably want to fail early. So we could check that algorithms (dict keys) are known to sslib
  2. or we could just check that keys are type str
  3. or we could do nothing and trust that verify_hashes_and_lengths() will fail when algo is something weird

hash (dict values) options:

  1. I don't think we should try to verify that a hash is an actual hash (with e.g. a regex) -- that's just not useful: it's unlikely to find accidental errors and it won't stop a malicious actor from crafting data that passes the check
  2. We might want to make sure it's a str though
  3. but just as well, we might not validate and just make sure verify_hashes_and_lengths() fails gracefully when the hash is something completely unexpected

Both of these checks mostly matter for deserialization: for adding/modifying hashes through the API we should at some point provide functions that generate the hashes when given the data, e.g. TargetFile.from_data(data: Union[bytes, BinaryIO]) -> TargetFile (but this requires more design to decide e.g. which algorithms to use).
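A hedged sketch of that idea: securesystemslib.hash.digest() is a real securesystemslib call, but the free function name and the exact TargetFile constructor arguments here are assumptions, and which algorithms to use remains the open design question.

# Hypothetical sketch, not python-tuf API: build a TargetFile by
# hashing the given data with each requested algorithm.
from typing import Dict, List

import securesystemslib.hash as sslib_hash

from tuf.api.metadata import TargetFile  # constructor kwargs below are assumed

def target_file_from_data(data: bytes, algorithms: List[str]) -> TargetFile:
    hashes: Dict[str, str] = {}
    for algo in algorithms:
        digest_obj = sslib_hash.digest(algo)  # raises for unknown algorithms
        digest_obj.update(data)
        hashes[algo] = digest_obj.hexdigest()
    return TargetFile(length=len(data), hashes=hashes)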

jku commented Jun 9, 2021

We wrote mostly the same comment :)

What must be strictly disallowed is an empty dictionary, which may lead to skipping the mandatory hash verification check.

Oh, good catch, the spec does specify this: a "dictionary that specifies one or more hashes".

MVrachev commented Jun 10, 2021

For algorithms (dict keys) I think we can rely on securesystemslib to give us information on which algorithms it supports, considering we are already using it to verify our signatures. Also, there aren't many possible combinations here.

For hashes (dict values) I would prefer to do some validation during initialization rather than pass the responsibility for this check to another function. So I would prefer one of the first two options; I don't have a strong opinion about which one.
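Since securesystemslib does not obviously export a list of supported algorithms, one hedged way to "ask" it is to attempt to construct a digest object and treat the resulting exception as "unsupported". digest() and the exceptions are real securesystemslib APIs; the helper name is made up:

# Illustrative helper, not securesystemslib/python-tuf API: probe
# whether securesystemslib can hash with the given algorithm.
import securesystemslib.hash as sslib_hash
from securesystemslib.exceptions import FormatError, UnsupportedAlgorithmError

def is_supported_algorithm(algo: str) -> bool:
    try:
        sslib_hash.digest(algo)
    except (FormatError, UnsupportedAlgorithmError):
        return False
    return True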

jku commented Jun 11, 2021

For algorithms (dict keys) I think we can rely on securesystemslib to give us information on which algorithms it supports.

Yeah, that would be the way -- we absolutely do not want to guess in TUF.

The only possible worry I have is the same future situation that I tried to talk about in the key case, where in a metadata file

  • some old targets have been hashed with an old algorithm that is no longer supported
  • but other targets have new hashes that will work

Now if we validate specific algorithms at deserialization time, the process immediately stops -- when it might be that we never actually needed to check the hashes we don't understand, and the process could have continued successfully.

The scenario seems unlikely but I think we should keep this sort of thing in mind when validating sets that may be extended over time (hash and signing algorithms at least).
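For illustration, a sketch of this more lenient approach (the names are illustrative, not python-tuf's verify_hashes_and_lengths(); only the securesystemslib calls are real API):

# Tolerates unknown algorithms and fails only on a mismatch or when
# no hash could be checked at all.
from typing import Dict

import securesystemslib.hash as sslib_hash
from securesystemslib.exceptions import UnsupportedAlgorithmError

def verify_known_hashes(data: bytes, hashes: Dict[str, str]) -> None:
    verified = 0
    for algo, expected in hashes.items():
        try:
            digest_obj = sslib_hash.digest(algo)
        except UnsupportedAlgorithmError:
            continue  # the lenient choice: skip algorithms we don't know
        digest_obj.update(data)
        if digest_obj.hexdigest() != expected:
            raise ValueError(f"{algo} hash mismatch")
        verified += 1
    if verified == 0:
        raise ValueError("could not verify any of the listed hashes")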

mnm678 commented Jun 11, 2021

The only possible worry I have is the same future situation that I tried to talk about in the key case, where in a metadata file

  • some old targets have been hashed with an old algorithm that is no longer supported
  • but other targets have new hashes that will work

Now if we validate specific algorithms at deserialization time, the process immediately stops -- when it might be that we never actually needed to check the hashes we don't understand, and the process could have continued successfully.

The scenario seems unlikely but I think we should keep this sort of thing in mind when validating sets that may be extended over time (hash and signing algorithms at least).

I wonder what the implications here are for prioritized delegations. If an attacker was somehow able to remove a supported algorithm, could they use that to convince a user to install a less-optimal package?

jku commented Jun 14, 2021

#1438 (comment)

My comment on the key discussion applies 100% here as well. TL;DR: metadata validity and our implementation's ability to verify hashes are two different things: I think we should not consider metadata invalid just because it contains a hash algorithm we haven't heard of.

sechkova commented:

Considering your comments, my suggestion in #1451:

  • valid length: greater than zero
  • valid hashes: a non-empty dictionary

Checking the validity of hash algorithms is not part of the metadata input validation and is done by securesystemslib during hash verification.
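For illustration, a minimal sketch of the suggested checks (the helper names are assumptions; the actual changes are in #1451):

# Assumed helper names, mirroring the suggestion above.
def _validate_length(length: int) -> None:
    if length <= 0:
        raise ValueError(f"length must be greater than zero, got {length}")

def _validate_hashes(hashes: dict) -> None:
    if not hashes:
        raise ValueError("hashes must be a non-empty dictionary")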

jku closed this as completed in #1451 Jun 22, 2021