
api/metadata input validation: hashes #1441

Closed
sechkova opened this issue Jun 9, 2021 · 8 comments · Fixed by #1451
Comments

sechkova commented Jun 9, 2021

Description of issue or feature request:
Implement input validation for the TargetFile and MetaFile hashes attribute.

Current behavior:
The new api/metadata code does not perform any input validation on hashes.
formats.py defines a HASHDICT_SCHEMA that is not used in the new code.

Expected behavior:
Define allowed values for hashes.
Implement the verification in metadata.py.

sechkova added the backlog label Jun 9, 2021
sechkova added this to the weeks24-25 milestone Jun 9, 2021
sechkova self-assigned this Jun 9, 2021
sechkova commented Jun 9, 2021

'Hashes' is a dictionary of the form:


{
    '<HASH ALGO 1>': '<TARGET FILE HASH 1>',
    '<HASH ALGO 2>': '<TARGET FILE HASH 2>',
    ...
}

Both keys and values could benefit from validation.

  • Valid keys are hash algorithms supported by sslib
  • Valid hash values could be constrained by a regex, or maybe just requiring 'str' is enough

Another possible option is to allow any values, which would raise errors later during the meta/target file hash verification step.

What must be strictly disallowed is an empty dictionary, which may lead to skipping the mandatory hash verification check.
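For illustration, a rough sketch of these checks at construction time (the helper name and error messages are hypothetical, not the final implementation):

# Hypothetical validation helper: rejects the empty dict outright and,
# per the stricter option above, requires str keys and values.
from typing import Dict

def validate_hashes_input(hashes: Dict[str, str]) -> None:
    if not hashes:
        # an empty dict would let the mandatory hash check be skipped
        raise ValueError("hashes must contain at least one entry")
    for algo, value in hashes.items():
        if not isinstance(algo, str) or not isinstance(value, str):
            raise ValueError(f"hash entries must be str: {algo!r}: {value!r}")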

jku commented Jun 9, 2021

algorithms (dict keys) options:

  1. if we know for sure that we will fail to verify some algorithm, then we probably want to fail early. So we could check that algorithms (dict keys) are known to sslib
  2. or we could just check that keys are type str
  3. or we could do nothing and trust that verify_hashes_and_lengths() will fail when algo is something weird

hash (dict values) options:

  1. I don't think we should try to verify that a hash is an actual hash (with e.g. a regex) -- that's just not useful: it's unlikely to find accidental errors and it won't stop a malicious actor from crafting data that passes the check
  2. We might want to make sure it's a str though
  3. but just as well, we might not validate and just make sure verify_hashes_and_lengths() fails gracefully when the hash is something completely unexpected

Both of these checks mostly matter for deserialization: for adding/modifying hashes through the API we should at some point provide functions that generate the hashes when given the data, e.g. TargetFile.from_data(data: Union[bytes, BinaryIO]) -> TargetFile (but this requires more design to decide e.g. which algorithms to use).
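A hedged sketch of that idea: securesystemslib.hash.digest() is a real securesystemslib call, but the free function name and the exact TargetFile constructor arguments here are assumptions, and which algorithms to use remains the open design question.

# Hypothetical sketch, not python-tuf API: build a TargetFile by
# hashing the given data with each requested algorithm.
from typing import Dict, List

import securesystemslib.hash as sslib_hash

from tuf.api.metadata import TargetFile  # constructor kwargs below are assumed

def target_file_from_data(data: bytes, algorithms: List[str]) -> TargetFile:
    hashes: Dict[str, str] = {}
    for algo in algorithms:
        digest_obj = sslib_hash.digest(algo)  # raises for unknown algorithms
        digest_obj.update(data)
        hashes[algo] = digest_obj.hexdigest()
    return TargetFile(length=len(data), hashes=hashes)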

jku commented Jun 9, 2021

We wrote mostly the same comment :)

What must be strictly disallowed is an empty dictionary, which may lead to skipping the mandatory hash verification check.

Oh, good catch, the spec does specify this: a "dictionary that specifies one or more hashes".

MVrachev commented Jun 10, 2021

For algorithms (dict keys) I think we can rely on securesystemslib to give us information on which algorithms it supports, considering we are already using it to verify our signatures. Also, there aren't many possible combinations here.

For hashes (dict values) I would prefer to do some validation during initialization rather than pass the responsibility for this check to another function. So I would prefer one of the first two options; I don't have a strong opinion about which one.
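Since securesystemslib does not obviously export a list of supported algorithms, one hedged way to "ask" it is to attempt to construct a digest object and treat the resulting exception as "unsupported". digest() and the exceptions are real securesystemslib APIs; the helper name is made up:

# Illustrative helper, not securesystemslib/python-tuf API: probe
# whether securesystemslib can hash with the given algorithm.
import securesystemslib.hash as sslib_hash
from securesystemslib.exceptions import FormatError, UnsupportedAlgorithmError

def is_supported_algorithm(algo: str) -> bool:
    try:
        sslib_hash.digest(algo)
    except (FormatError, UnsupportedAlgorithmError):
        return False
    return True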

jku commented Jun 11, 2021

For algorithms (dict keys) I think we can rely on securesystemslib to give us information on which algorithms it supports.

Yeah, that would be the way -- we absolutely do not want to guess in TUF.

The only possible worry I have is the same future situation that I tried to talk about in the key case, where in a metadata file

  • some old targets have been hashed with an old algorithm that is no longer supported
  • but other targets have new hashes that will work

Now if we validate specific algorithms at deserialization time, the process immediately stops -- when it might be that we never actually needed to check the hashes we don't understand, and the process could have continued successfully.

The scenario seems unlikely but I think we should keep this sort of thing in mind when validating sets that may be extended over time (hash and signing algorithms at least).
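For illustration, a sketch of this more lenient approach (the names are illustrative, not python-tuf's verify_hashes_and_lengths(); only the securesystemslib calls are real API):

# Tolerates unknown algorithms and fails only on a mismatch or when
# no hash could be checked at all.
from typing import Dict

import securesystemslib.hash as sslib_hash
from securesystemslib.exceptions import UnsupportedAlgorithmError

def verify_known_hashes(data: bytes, hashes: Dict[str, str]) -> None:
    verified = 0
    for algo, expected in hashes.items():
        try:
            digest_obj = sslib_hash.digest(algo)
        except UnsupportedAlgorithmError:
            continue  # the lenient choice: skip algorithms we don't know
        digest_obj.update(data)
        if digest_obj.hexdigest() != expected:
            raise ValueError(f"{algo} hash mismatch")
        verified += 1
    if verified == 0:
        raise ValueError("could not verify any of the listed hashes")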

mnm678 commented Jun 11, 2021

The only possible worry I have is the same future situation that I tried to talk about in the key case, where in a metadata file

  • some old targets have been hashed with an old algorithm that is no longer supported
  • but other targets have new hashes that will work

Now if we validate specific algorithms at deserialization time, the process immediately stops -- when it might be that we never actually needed to check the hashes we don't understand, and the process could have continued successfully.

The scenario seems unlikely but I think we should keep this sort of thing in mind when validating sets that may be extended over time (hash and signing algorithms at least).

I wonder what the implications here are for prioritized delegations. If an attacker was somehow able to remove a supported algorithm, could they use that to convince a user to install a less-optimal package?

jku commented Jun 14, 2021

#1438 (comment)

My comment on the key discussion applies 100% here as well. TL;DR: metadata validity and our implementation's ability to verify hashes are two different things: I think we should not consider metadata invalid just because it contains a hash algorithm we haven't heard of.

sechkova commented:

Considering your comments, my suggestion in #1451:

  • valid length: greater than zero
  • valid hashes: a non-empty dictionary

Checking the validity of hash algorithms is not part of the metadata input validation and is done by securesystemslib during hash verification.
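For illustration, a minimal sketch of the suggested checks (the helper names are assumptions; the actual changes are in #1451):

# Assumed helper names, mirroring the suggestion above.
def _validate_length(length: int) -> None:
    if length <= 0:
        raise ValueError(f"length must be greater than zero, got {length}")

def _validate_hashes(hashes: dict) -> None:
    if not hashes:
        raise ValueError("hashes must be a non-empty dictionary")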

jku closed this as completed in #1451 Jun 22, 2021