
Add hash and length verification to MetaFile and TargetFile #1437

Merged 3 commits into theupdateframework:develop on Jun 16, 2021

Conversation

Contributor

@sechkova sechkova commented Jun 4, 2021


Fixes #1361

Description of the changes being introduced by the pull request:
MetaFile and TargetFile need exactly the same code for length and hash verification, but they raise different exceptions in the user API.

  • Add a HashVerificationMixIn class implementing the common functionality.
  • Add public verification methods to MetaFile and TargetFile.
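In rough outline, that split could look like the following (a simplified sketch, not the PR's actual code — the real classes live in tuf/api/metadata.py, also accept file objects, and verify hashes via securesystemslib rather than hashlib):

```python
import hashlib
from typing import Dict, Optional


class LengthOrHashMismatchError(Exception):
    """Raised when length or hash verification fails."""


class HashVerificationMixIn:
    # The mixin assumes the child class provides these attributes.
    length: Optional[int]
    hashes: Optional[Dict[str, str]]

    def _verify_length(self, data: bytes) -> None:
        if self.length is not None and len(data) != self.length:
            raise LengthOrHashMismatchError(
                f"Expected length {self.length}, observed {len(data)}"
            )

    def _verify_hashes(self, data: bytes) -> None:
        # Recompute each digest and compare with the expected hex value.
        for algo, expected in (self.hashes or {}).items():
            observed = hashlib.new(algo, data).hexdigest()
            if observed != expected:
                raise LengthOrHashMismatchError(f"{algo} hash mismatch")


class TargetFile(HashVerificationMixIn):
    def __init__(self, length: int, hashes: Dict[str, str]) -> None:
        self.length = length
        self.hashes = hashes

    # Public API: raises LengthOrHashMismatchError on any mismatch.
    def verify_length_and_hashes(self, data: bytes) -> None:
        self._verify_length(data)
        self._verify_hashes(data)
```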

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Member

@jku jku left a comment


I was left wondering about two things:

do we actually want to raise on missing length/hashes? and related, do we need separate checks at all? I'm asking because it seems that the repository decides whether there are lengths/hashes in the metadata or not, and (assuming the spec doesn't demand them in this situation) it seems to me the client shouldn't really do anything about missing length/hashes. So if client shouldn't do anything, why do we raise?

Sure, I can imagine someone implementing TUF in a way that requires hashes for metadata files... but that's not what the spec says so I don't see why the metadata API should implement that case.

Second thing is, I wonder if you experimented with a common parent class instead of a Mixin? It would probably work better with type annotations (defining a class variable of same name seems like a hack that has some runtime effects as well, I think the proper way to do Mixins with type annotations is Protocols: https://mypy.readthedocs.io/en/latest/more_types.html#mixin-classes, but no idea if they're worth the trouble)
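For context, the Protocol-based mixin pattern from the linked mypy docs looks roughly like this (names here are illustrative, not a proposed API):

```python
from typing import Optional, Protocol


class HasLength(Protocol):
    # The Protocol declares what the mixin expects from the classes
    # it is mixed into, without forcing a concrete base class.
    length: Optional[int]


class LengthCheckMixin:
    # Annotating `self` with the Protocol lets mypy verify the
    # attribute access although the mixin never defines `length`.
    def _check_length(self: HasLength, observed: int) -> bool:
        return self.length is None or self.length == observed


class TargetFileLike(LengthCheckMixin):
    def __init__(self, length: Optional[int]) -> None:
        self.length = length
```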

data.seek(0, io.SEEK_END)
observed_length = data.tell()

if self.length != observed_length:
Member

@jku jku Jun 5, 2021


this uses self.length but it's marked Optional (see other comments about related things though)... not a bug I guess but a bit weird
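The seek/tell measurement quoted above can be wrapped into a helper that accepts either bytes or a file object and restores the stream position afterwards (a sketch, not the PR's code):

```python
import io
from typing import BinaryIO, Union


def observed_length(data: Union[bytes, BinaryIO]) -> int:
    """Return the byte length of `data` without consuming it."""
    if isinstance(data, bytes):
        return len(data)
    # Remember the current position, seek to the end to measure,
    # then restore the position so later reads are unaffected.
    pos = data.tell()
    data.seek(0, io.SEEK_END)
    length = data.tell()
    data.seek(pos)
    return length
```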

Comment on lines 760 to 761
if self.hashes is None:
raise ValueError("Missing expected hashes")
Member

_verify_hashes() does this already

Contributor Author

I had to add one of those checks to silence mypy, but I was planning to revise them :)

# TODO: LengthMismatchError fully duplicates DownloadLengthMismatchError
# except that it is not derived from DownloadError. Revise and remove
# duplicates as the client refactor progresses.
class LengthMismatchError(Error):
Member

I see why you're doing this but let's see if the other questions change the situation at all, maybe we can figure out something else... If I call verify_length_and_hashes() I don't think I would ever want to handle the two errors separately: that implies this maybe should be just a single error.

Let's at least not add the functions: I have never seen the data in TUF custom errors being used for anything, I don't expect anyone to use these either -- so a plain message should be fine

(resolved review thread on tuf/api/metadata.py)
@sechkova
Contributor Author

sechkova commented Jun 7, 2021

do we actually want to raise on missing length/hashes? and related, do we need separate checks at all? I'm asking because it seems that the repository decides whether there are lengths/hashes in the metadata or not, and (assuming the spec doesn't demand them in this situation) it seems to me the client shouldn't really do anything about missing length/hashes. So if client shouldn't do anything, why do we raise?

Sounds right that the repository makes the decision and the client cannot do anything about it. Based on this, I think an API like the one below would be neater and easier to use:

# BaseClass or MixIn:
class BaseClass:
    def _verify_hashes(self, data: Union[bytes, BinaryIO]) -> None:
        ...
        raise BadHashError  # should we define a new error covering both?

    def _verify_length(self, data: Union[bytes, BinaryIO]) -> None:
        ...
        raise ?  # should we define a new error?

class TargetFile(BaseClass):
    def verify_length_and_hashes(self, data: Union[bytes, BinaryIO]) -> None:
        self._verify_hashes(data)
        self._verify_length(data)

class MetaFile(BaseClass):
    def verify_length_and_hashes(self, data: Union[bytes, BinaryIO]) -> None:
        if self.length is not None:
            self._verify_length(data)
        if self.hashes is not None:
            self._verify_hashes(data)

@sechkova
Contributor Author

sechkova commented Jun 7, 2021

Second thing is, I wonder if you experimented with a common parent class instead of a Mixin? It would probably work better with type annotations (defining a class variable of same name seems like a hack that has some runtime effects as well, I think the proper way to do Mixins with type annotations is Protocols: https://mypy.readthedocs.io/en/latest/more_types.html#mixin-classes, but no idea if they're worth the trouble)

I agree that I started including too many "hackish" lines of code because of type checking and the mixin. I guess I have to define the BaseClass as an ABC to help mypy? I will try and see if the members being optional in only one of the classes causes trouble again.

@jku
Member

jku commented Jun 8, 2021

I agree that I started including too many "hackish" lines of code because of type checking and the mixin. I guess I have to define the BaseClass as an ABC to help mypy? I will try and see if the members being optional in only one of the classes causes troubles again.

On the other hand, I'm sure the Python annotation story still has some holes in it (we're definitely seeing them in python < 3.7): so if annotation is preventing a good design, let's also consider disabling annotation for that bit...

But in this case I think the baseclass would probably work fine. Some options:

  • method in baseclass that checks self.length/self.hashes if they are non-None: this way the child classes don't even need to define methods of their own (but TargetFile must make sure it never exists without length/hashes)
  • class method(s) in baseclass that take the length and hashes as argument: then the baseclass is just a way to provide an encapsulated method and does not require any data to exist in child instance
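The second option could be sketched like this (illustrative only, with hashlib standing in for securesystemslib):

```python
import hashlib
from typing import Dict


class FileInfoBase:
    # The expected values are passed in as arguments, so the base
    # class needs no length/hashes attributes on the instance.
    @staticmethod
    def _verify_length(data: bytes, expected_length: int) -> None:
        if len(data) != expected_length:
            raise ValueError(
                f"Expected length {expected_length}, observed {len(data)}"
            )

    @staticmethod
    def _verify_hashes(data: bytes, expected_hashes: Dict[str, str]) -> None:
        for algo, expected in expected_hashes.items():
            if hashlib.new(algo, data).hexdigest() != expected:
                raise ValueError(f"{algo} hash mismatch")
```

The child classes then call these static methods with whichever length/hashes values they hold, applying their own optionality rules.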

@sechkova
Contributor Author

sechkova commented Jun 8, 2021

Some options:

* method in baseclass that checks self.length/self.hashes if they are non-None: this way the child classes don't even need to define methods of their own (but TargetFile must make sure it never exists without length/hashes)

I hesitated because the first option reduces duplicated code shared between the two classes (MetaFile, TargetFile). On the other hand optional parameters lead to slight differences between the two. Silently ignoring missing length and hashes when calling the base class function from a TargetFile object just doesn't feel right (even if we made sure that this should never happen).

Same for declaring length/hashes in the base class as optional. It feels misleading for TargetFile.

* class method(s) in baseclass that take the length and hashes as argument: then the baseclass is just a way to provide an encapsulated method and does not require any data to exist in child instance

I think I will try this one; it is close to my initial idea, but now the verification methods can simply be class/static methods of the base class.

@sechkova sechkova force-pushed the hash-verification branch 2 times, most recently from befbab5 to 587b12f Compare June 8, 2021 17:19
@sechkova sechkova marked this pull request as ready for review June 8, 2021 17:23
@sechkova
Contributor Author

sechkova commented Jun 8, 2021

I believe I took into account all comments from the discussion and it should be ready for a review now.

Member

@jku jku left a comment


I like it. I think you could have squeezed the exceptions to a line shorter but that's so minor I'm not asking to change that.

So the only real question is about validation: since you made the mistake of adding some, I'm of course asking for more... I'm fine with filing an issue on that though, and getting this merged without solving hashes validation.

Comment on lines +718 to +727
# Do some basic input validation
if version <= 0:
raise ValueError(f"Metafile version must be > 0, got {version}")
if length is not None and length <= 0:
raise ValueError(f"Metafile length must be > 0, got {length}")
Member

@jku jku Jun 9, 2021


I guess this is where we get to decide what to do with input data that we don't have a "natural" check for: length and version get validated "enough" like this IMO but hashes could still be anything...

For the values we could run

if not all(isinstance(h, str) for h in hashes.values()):
    raise ValueError("Invalid hash value")

But maybe we don't need to? Is it enough that the hash verification will fail when it's tried on completely bogus hash value?

For the dict keys: Can we make SSLib check that the algorithms are known to SSLib?
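One way to answer the dict-key question is an up-front check at deserialization time; a sketch using Python's hashlib for illustration (securesystemslib would need an equivalent list of its own supported algorithms):

```python
import hashlib
from typing import Dict


def validate_hash_algorithms(hashes: Dict[str, str]) -> None:
    # Reject unknown algorithm names early, instead of failing
    # later at runtime during hash computation.
    for algo in hashes:
        if algo not in hashlib.algorithms_available:
            raise ValueError(f"Unsupported hash algorithm: {algo}")
```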

Contributor Author

Yeah... I intentionally avoided the downward validation spiral here.

Hash values: there is even a regex for correct hash values in formats.py now, but I wasn't sure whether it would be overkill.
Dict keys: we could probably pre-define somewhere the algorithms supported by sslib, because right now the check happens at runtime during hash calculation.

I will file an issue about it, especially hash values.

Contributor Author

Here it is: #1441

(resolved review thread on tuf/api/metadata.py)
@jku
Member

jku commented Jun 11, 2021

There's a conflict here with my changes to verify_signature()... In particular, I ended up making the import from securesystemslib import keys as sslib_keys.
I don't have strong opinions about this (and think there's probably no "right way" to import), so either way works, but I felt that importing individual functions is less useful: it's harder to notice in the code where actual library calls are made.

Otherwise LGTM

Extend MetaFile and TargetFile classes with methods
for length and hash verification.  The common functionality
is implemented as static methods of the base class while
MetaFile and TargetFile implement the user API based on it.

Define LengthOrHashMismatchError.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add tests for hash and length verification.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add basic checks for allowed input values during
objects' serialization.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Contributor Author

Rebased on develop and changed the import to sslib_hashes to match sslib_keys.

@jku
Member

jku commented Jun 16, 2021

So I approved this already but I'm now thinking if we should handle the unsupported algorithm and format error cases in this PR as well... I guess a separate issue works too?

Basically as discussed in #1438 and #1441: I don't think we should try to deeply validate e.g. our hash algorithms at deserialize-time. This means when they are used (in this PR), we need to handle every error, and turn the errors into the hash mismatch errors
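That error funnelling could look roughly like this, with hashlib standing in for securesystemslib (a sketch, not this PR's code):

```python
import hashlib
from typing import Dict


class LengthOrHashMismatchError(Exception):
    """Single error raised for any verification failure."""


def verify_hashes(data: bytes, expected: Dict[str, str]) -> None:
    for algo, expected_digest in expected.items():
        try:
            observed = hashlib.new(algo, data).hexdigest()
        except ValueError as e:
            # An unsupported algorithm name surfaces here at use time;
            # fold it into the single verification error.
            raise LengthOrHashMismatchError(
                f"Unsupported algorithm '{algo}'"
            ) from e
        if observed != expected_digest:
            raise LengthOrHashMismatchError(f"{algo} hash mismatch")
```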

@jku
Member

jku commented Jun 16, 2021

I'm making an executive decision and merging, work continues in:

@jku jku merged commit 39ed706 into theupdateframework:develop Jun 16, 2021
Successfully merging this pull request may close these issues.

API: Add hash verification for timestamp/snapshot/targets