
Add hash and length verification to MetaFile and TargetFile #1437

Merged 3 commits into theupdateframework:develop on Jun 16, 2021

Conversation

Contributor

@sechkova sechkova commented Jun 4, 2021


Fixes #1361

Description of the changes being introduced by the pull request:
MetaFile and TargetFile need exactly the same code for length and hash verification, but they raise different exceptions in the user API.

  • Add a HashVerificationMixIn class implementing the common functionality.
  • Add public verification methods to MetaFile and TargetFile.
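In rough outline, that split could look like the following (a simplified sketch, not the PR's actual code — the real classes live in tuf/api/metadata.py, also accept file objects, and verify hashes via securesystemslib rather than hashlib):

```python
import hashlib
from typing import Dict, Optional


class LengthOrHashMismatchError(Exception):
    """Raised when length or hash verification fails."""


class HashVerificationMixIn:
    # The mixin assumes the child class provides these attributes.
    length: Optional[int]
    hashes: Optional[Dict[str, str]]

    def _verify_length(self, data: bytes) -> None:
        if self.length is not None and len(data) != self.length:
            raise LengthOrHashMismatchError(
                f"Expected length {self.length}, observed {len(data)}"
            )

    def _verify_hashes(self, data: bytes) -> None:
        # Recompute each digest and compare with the expected hex value.
        for algo, expected in (self.hashes or {}).items():
            observed = hashlib.new(algo, data).hexdigest()
            if observed != expected:
                raise LengthOrHashMismatchError(f"{algo} hash mismatch")


class TargetFile(HashVerificationMixIn):
    def __init__(self, length: int, hashes: Dict[str, str]) -> None:
        self.length = length
        self.hashes = hashes

    # Public API: raises LengthOrHashMismatchError on any mismatch.
    def verify_length_and_hashes(self, data: bytes) -> None:
        self._verify_length(data)
        self._verify_hashes(data)
```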

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Member

@jku jku left a comment


I was left wondering about two things:

do we actually want to raise on missing length/hashes? and related, do we need separate checks at all? I'm asking because it seems that the repository decides whether there are lengths/hashes in the metadata or not, and (assuming the spec doesn't demand them in this situation) it seems to me the client shouldn't really do anything about missing length/hashes. So if client shouldn't do anything, why do we raise?

Sure, I can imagine someone implementing TUF in a way that requires hashes for metadata files... but that's not what the spec says so I don't see why the metadata API should implement that case.

Second thing is, I wonder if you experimented with a common parent class instead of a Mixin? It would probably work better with type annotations (defining a class variable of same name seems like a hack that has some runtime effects as well, I think the proper way to do Mixins with type annotations is Protocols: https://mypy.readthedocs.io/en/latest/more_types.html#mixin-classes, but no idea if they're worth the trouble)
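For context, the Protocol-based mixin pattern from the linked mypy docs looks roughly like this (names here are illustrative, not a proposed API):

```python
from typing import Optional, Protocol


class HasLength(Protocol):
    # The Protocol declares what the mixin expects from the classes
    # it is mixed into, without forcing a concrete base class.
    length: Optional[int]


class LengthCheckMixin:
    # Annotating `self` with the Protocol lets mypy verify the
    # attribute access although the mixin never defines `length`.
    def _check_length(self: HasLength, observed: int) -> bool:
        return self.length is None or self.length == observed


class TargetFileLike(LengthCheckMixin):
    def __init__(self, length: Optional[int]) -> None:
        self.length = length
```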

data.seek(0, io.SEEK_END)
observed_length = data.tell()

if self.length != observed_length:
Member

@jku jku Jun 5, 2021


this uses self.length but it's marked Optional (see other comments about related things though)... not a bug I guess but a bit weird
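The seek/tell measurement quoted above can be wrapped into a helper that accepts either bytes or a file object and restores the stream position afterwards (a sketch, not the PR's code):

```python
import io
from typing import BinaryIO, Union


def observed_length(data: Union[bytes, BinaryIO]) -> int:
    """Return the byte length of `data` without consuming it."""
    if isinstance(data, bytes):
        return len(data)
    # Remember the current position, seek to the end to measure,
    # then restore the position so later reads are unaffected.
    pos = data.tell()
    data.seek(0, io.SEEK_END)
    length = data.tell()
    data.seek(pos)
    return length
```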

Comment on lines 760 to 761
if self.hashes is None:
raise ValueError("Missing expected hashes")
Member

_verify_hashes() does this already

Contributor Author

I had to add one of those checks to silence mypy, but I was planning to revise them :)

# TODO: LengthMismatchError fully duplicates DownloadLengthMismatchError
# except that it is not derived from DownloadError. Revise and remove
# duplicates as the client refactor progresses.
class LengthMismatchError(Error):
Member

I see why you're doing this but let's see if the other questions change the situation at all, maybe we can figure out something else... If I call verify_length_and_hashes() I don't think I would ever want to handle the two errors separately: that implies this maybe should be just a single error.

Let's at least not add the functions: I have never seen the data in TUF custom errors being used for anything, I don't expect anyone to use these either -- so a plain message should be fine

(resolved review thread on tuf/api/metadata.py)
@sechkova
Contributor Author

sechkova commented Jun 7, 2021

do we actually want to raise on missing length/hashes? and related, do we need separate checks at all? I'm asking because it seems that the repository decides whether there are lengths/hashes in the metadata or not, and (assuming the spec doesn't demand them in this situation) it seems to me the client shouldn't really do anything about missing length/hashes. So if client shouldn't do anything, why do we raise?

Sounds right that the repository makes the decision and the client cannot do anything about it. Based on this, I think an API like the one below would be neater and easier to use:

# BaseClass or MixIn:
class BaseClass:
    def _verify_hashes(self, data: Union[bytes, BinaryIO]) -> None:
        ...
        raise BadHashError  # should we define a new error covering both?

    def _verify_length(self, data: Union[bytes, BinaryIO]) -> None:
        ...
        raise ?  # should we define a new error?

class TargetFile(BaseClass):
    def verify_length_and_hashes(self, data: Union[bytes, BinaryIO]) -> None:
        self._verify_hashes(data)
        self._verify_length(data)

class MetaFile(BaseClass):
    def verify_length_and_hashes(self, data: Union[bytes, BinaryIO]) -> None:
        if self.length is not None:
            self._verify_length(data)
        if self.hashes is not None:
            self._verify_hashes(data)

@sechkova
Contributor Author

sechkova commented Jun 7, 2021

Second thing is, I wonder if you experimented with a common parent class instead of a Mixin? It would probably work better with type annotations (defining a class variable of same name seems like a hack that has some runtime effects as well, I think the proper way to do Mixins with type annotations is Protocols: https://mypy.readthedocs.io/en/latest/more_types.html#mixin-classes, but no idea if they're worth the trouble)

I agree that I started including too many "hackish" lines of code because of type checking and the mixin. I guess I have to define the BaseClass as an ABC to help mypy? I will try and see if the members being optional in only one of the classes causes trouble again.

@jku
Member

jku commented Jun 8, 2021

I agree that I started including too many "hackish" lines of code because of type checking and the mixin. I guess I have to define the BaseClass as an ABC to help mypy? I will try and see if the members being optional in only one of the classes causes troubles again.

On the other hand, I'm sure the Python annotation story still has some holes in it (we're definitely seeing them in python < 3.7): so if annotation is preventing a good design, let's also consider disabling annotation for that bit...

But in this case I think the baseclass would probably work fine. Some options:

  • method in baseclass that checks self.length/self.hashes if they are non-None: this way the child classes don't even need to define methods of their own (but TargetFile must make sure it never exists without length/hashes)
  • class method(s) in baseclass that take the length and hashes as argument: then the baseclass is just a way to provide an encapsulated method and does not require any data to exist in child instance
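The second option could be sketched like this (illustrative only, with hashlib standing in for securesystemslib):

```python
import hashlib
from typing import Dict


class FileInfoBase:
    # The expected values are passed in as arguments, so the base
    # class needs no length/hashes attributes on the instance.
    @staticmethod
    def _verify_length(data: bytes, expected_length: int) -> None:
        if len(data) != expected_length:
            raise ValueError(
                f"Expected length {expected_length}, observed {len(data)}"
            )

    @staticmethod
    def _verify_hashes(data: bytes, expected_hashes: Dict[str, str]) -> None:
        for algo, expected in expected_hashes.items():
            if hashlib.new(algo, data).hexdigest() != expected:
                raise ValueError(f"{algo} hash mismatch")
```

The child classes then call these static methods with whichever length/hashes values they hold, applying their own optionality rules.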

@sechkova
Contributor Author

sechkova commented Jun 8, 2021

Some options:

* method in baseclass that checks self.length/self.hashes if they are non-None: this way the child classes don't even need to define methods of their own (but TargetFile must make sure it never exists without length/hashes)

I hesitated because the first option reduces duplicated code shared between the two classes (MetaFile, TargetFile). On the other hand optional parameters lead to slight differences between the two. Silently ignoring missing length and hashes when calling the base class function from a TargetFile object just doesn't feel right (even if we made sure that this should never happen).

Same for declaring length/hashes in the base class as optional. It feels misleading for TargetFile.

* class method(s) in baseclass that take the length and hashes as argument: then the baseclass is just a way to provide an encapsulated method and does not require any data to exist in child instance

I think I will try this one; it is close to my initial idea, but now the verification methods can simply be class/static methods of the base class.

@sechkova sechkova force-pushed the hash-verification branch 2 times, most recently from befbab5 to 587b12f Compare June 8, 2021 17:19
@sechkova sechkova marked this pull request as ready for review June 8, 2021 17:23
@sechkova
Contributor Author

sechkova commented Jun 8, 2021

I believe I took into account all comments from the discussion and it should be ready for a review now.

Member

@jku jku left a comment


I like it. I think you could have squeezed the exceptions to a line shorter but that's so minor I'm not asking to change that.

So the only real question is about validation: since you made the mistake of adding some, I'm of course asking for more... I'm fine with filing an issue on that though, and getting this merged without solving hashes validation.

Comment on lines +718 to +727
# Do some basic input validation
if version <= 0:
raise ValueError(f"Metafile version must be > 0, got {version}")
if length is not None and length <= 0:
raise ValueError(f"Metafile length must be > 0, got {length}")
Member

@jku jku Jun 9, 2021


I guess this is where we get to decide what to do with input data that we don't have a "natural" check for: length and version get validated "enough" like this IMO but hashes could still be anything...

For the values we could run

if not all(isinstance(h, str) for h in hashes.values()):
    raise ValueError("Invalid hash value")

But maybe we don't need to? Is it enough that the hash verification will fail when it's tried on completely bogus hash value?

For the dict keys: Can we make SSLib check that the algorithms are known to SSLib?
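One way to answer the dict-key question is an up-front check at deserialization time; a sketch using Python's hashlib for illustration (securesystemslib would need an equivalent list of its own supported algorithms):

```python
import hashlib
from typing import Dict


def validate_hash_algorithms(hashes: Dict[str, str]) -> None:
    # Reject unknown algorithm names early, instead of failing
    # later at runtime during hash computation.
    for algo in hashes:
        if algo not in hashlib.algorithms_available:
            raise ValueError(f"Unsupported hash algorithm: {algo}")
```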

Contributor Author

Yeah... I intentionally avoided the downward validation spiral here.

Hash values: there is even a regex for correct hash values in formats.py now, but I wasn't sure whether it would be overkill.
Dict keys: we could probably pre-define somewhere the algorithms supported by sslib, because right now the check happens at runtime during hash calculation.

I will file an issue about it, especially hash values.

Contributor Author

Here it is: #1441

(resolved review thread on tuf/api/metadata.py)
@jku
Member

jku commented Jun 11, 2021

There's a conflict here with my changes to verify_signature()... In particular, I ended up making the import from securesystemslib import keys as sslib_keys.
I don't have strong opinions about this (and think there's probably no "right way" to import), so either way works, but I felt that importing individual functions is less useful: it's harder to notice in the code where actual library calls are made.

Otherwise LGTM

Extend MetaFile and TargetFile classes with methods
for length and hash verification.  The common functionality
is implemented as static methods of the base class while
MetaFile and TargetFile implement the user API based on it.

Define LengthOrHashMismatchError.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add tests for hash and length verification.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add basic checks for allowed input values during
objects' serialization.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Contributor Author

Rebased on develop and changed the import to sslib_hashes to match sslib_keys.

@jku
Member

jku commented Jun 16, 2021

So I approved this already but I'm now thinking if we should handle the unsupported algorithm and format error cases in this PR as well... I guess a separate issue works too?

Basically as discussed in #1438 and #1441: I don't think we should try to deeply validate e.g. our hash algorithms at deserialize-time. This means when they are used (in this PR), we need to handle every error, and turn the errors into the hash mismatch errors
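That error funnelling could look roughly like this, with hashlib standing in for securesystemslib (a sketch, not this PR's code):

```python
import hashlib
from typing import Dict


class LengthOrHashMismatchError(Exception):
    """Single error raised for any verification failure."""


def verify_hashes(data: bytes, expected: Dict[str, str]) -> None:
    for algo, expected_digest in expected.items():
        try:
            observed = hashlib.new(algo, data).hexdigest()
        except ValueError as e:
            # An unsupported algorithm name surfaces here at use time;
            # fold it into the single verification error.
            raise LengthOrHashMismatchError(
                f"Unsupported algorithm '{algo}'"
            ) from e
        if observed != expected_digest:
            raise LengthOrHashMismatchError(f"{algo} hash mismatch")
```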

@jku
Member

jku commented Jun 16, 2021

I'm making an executive decision and merging, work continues in:

@jku jku merged commit 39ed706 into theupdateframework:develop Jun 16, 2021
Successfully merging this pull request may close these issues.

API: Add hash verification for timestamp/snapshot/targets