Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a "validate" argument option to JSONSerializer #1775

Merged

Conversation

MVrachev
Copy link
Collaborator

@MVrachev MVrachev commented Jan 17, 2022

Fixes #1696, #1788

Description of the changes being introduced by the pull request:

We can say we are almost done with Metadata API validation during the initialization of Signed objects as summarized here: #1140 (comment).

What we didn't focus on is validation when serializing the Metadata objects by calling Metadata.to_dict().

That option can be useful for package managers who import "TUF" into their lifecycle by giving them an assurance that when there are multiple changes of a metadata object then serializing this object to a file and then back to an instance from a file is possible.
I have added a separate API call Metadata.validate() that could be called to give this assurance.

This function can also be used by passing a validate argument when creating a JsonSerializer.
By default, this argument is False.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@coveralls
Copy link

coveralls commented Jan 17, 2022

Pull Request Test Coverage Report for Build 1910286352

  • 56 of 58 (96.55%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 98.311%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tuf/api/metadata.py 47 48 97.92%
tuf/api/serialization/json.py 9 10 90.0%
Totals Coverage Status
Change from base Build 1909502818: -0.04%
Covered Lines: 1178
Relevant Lines: 1194

💛 - Coveralls

@jku
Copy link
Member

jku commented Jan 18, 2022

What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see.

Is there a SSLIB bug to implement eq for signature?

JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True?

@lukpueh
Copy link
Member

lukpueh commented Jan 18, 2022

What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see.

Exactly. The validate implementation is too JSONDe/Serializer-specific for the Metadata class. Plus Jussi's 2x-to_dict argument.

Is there a SSLIB bug to implement eq for signature?

Does not look like it. @MVrachev, would you mind creating a ticket and referencing it in a TODO comment in Metadata.__eq__()?

JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True?

Agreed.

tuf/api/metadata.py Outdated Show resolved Hide resolved
@MVrachev MVrachev force-pushed the validation-during-serialization branch 2 times, most recently from 5d5a0b8 to 16e46c1 Compare January 18, 2022 13:48
@MVrachev
Copy link
Collaborator Author

I rebased and updated this pr

What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see.

I removed Metadata.validate() and moved it into JSONSerializer.serialize().

Is there a SSLIB bug to implement eq for signature?

I proposed a pr with that: secure-systems-lab/securesystemslib#383

JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True?

I added a simple test for that and changed one of the places where we instantiate a JSONSerializer to use validation.
Many of the places where we call to_bytes we call it when we want to raise another specific error and It didn't make sense to use JSONSerializer with validate = True.
I also don't think it makes sense to comprehensively test this part with all of the classes and their attributes as essentially the when calling JSONSerializer.serialize with validation turned on we are relying on the classes initialization validation which already is tested extensively in test_trusted_metadata_set.py.

@MVrachev MVrachev changed the title Add a "validate" API call Add a "validate" argument option to JSONSerializer Jan 18, 2022
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff, @MVrachev! Please address my comments and we should be able to soon merge.

Regarding tests, I do think it would be good to add a few to test_metadata_serialization.py that call serialize with self.validate being True and a metadata_ob that triggers the different validation errors.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Show resolved Hide resolved
tuf/api/serialization/json.py Outdated Show resolved Hide resolved
tuf/api/serialization/json.py Outdated Show resolved Hide resolved
tests/test_api.py Outdated Show resolved Hide resolved
tests/test_api.py Outdated Show resolved Hide resolved
@lukpueh
Copy link
Member

lukpueh commented Jan 20, 2022

I just merged #1783. Please make sure that we don't run into dict order problems you kindly pointed out in e3b267e!

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structure looks good to me. I left a few more comments in source.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/serialization/json.py Outdated Show resolved Hide resolved
tuf/api/serialization/json.py Outdated Show resolved Hide resolved
@MVrachev
Copy link
Collaborator Author

It's good to solve issue #1788 inside this pr as well.

@MVrachev MVrachev force-pushed the validation-during-serialization branch from 16e46c1 to dde1290 Compare February 7, 2022 11:19
@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 7, 2022

I rebased this pr on top of the latest develop changes and made the following changes:

  1. As suggested by Jussi removed unrecognized_fields check inside all __eq__ implementation in Signed class derivates as it's checked in Signed.__eq__
  2. Inside Metadata.__eq__ I realized there is no sense in testing if self.signatures is None or other.signatures is None or if both of them are None.
    All of those checks can be replaced by type(self.signatures) == type(other.signatures) == dict.
  3. I rewrote the tests. I created a separate test_metadata_eq_ file where similarly to test_metadata_serialization.py I used table testing to write mostly simple tests. I checked if all cases are covered with local coverall runs.
  4. I added made sure that the dictionaries order is taken into account when comparing signatures or delegated roles (see Dict comparisons insensitive to order #1788)
  5. Made sure only SerializationError will be thrown by JSONSerializer.serialize()
  6. Changed the documentation regarding the validate argument.

I know the changes look huge, but that's mostly from test_metadata_eq which should be simple to read.

@MVrachev MVrachev force-pushed the validation-during-serialization branch 2 times, most recently from 2a39784 to fb713d8 Compare February 7, 2022 11:34
@MVrachev MVrachev requested review from lukpueh and jku February 7, 2022 11:36
@MVrachev MVrachev force-pushed the validation-during-serialization branch 3 times, most recently from 7ea6946 to 91f6981 Compare February 7, 2022 11:44
@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 7, 2022

I had to rebase a couple of times because GitHub showed changes from previous commits.

@MVrachev MVrachev force-pushed the validation-during-serialization branch from 91f6981 to dbc4bb6 Compare February 7, 2022 11:50
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my earlier comments and adding all these tests. I noticed a few more things. Please address and ping me for another round...

tests/test_metadata_eq_.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved

# Iterate over self.signatures and other.signatures at the same time
# as the signatures in the file format are ordered.
keyids = zip(self.signatures.keys(), other.signatures.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why zipping two lists you just checked to be equal?

Copy link
Collaborator Author

@MVrachev MVrachev Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.

When I zip them I make sure I traverse both of them following their corresponding insertion order.
If the self.signatures and other.signatures orders are different, then when I check (a couple of lines below) self_keyid != other_keyid will return False.

I tried without zipping, but it fails the test test_md_eq_signatures_reversed_order inside test_metadata_eq_.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.

but doesn't that check actually tell us that the order is correct: List eq does check the order?

Then that would mean you don't need the loop at all -- could just check self.signatures != other.signatures since now you know the order is the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, good point, because keys() doesn't return a list anymore in Python3 but rather something set-like and thus unordered.

But still, you're comparing keyids twice here. You could instead only check the lengths at first, i.e. len(self.signatures.keys()) != len(other.signatures.keys()), or just pass strict=True to zip, which does the same thing.

Copy link
Member

@lukpueh lukpueh Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could just check self.signatures != other.signatures

Not until secure-systems-lab/securesystemslib#383 is merged and released.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could just check self.signatures != other.signatures

Not until secure-systems-lab/securesystemslib#383 is merged and released.

... maybe we should just go ahead and do that, before spending more thoughts on zip :D

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/serialization/json.py Outdated Show resolved Hide resolved
@MVrachev MVrachev force-pushed the validation-during-serialization branch from dbc4bb6 to f9344f8 Compare February 8, 2022 12:28
tuf/api/metadata.py Outdated Show resolved Hide resolved

# Iterate over self.signatures and other.signatures at the same time
# as the signatures in the file format are ordered.
keyids = zip(self.signatures.keys(), other.signatures.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.

but doesn't that check actually tell us that the order is correct: List eq does check the order?

Then that would mean you don't need the loop at all -- could just check self.signatures != other.signatures since now you know the order is the same

tuf/api/serialization/json.py Outdated Show resolved Hide resolved
@MVrachev MVrachev closed this Feb 8, 2022
@MVrachev MVrachev deleted the validation-during-serialization branch February 8, 2022 14:20
@MVrachev MVrachev restored the validation-during-serialization branch February 8, 2022 14:21
@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 8, 2022

I closed it by mistake.

@MVrachev MVrachev reopened this Feb 8, 2022
@MVrachev MVrachev force-pushed the validation-during-serialization branch from f9344f8 to b4a6c7e Compare February 14, 2022 13:09
@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 14, 2022

I updated the pr after unrecognized fields support was added in securesystemslib version 0.22.0
The changes include:

  1. I added Metadata.unrecognized_fields inside Metadata._eq_
  2. I changed the way we check the dictionaries order of Metadata.signatures and Delegations.roles
  3. I changed how we actually do validation inside JSONSerializer.serialize() by using json_bytes and calling JSONDeserialize.deserialize() to create a new metadata object for comparison.

I think I addressed all comments by @jku and @lukpueh. It's ready for another round of reviews.

@MVrachev MVrachev requested review from lukpueh and jku February 14, 2022 13:12
tuf/api/metadata.py Outdated Show resolved Hide resolved
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your efforts here, @MVrachev! It's getting there. I have a few more comments. This time I also took a closer look at the new test module.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/serialization/json.py Show resolved Hide resolved
tests/test_metadata_eq_.py Show resolved Hide resolved
self.assertEqual(md, md_2)

setattr(md_2, self.case_name, test_case_data)
self.assertNotEqual(md, md_2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you use this pattern a lot in this module (everywhere where you use the run_sub_tests_with_dataset decorator):

for $DATA in $DATA_SET
  1. load container
  2. assert container not equal to empty string
  3. copy container and assert equal
  4. patch container copy with $DATA and assert unequal to original

I think this gives us good test coverage, but maybe you can restructure the pattern a bit? More specifically,
steps 1-3 are don't really need to be repeated for every $DATA in $DATA_SET, when only step 4 varies.

I think it should look something like this:

1. load container
2. assert container not equal to empty string
3. copy container and assert equal
4. for $DATA in $DATA_SET
    patch container copy with $DATA and assert unequal to original

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment applies in other tests that use that decorator below.

Copy link
Collaborator Author

@MVrachev MVrachev Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we are redoing steps 1-3 maany times and that it's probably to optimize them and I do agree with the second pseudocode and it should work.
There are two disadvantages to opting-out from the decorator:

  1. we don't receive an exact message which of the $DATA caused the effect as we do by using the decorator and a subtest
  2. If one of the $DATA_SET cases fails then the other will do as well.

In a discussion with @lukpueh we remembered that the asserts provided msg option meaning that if an assert fails then the custom message will be appended. This will resolve 1 as a problem. For example, if we want to add Failed case: {case} to the message the result message of a failing equal looks like this:
<tuf.api.metadata.Metadata object at 0x7fdf6db05f00> != 'bbaba' : Failed case: signed
meaning we still see a log of the given argument.

So, the only downside to using a for loop here is 2 which I think is okay given that if for one of the attributes the test fails most likely it will fail for the others as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still use the decorator. I think it makes the tests even cleaner (despite the repetitive first item in the data set rows). Here's an example how I aggregated the common parts of your test_metadata_eq_ and test_signed_eq_. You should be able to use these for most of your other test_*_eq functions as well.

   md_data: utils.DataSet = {
        "snapshot signed":       snapshot,           # test_metadata_eq_ (copy_and_simple_assert)
        "snapshot spec_version": snapshot["signed"], # test_signed_eq_ (copy_and_simple_assert)
        # add data for test_key_eq_, test_role_eq_, test_root_eq_, test_metafile_eq_, ...
        }

    @utils.run_sub_tests_with_dataset(md_data)
    def test_metadata__eq__(self, md: Any) -> None:
        self.assertNotEqual(md, "")
        md_2 = copy.deepcopy(md)
        self.assertEqual(obj, md)

    md_attrs_data: utils.DataSet = {
        "snapshot signed":       (snapshot, "signed", None),                     # test_metadata_eq_ (iteration 1)
        "snapshot signatures":   (snapshot, "signatures", None),                 # test_metadata_eq_ (iteration 2)
        "snapshot version":      (snapshot["signed"], "version", -1),            # test_signed_eq_ (iteration 1)
        "snapshot spec_version": (snapshot["signed"], "spec_version", "0.0.0"),  # test_signed_eq_ (iteration 2)
       # add data for test_key_eq_, test_role_eq_, test_root_eq_, test_metafile_eq_, ...
        }

    @utils.run_sub_tests_with_dataset(md_attrs_data)
    def test_metadata_attrs__eq__(self, test_case_data: Tuple) -> None:
        md, attr, val = test_case_data
        md_2 = copy.deepcopy(md)
        setattr(md_2, attr, val)
        self.assertNotEqual(md, md_2)

NOTE: I didn't include the initialization for the first argument (md), if you want to access what's in cls.metadata, you might need to define the test data sets in setUpClass as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think the way the tests are defined now I think is simpler than what you suggested.
We will have one big dataset with many cases instead of separating the attributes into logically separated datasets.

Additionally, the dataset won't be so beautiful when working Key, Role, DelegatedRole, TargetFile and MetaFile data. In order to reuse it we will have to define it inside setupClass, so the data can be reused.

The only real benefit I see is that the file will be smaller.

What do you think @jku?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly have problems figuring out if a specific test is even useful, eq() is such a special case. I don't have a strong opinion on the style

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to say that your original idea of using the decorator was quite good, albeit with some restructuring. Apologies for causing any confusion. I'm really also fine either way. Thanks for your persistent efforts, @MVrachev!

tests/test_metadata_eq_.py Show resolved Hide resolved

self.assertEqual(md, md_2)

def test_md_eq_special_signatures_tests(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can trust securesystemslib to properly implement and test Signatures.__eq__. So I suggest to remove this test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree with this one.
Here we are testing the following cases:

  1. Test that metadata objects with different signatures are not equal.
  2. Test that metadata objects with empty signatures are equal
  3. Metadata objects with different signatures types are not equal.

which doesn't test the same thing as https://github.com/secure-systems-lab/securesystemslib/blob/075043e46b1d017fc332cf44a2038558b3e246d8/tests/test_signer.py#L75,

tests/test_metadata_eq_.py Show resolved Hide resolved
@MVrachev MVrachev force-pushed the validation-during-serialization branch 2 times, most recently from d644836 to 9329a4e Compare February 17, 2022 19:21
@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 17, 2022

I rebased and made the following major changes:

  1. simplified the tests a lot by following the @lukpueh suggestion and using a for loop instead of the decorator and adding a custom message to gather information when there is an error from which attribute is coming.
  2. simplified the tests related to order of signatures and delegated roles by using reverse (but because python3.7 didn't have reverse implemented for dictionaries I had to cast to a list first).
  3. simplified the verification that objects with the same signatures, but in a different order are different. The same simplification was done for delegated role.

Thanks for the reviews, now it should be in a good shape I hope. :D

@MVrachev MVrachev requested review from lukpueh and jku February 17, 2022 19:25
Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • it's a lot of new code but I think it's worth it and the eq() implementations are now simple
  • I admit I am not able to evaluate if the tests are useful: eq() is really tricky. But I trust Martins judgement here: LGTM
  • We only use this feature in one place in tests ATM, I wonder if we should do it more?

Anyway, looks good to me, we can keep improving this after the fact -- e.g. this will be very nice to have for the static tests (#1806). Thanks for your work.

tuf/api/metadata.py Outdated Show resolved Hide resolved
@jku
Copy link
Member

jku commented Feb 25, 2022

We only use this feature in one place in tests ATM, I wonder if we should do it more?

If you have looked at the potential places (any Metadata.to_files/to_bytes() calls) already, and decided not to use validate please leave a comment so we know.

By adding __eq__ we can compare that two objects are equal.
That will be useful when adding validation API call.

One bug I have found during testing is that I don't check if the type
of "other" in the __eq__ implementations are the expected ones.
I assumed that when comparing "root == obj" if "obj" is None that
automatically the result will be false.
Later after a mypy warning, I realized we should implement the __eq__
methods to accept "Any" type as other and we should check manually
that "other" is the expected type.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Test the "__eq__" implementation for all classes defined in
tuf/api/metadata.py
The tests are many but simple. The idea is to test each of the metadata
classes one by one and with this to make sure there are no possible
cases missed.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
If the "validation" argument is set then when
serializing the metadata object will be validated.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
After we have dropped OrderedDict in theupdateframework@e3b267e
we are relying on python3.7+ default behavior to preserve the insertion
order, but there is one caveat.
When comparing dictionaries the order is still irrelevant compared to
OrderedDict. For example:
>>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)])
False
>>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)])
True

There are two special attributes, defined in the specification, where
the order makes a difference when comparing two objects:
- Metadata.signatures
- Targets.delegations.roles.
We want to make sure that the order in those two cases makes a
difference when comparing two objects and that's why those changes
are required inside two __eq__ implementations.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev MVrachev force-pushed the validation-during-serialization branch from 9329a4e to 6ea5372 Compare February 28, 2022 12:42
@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 28, 2022

  • it's a lot of new code but I think it's worth it and the eq() implementations are now simple
  • I admit I am not able to evaluate if the tests are useful: eq() is really tricky. But I trust Martins judgement here: LGTM

When I was writing the tests I wanted to make sure we are:

  1. calling each of the classes with its corresponding __eq__ implementation
  2. comparing an object to a different object (in our case with an empty string)
  3. changing each of the attributes for each of the classes and then calling __eq__.
    That way we check each of the == comparisons in each of the __eq__ functions.

While writing the tests I used coverall to check which lines in the __eq__ implementations were tested.

  • We only use this feature in one place in tests ATM, I wonder if we should do it more?

Anyway, looks good to me, we can keep improving this after the fact -- e.g. this will be very nice to have for the static tests (#1806). Thanks for your work.

If you have looked at the potential places (any Metadata.to_files/to_bytes() calls) already, and decided not to use validate please leave a comment so we know.

Well, we don't actually use validation in one test, but in multiple, because it's in the helper function modify_metadata inside tests/test_trusted_metadata_set.py.
I didn't want to include this feature in multiple tests as I thought it can slow them down.
Let's keep it that way for now and if we decide we can always change that.

@jku jku merged commit a74f7a1 into theupdateframework:develop Feb 28, 2022
@MVrachev MVrachev deleted the validation-during-serialization branch March 1, 2022 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metadata API: Provide a way to validate object when serializing to dictionary
4 participants