
New API: Comprehensive serialization testing #1416

Merged
1 commit merged into theupdateframework:develop on Jun 23, 2021

Conversation

@MVrachev (Collaborator) commented on May 25, 2021

Fixes #1391

Description of the changes being introduced by the pull request:
The idea of this PR is to move (de)serialization testing out of
test_api.py and to make sure we test from_dict/to_dict with all
possible valid data for all classes.

As a result of this PR, there will be a single place where
one can have a quick look at which use cases we are testing for
(de)serialization.
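For illustration, the round-trip pattern being centralized looks roughly like this (the Role case and its data are only an example, mirroring the discussion below):

    import copy
    import json
    import unittest

    from tuf.api.metadata import Role

    class TestSerialization(unittest.TestCase):
        def test_role_serialization(self):
            # A valid Role dict must survive from_dict -> to_dict unchanged.
            # from_dict may modify the dict it is given, hence the copy.
            case_dict = json.loads('{"keyids": ["keyid"], "threshold": 3}')
            role = Role.from_dict(copy.deepcopy(case_dict))
            self.assertDictEqual(case_dict, role.to_dict())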

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@jku (Member) commented on May 27, 2021

I decided to leave a longer comment in the issue since it's not really a review. The core points are:

  • we should have the ability to scale, to increase coverage without always adding more code
  • it should be easy to understand what exactly gets tested

@MVrachev (Collaborator, Author) commented on Jun 3, 2021

I updated the PR to use decorators, as proposed by @jku in his comment in #1391 (comment).
While working on it, I wanted to showcase how it would look to hardcode the top-level metadata classes, and I ended up changing the whole test module.
As a positive side effect of using decorators, we don't need to create a new temporary directory and copy metadata files into it.

@jku (Member) left a review:

I like this a lot more!

  • much less code, more data 👍
  • no setup and teardown: lovely
  • the test data for the smaller helper classes looks pretty good (signed classes are the trickiest as expected but even they are close to being readable: we can try to improve that later on)

There are still things I'd like to tweak (we can consider leaving the work for future PRs, but I think this code is going to be a model for a lot of other new test code, so it may be worth polishing this one):

  • I really dislike referring to another test's data in the test data: I want to be able to modify a test case's data without worrying about other test cases using that data -- in other words, test dataset + test should be an atomic unit that does not affect other tests, if possible. The common shared data should be minimal and clearly marked as shared (like you do with the class 'constants')
  • we have a line length limit in our style guide: I think this is a great place to admit that the pirate code is more what you'd call "guidelines" than actual rules. If the test data becomes more readable (or you can avoid using variables in the dataset), I think using longer lines is a good choice here
  • the class constants should be chosen carefully: I want to understand what they mean when I read the test case (without seeing the actual constant definition). I'd avoid ambiguous ones like KEYS and HASHES, and instead use objects (like KEY) or types (like a SHA256 string, if we need one). SIGNED_COMMON might be an exception to this that can't be avoided
  • class constants should not be used when they can be avoided -- see e.g. using KEY when testing key serialization.

"expires": "2030-01-01T00:00:00Z"'

valid_keys = {
"all": KEY,
@jku (Member) commented on this code:
no reason not to write the key content here: this is the place where we want to test the key data so all the variants that we want to test should be right here IMO (one case is fine as a start, but it should be defined here).

KEY constant may well be useful in other tests (where we are not specifically interested in testing the key data, we just want something that serializes successfully). The fact that it may be a duplicate of a single case in the key test here is fine in my opinion.

On the actual data for key: I think using bogus data in serialization tests in general is fine (it pretty clearly shows what we are not testing here) but scheme is probably something we are going to validate (?) so that should maybe be real data? We can fix that when the validation gets added though, I don't really care.
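For example, something along these lines (a hypothetical inline case, with scheme being a real value as suggested above):

    valid_keys = {
        "all": '{"keytype": "rsa", "scheme": "rsassa-pss-sha256", '
               '"keyval": {"public": "foo"}}',
    }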

@MVrachev (Collaborator, Author) replied:

Most likely most of the bogus data should be changed to something meaningful.
I will use a real scheme here.

@jku (Member) commented on Jun 8, 2021

> test dataset + test should be an atomic unit that does not affect other tests, if possible

Just to continue on this: I even considered suggesting this:

    @run_sub_tests_with_dataset({
        "all": '{"keyids": ["id"], "threshold": 3}'
    })
    def test_role_serialization(self, test_case_data: str):
        # test code

to make it clear that the test data is part of the test, not something others should use -- but this maybe scales badly, and it moves the decorator name far from the method name if there are many test cases?

@MVrachev (Collaborator, Author) commented on Jun 9, 2021

> test dataset + test should be an atomic unit that does not affect other tests, if possible
>
> Just to continue on this: I even considered suggesting this:
>
>     @run_sub_tests_with_dataset({
>         "all": '{"keyids": ["id"], "threshold": 3}'
>     })
>     def test_role_serialization(self, test_case_data: str):
>         # test code
>
> to make it clear that the test data is part of the test, not something others should use -- but this maybe scales badly, and it moves the decorator name far from the method name if there are many test cases?

That syntax necessarily looks complicated, and this will be especially true for the top-level metadata classes, which require bigger strings to be hardcoded.

@MVrachev force-pushed the comprehensive-testing branch 3 times, most recently from 9f35e93 to 071622c on June 9, 2021 14:21
@MVrachev (Collaborator, Author) commented on Jun 9, 2021

I updated this PR to use as few constants as possible and removed the usage of other tests' valid data.
Also, in some cases I used more than 80 characters per line to keep the data readable.

I rebased again to fix a typo in the test file name.

@jku (Member) left a review:

There's a lot to like I think:

  • there's very little test specific code
  • The test data for all of the helper classes (MetaFile, Role, TargetFile, Key) looks great I think: expanding those tests will be very easy
  • tests and their data are completely independent: you can modify one test case without worrying it will change what gets tested elsewhere

The only annoying bit for me is:

  • signed-classes test data is still a bit hard to read (because of the combo of multiple lines, constants and the f-string formatting)

I think it's pretty good overall though: this feels like a test set I feel I could start expanding without it becoming unmaintainable. So LGTM after fixing my DataSet stupidity (see code comment).

Would be great to get a comment from someone who's not been looking at this yet -- @sechkova maybe? Does this need advice on how the test cases work, or anything like that?


    # DataSet is only here so type hints can be used:
    # It is a dict of name to test dict
    DataSet = Dict[str, Dict[str, Any]]
@jku (Member) commented on Jun 11, 2021:

oof, I just suggested using more of this annotation, but now that I review the actual content it doesn't actually make sense...

This type was true in one of my original examples, when I still had the test case data as a dict... but here (and already in my later examples) the test case data is a string. In which case having a custom DataSet type isn't useful at all; we could just say Dict[str, str] instead, right?

(Would be nice to get mypy running on the test files as well, so we didn't have to review these details.)

@MVrachev (Collaborator, Author) replied:

I decided not to remove the DataSet type, because it saves me from splitting an existing line in two.

I still updated its definition, though.
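With the test case data being strings, the updated definition is presumably just this (a sketch of what the discussion above implies):

    # DataSet is only here so type hints can be used:
    # it is a dict of test case name to test case data (a JSON string).
    DataSet = Dict[str, str]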

@sechkova (Contributor) left a review:

I agree with the general idea and I like the example in the original issue. What I don't like is that you seem to be diverging from it:

  • the "driver" code for all test sub-cases is hidden in the decorator outside the class
  • meanwhile, the same test code is repeated for each metadata class
  • I understand that the goal was to group the test data and the test case, but interleaving constants and code throughout the class does not give me this feeling.

I realize that this review is pure complaining without any good suggestions, which should show that I agree it is not an easy task.

If I try to summarize, the implementation could benefit from some better structure:

  • common test driver code
  • test code
  • test data

"keyval": {"public": "foo"}}',
}

@run_sub_tests_with_dataset(valid_keys)
@sechkova (Contributor) commented on this code:

This turned out to be a very clever decorator which avoids calling subTest each time, and yet all functions in the class repeat exactly the same code, except for the different metadata class. It feels imbalanced. It was not worth figuring out how this decorator works only to see repeated code everywhere else.

I understand that Key.from_dict is the only function with a different signature, which is very annoying, and I couldn't come up with a good alternative yet either.
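For context, the Key case presumably looks along these lines: the keyid is passed separately from the key dict, since in metadata the keyid is the dictionary key rather than part of the key's value (the "id" placeholder here is arbitrary):

    @run_sub_tests_with_dataset(valid_keys)
    def test_key_serialization(self, test_case_data: str):
        case_dict = json.loads(test_case_data)
        # Key.from_dict takes the keyid as a separate argument.
        key = Key.from_dict("id", copy.deepcopy(case_dict))
        self.assertDictEqual(case_dict, key.to_dict())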

@jku (Member) commented on Jun 15, 2021

Thanks Teodora, maybe we should still try to improve...

> the "driver" code for all test sub-cases is hidden in the decorator outside the class

Just to make sure this is not just a misunderstanding because we lack comments: the decorator intentionally does nothing that relates to TUF, serialization, or even JSON. The only thing it does is enable the "table testing" style, where the test function gets a single test case's data at a time from the hashtable of test cases. We could have taken on a new test dependency to do this for us, but this still felt more lightweight.

So a test writer really does not need to understand the decorator beyond: "the decorator ensures the test function will be called once for each of the values in the 'dataset' dictionary". Would a better comment stating this next to the decorator help? Or should we move the decorator to a different file, just to make it clear it's not a serialization thing per se?
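A sketch of the shape being described (not necessarily the exact code in the PR): the decorator takes the dataset as an argument and runs the test function once per case, each as its own subTest:

    def run_sub_tests_with_dataset(dataset: Dict[str, str]):
        """Run the decorated test function once per dataset entry."""
        def real_decorator(function):
            def wrapper(test_cls):
                # One subTest per case: a failing case is reported by name
                # and does not stop the remaining cases from running.
                for case, data in dataset.items():
                    with test_cls.subTest(case=case):
                        function(test_cls, data)
            return wrapper
        return real_decorator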

> meanwhile, the same test code is repeated for each metadata class

Yeah, the three lines obviously could be refactored into a function... That would cut down on lines of code, but do you think it would make the tests more readable?

> I understand that the goal was to group the test data and the test case, but interleaving constants and code throughout the class does not give me this feeling.

I hear you. I'm happy with the test data for most of the helper classes, but the Signed derivatives are ugly. Some options:

  • we could declare failure for the Signed classes and use another solution to test them. I'm not convinced there are better options
  • we could just use more space for test case data and avoid using constants: e.g. Delegations would arguably become easier to read with KEY just inlined (as it wouldn't need to be an f-string anymore at all). In fact, with the current test data, getting rid of the constants completely would mean a max of +10 lines... and probably fewer in practice
  • we could squeeze the test data tighter to account for that by breaking the line length limit even more -- I think this would not be an issue in this case

> If I try to summarize, the implementation could benefit from some better structure:
>
>   • common test driver code
>   • test code
>   • test data

I firmly believe that in this instance each test case should have its own data: if the data gets bundled with other data, then over time the same data gets used in multiple tests... and then no one touches the test data anymore, because who knows what it will break. That said, I'm not against a different structure if that issue is kept in mind.

@sechkova (Contributor) commented:

Since I really don't have better suggestions, maybe simply "beautifying" the data sets' format will make them look nice and readable.
The decorator already has an explanatory comment, but it could say something like:

> feeding the actual test function one test case at a time

@MVrachev (Collaborator, Author) commented:

I decided to experiment and created commits for both:

  1. removing the KEY constant and some of the f-strings
  2. removing the last remaining constant, SIGNED_COMMON, and all of the f-strings.

Which one of the three situations is best:

  • with KEY and SIGNED_COMMON
  • without KEY, with SIGNED_COMMON
  • no constants and no f-strings?

@jku (Member) commented on Jun 16, 2021

I think I like the last version: sure, there's a lot of text, but I can just read it and it's literally working JSON (and not f-strings with "double-curly encoding")...
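To illustrate with a made-up fragment (the field values here are only examples):

    # With a shared constant, each case is an f-string with doubled braces:
    SIGNED_COMMON = '"_type": "targets", "version": 1'
    case = f'{{{SIGNED_COMMON}, "expires": "2030-01-01T00:00:00Z"}}'

    # Fully inlined, the case is literally working JSON:
    case = '{"_type": "targets", "version": 1, "expires": "2030-01-01T00:00:00Z"}'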

I had a closer look at the decorator and I think I probably messed up the type annotations in my examples at some point while making a couple of different versions: @MVrachev, can you have a look at the last commit in https://github.com/jku/tuf/commits/comprehensive-testing and include it if it seems correct to you?

@MVrachev (Collaborator, Author) commented:

@sechkova can you have another look and give us your opinion?

@sechkova (Contributor) commented:

Yes, it looks better formatted. I can't say I strongly like putting data right next to the tests instead of in a separate "data set" class, but I also don't insist on changing it, as long as you are happy with this approach.

@MVrachev (Collaborator, Author) commented on Jun 21, 2021

I rebased, fixed the merge conflict, and squashed all of the commits together.

Additionally, I noticed I can:

  1. remove the MetaFile class serialization tests
  2. remove the test for from_dict/to_dict of Targets with empty targets
  3. remove the test for from_dict/to_dict of Targets without delegations

because I already test for them in test_metadata_serialization.py, and so I did.

> Yes, it looks better formatted. I can't say I strongly like putting data right next to the tests instead of in a separate "data set" class, but I also don't insist on changing it, as long as you are happy with this approach.

I understand your point of view, but I prefer the way it is now, so you can see the test and the related dataset together.

@jku do you agree with me, or would you prefer that I move the hardcoded strings to a separate class?
If you are okay with the way it is now, can we merge this, or do we want to add something more in this PR?

@jku (Member) commented on Jun 21, 2021

> @jku do you agree with me, or would you prefer that I move the hardcoded strings to a separate class?

I agree: moving the test data somewhere else removes the connection between the two, and then the data gets used for some other test as well (and I really think each test should have its own data if it's practically possible).

> If you are okay with the way it is now, can we merge this, or do we want to add something more in this PR?

From my previous comment:

> I had a closer look at the decorator and I think I probably messed up the type annotations in my examples at some point while making a couple of different versions: @MVrachev, can you have a look at the last commit in https://github.com/jku/tuf/commits/comprehensive-testing and include it if it seems correct to you?

It's just two annotation changes, feel free to squash them in if they look right to you. The commit is this: jku@a03c8c7

@MVrachev force-pushed the comprehensive-testing branch 2 times, most recently from b57b53b to af43ab8 on June 22, 2021 09:51
The idea of this commit is to separate (de)serialization testing outside
test_api.py and make sure we are testing from_dict/to_dict for all
possible valid data for all classes.

Jussi in his comment here:
theupdateframework#1391 (comment)
proposed using decorators when creating comprehensive testing
for metadata serialization.
The main problems he pointed out are that:
1) there is a lot of code needed to generate the data for each case
2) the test implementation scales badly: when you want to add new
cases for your tests, you have to add code as well
3) the dictionary format is not visible - we are loading external files
and assuming they are unchanged and valid

In this change, I am using a decorator with an argument, which complicates
the implementation of the decorator and requires three nested functions,
but the advantage is that we resolve the above three problems:
1) we don't need new code when adding a new test case
2) only a small amount of hardcoded data is required for each new test
3) the dictionaries are all in the test module, without the need to
create new directories and copy data.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev (Collaborator, Author) commented:

> @jku do you agree with me, or would you prefer that I move the hardcoded strings to a separate class?
>
> I agree: moving the test data somewhere else removes the connection between the two, and then the data gets used for some other test as well (and I really think each test should have its own data if it's practically possible).
>
> If you are okay with the way it is now, can we merge this, or do we want to add something more in this PR?
>
> From my previous comment:
> I had a closer look at the decorator and I think I probably messed up the type annotations in my examples at some point while making a couple of different versions: @MVrachev, can you have a look at the last commit in https://github.com/jku/tuf/commits/comprehensive-testing and include it if it seems correct to you?
>
> It's just two annotation changes, feel free to squash them in if they look right to you. The commit is this: jku@a03c8c7

Sorry, I didn't notice that comment.
I updated the annotations and removed two typing imports, Any and Type.
I think it should be ready for merging now.

@jku (Member) left a review:

Thanks! Let's go with this and see what it looks like after we start adding more test cases and new tests.

MVrachev added a commit to MVrachev/tuf that referenced this pull request Jun 22, 2021
A while ago we decided that it's best to research each of the individual
attributes one by one and identify what level of validation it needs
compared to how we use it:
theupdateframework#1366 (comment).

This work is ongoing and there are a couple of commits already merged
for this:
- theupdateframework@6c5d970
- theupdateframework@f20664d
- theupdateframework@41afb1e

We want to be able to test the attribute validation against known bad
values.
The way we want to do that is with the table testing, added using
decorators for our metadata classes, defined in New API:
theupdateframework#1416.
This gives us an easy way to add new cases for each of the attributes
without depending on external files.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@jku merged commit 97da5ab into theupdateframework:develop on Jun 23, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jun 23, 2021
We have merged ADR 8, allowing for unrecognized fields, and we have
added tests for that which are too specific and not scalable.

Now I use the table testing that we initially introduced in theupdateframework#1416
to test unrecognized-fields support in a cleaner and much more readable
way.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev deleted the comprehensive-testing branch on July 19, 2021 13:36