
gh-107862: Add property-based round-trip tests for base64 #119406

Open · wants to merge 4 commits into main
Conversation

@encukou (Member) commented May 22, 2024

Add round-trip tests for the various encodings in the base64 module. The encodings are rather stable, so I expect the tests will serve more as examples of how to use the Hypothesis API.

Also, fix an issue in the @example decorator stub: it didn't work with multiple
positional arguments.

cc @Zac-HD
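
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of round-trip property these tests express (illustrative only; the test name and strategy choice are mine, not necessarily the PR's):

```python
import base64

import hypothesis
from hypothesis import strategies as st

@hypothesis.given(payload=st.binary())
def test_b64_round_trip(payload):
    # Decoding an encoded payload must reproduce the input exactly,
    # for arbitrary byte strings generated by Hypothesis.
    assert base64.b64decode(base64.b64encode(payload)) == payload
```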

@Zac-HD (Contributor) left a comment

Looks good! We don't need the @composite decorator in this case, but otherwise lovely idiomatic use of Hypothesis.

Comment on lines 25 to 26:

```python
return bytes(draw(hypothesis.strategies.lists(allowed_chars, min_size=2,
                                              max_size=2, unique=True)))
```

Instead of using @composite here, we can simply return a strategy from our function:

Suggested change:

```diff
-return bytes(draw(hypothesis.strategies.lists(allowed_chars, min_size=2,
-                                              max_size=2, unique=True)))
+return hypothesis.strategies.lists(allowed_chars, min_size=2,
+                                   max_size=2, unique=True).map(bytes)
```

In more complicated scenarios, this can be a decent performance improvement (by amortizing the construction and validation over multiple draws, instead of executing the body each time we draw a value) - particularly if we @functools.cache the function, or happen to draw from it multiple times (e.g. st.lists(altchars())). Mostly though, it's just an idiom to remind readers that strategies are just Python values, not magic.
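
For comparison, a hedged sketch of the two styles side by side (the allowed_chars strategy below is a stand-in I invented; the PR's actual definition isn't shown in this thread):

```python
import functools

from hypothesis import strategies as st

# Stand-in strategy: integer byte values that are legal as altchars
# (a hypothetical subset, for illustration only).
allowed_chars = st.sampled_from(b"-_.:~")

@st.composite
def altchars_composite(draw):
    # @composite style: this body re-runs on every single draw.
    return bytes(draw(st.lists(allowed_chars, min_size=2,
                               max_size=2, unique=True)))

@functools.cache
def altchars():
    # Plain-function style: build the list strategy once and map
    # bytes() over each drawn value; @functools.cache means repeated
    # calls (e.g. st.lists(altchars())) reuse the same strategy object.
    return st.lists(allowed_chars, min_size=2,
                    max_size=2, unique=True).map(bytes)
```

Both `altchars_composite()` and `altchars()` return a strategy, so either can be passed to `@given` or composed into larger strategies.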

@rhettinger (Contributor) commented May 22, 2024

I thought there had been a decision not to add hypothesis tests to any module not under active development. During development, hypothesis is good at teasing out special cases that need to be fixed and have an explicit targeted test. Once in production, hypothesis doesn't do much for us and leaves it unclear whether the search space for a property has actually been covered.

AFAICT this PR doesn't add any value, it just eats clock cycles and adds another tool that a person has to learn before they can read the tests. If our current tests are inadequate in any way, we should add specifically targeted tests to fill the gaps.

We should differentiate a goal we care about, "thoroughly understand and test our code", from the more dubious goal, "let's use this tool everywhere we can." ISTM that this PR only does the latter and doesn't help base64 in any way.

Also note that we've already had a number of people do the exercise of trying hypothesis on many standard library modules. We were able to do that without actually checking the tests into the main distribution. FWIW, the main result of the exercise was identifying bugs in the testers' understanding of the module (e.g. an expectation that colorsys conversions would round-trip, without accounting for gamut limits and published specifications that do not round-trip).

@Zac-HD (Contributor) commented May 22, 2024

I think these concerns are addressed in python/steering-council#65 and #107862; the stub implementation we added in #22863 has very low overhead.

We're starting with simpler cases rather than e.g. #83134, because it's easier to shake out config issues and get used to the workflow when we don't also need to work out how we'll handle e.g. the strategy to generate arbitrary Python source code.

@encukou (Member, Author) commented May 22, 2024

> I thought there had been a decision not to add hypothesis tests to any module not under active development.

I couldn't find that decision; do you have a link to it?
I'm happy to ask the SC for a decision if that's what we need.

I agree that the issues found by Hypothesis should be added to the tests that always run, though it might be enough to add them as @example.
Gaps in understanding can usually be used to improve the documentation -- see #119409 for two non-obvious things I found here.

Checking this into the repo means it stays in sync with stdlib.
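
A hedged sketch of what "add them as @example" looks like in practice (the pinned input below is invented for illustration, not a real failure):

```python
import base64

import hypothesis
from hypothesis import strategies as st

# A failure Hypothesis once found gets pinned with @example, so it
# runs deterministically on every test run thereafter. (My
# understanding is that CPython's fallback stub runs only these
# @example cases when Hypothesis itself is not installed.)
@hypothesis.example(payload=b"\xff\xfe")  # hypothetical regression input
@hypothesis.given(payload=st.binary())
def test_b85_round_trip(payload):
    assert base64.b85decode(base64.b85encode(payload)) == payload
```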

@rhettinger (Contributor)

Focusing on this particular PR as an example of what is to come: is anyone claiming that it adds any value at all? Will users be any better off? I don't think so.

AFAICT only the time and datetime module maintainer has said they wanted this (because it is part of their development process). No other module maintainer has asked for this. I expect that every such PR in the future will come from devs who have not previously shown any interest in the module being tested. Their goal isn't to improve the module. Their goal is just to use hypothesis. IMO that is a false goal. I don't think it should be done.

@encukou (Member, Author) commented May 22, 2024

The goal is to use property-based testing in general -- i.e. using a tool to find edge cases that humans are bad at finding.

You're right that this is meant for tests in general: rather than base64 itself, I'm interested in the hypothesis/regrtest integration, the stubs (see the second commit here), and finding usability issues in running these tests. I'd love to get to the point where people who are actively developing a module have a bunch of examples and institutional knowledge ready to help them find new edge cases.

@rhettinger (Contributor)

FWIW, we already test properties (invariants) almost everywhere they are applicable (even in base64). You don't need hypothesis for that.

Hypothesis doesn't have a monopoly on properties or invariants (pretty much every existing assertion tests one). The superpower of hypothesis is generating semi-random input patterns to find problems and then simplifying them to a minimal reproducer. That is helpful during development but not really something we want in the CI. Instead, a core dev is expected to produce a comprehensive and deterministic set of tests that reflects a full understanding of the code (both black-box and white-box).
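
For contrast, a sketch of the deterministic style described above (the specific cases are my own picks, not quoted from the stdlib suite):

```python
import base64
import unittest

class RoundTripTest(unittest.TestCase):
    # Hand-picked, reviewable edge cases: empty input, each padding
    # length, and high-bit bytes.
    CASES = [b"", b"\x00", b"a", b"ab", b"abc", b"\xff" * 57]

    def test_b64_round_trip(self):
        for payload in self.CASES:
            with self.subTest(payload=payload):
                self.assertEqual(
                    base64.b64decode(base64.b64encode(payload)), payload)
```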

@encukou (Member, Author) commented May 23, 2024

I do not see how “not really something we want in the CI” follows from what was said before.

If this stuff is helpful during development, it should also be helpful for testing that the hypothesis/regrtest integration works as intended, and for having a bunch of worked examples around, so that getting it to run when you need it isn't an uphill battle.

Yes, a core dev is expected to produce good tests. Property-based testing is a tool to help with that. It also encodes the expected invariants for future generations, in a more useful format than only a set of examples.
