Remove sub-ir examples #4007
Conversation
```diff
 def start_example(self, i: int, label_index: int) -> None:
     depth = len(self.example_stack)
-    self.groups[label_index, depth].append(i)
+    key = (self.examples[i].ir_start, self.examples[i].ir_end)
+    self.groups[label_index].add(key)
```
`LazyStrategy` causes duplicate example boundaries, since, best I can tell, we wrap all of our core strategies in a lazy strategy. I've worked around that here by making groups unique up to `(ir_start, ir_end)`. The ideal fix is "don't create examples for `LazyStrategy`", but a naive attempt at that breaks `data.draw(label=...)` overrides, since they don't get passed through to the underlying strategy call. It may be doable with more plumbing.
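For intuition, here's a minimal self-contained sketch (not Hypothesis's actual internals; names follow the diff above) of how keying groups by IR span collapses the wrapper/wrapped boundary pair into one entry:

```python
from collections import defaultdict

class ExampleGroups:
    # Toy model: groups of examples, unique up to (ir_start, ir_end).
    def __init__(self) -> None:
        # label -> set of IR spans. Using a set means a LazyStrategy wrapper
        # that opens an example over exactly the same draws as the strategy
        # it wraps adds nothing new.
        self.groups = defaultdict(set)

    def record_example(self, label_index: int, ir_start: int, ir_end: int) -> None:
        self.groups[label_index].add((ir_start, ir_end))

groups = ExampleGroups()
# The lazy wrapper and the wrapped strategy both cover draws [0, 3):
groups.record_example(label_index=7, ir_start=0, ir_end=3)
groups.record_example(label_index=7, ir_start=0, ir_end=3)  # deduplicated
assert groups.groups[7] == {(0, 3)}
```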
Force-pushed from e158a66 to de854e2.
OK, sorry, I shouldn't have tried to push through both sub-ir removal and `generate_mutations_from` in the same pull. I'll split the latter into a dependent pull.
Force-pushed from de854e2 to 774b366.
(linked: migrate `generate_mutations_from` to the IR)
...actually, removing just sub-ir examples is flaky, but it isn't flaky when we also migrate `generate_mutations_from`. I'm not even going to look into why. Let's bundle both changes together. Sorry for all the back and forth lol.
Similarly, I don't see how `test_observability` is flaky with a seed 🤔 time for some CI-driven development...
Here's the snippet: `hypothesis/hypothesis-python/tests/common/setup.py`, lines 52 to 58 in 395649a (embedded in the original comment).

(I don't want to talk about how long I spent tracking this down.) Arguably a determinism footgun, but also clearly good for performance. Possibly we should use […].
Hmm, the performance hit might not be that big since we started using xdist under coverage; we could try it and see?
Seeing about a 50% performance hit locally for […]. I've worked around this locally by explicitly setting `max_examples`, btw, so we can revisit this later if we want.
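For reference, a minimal sketch of that workaround using Hypothesis's public `settings` API; the test itself is illustrative, not from this PR:

```python
from hypothesis import given, settings, strategies as st

# Pinning max_examples on the test makes the example count deterministic,
# regardless of which profile tests/common/setup.py registers.
@settings(max_examples=100)
@given(st.integers())
def test_runs_a_fixed_number_of_examples(n: int) -> None:
    assert isinstance(n, int)
```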
@jobh I'm going to toss this […].
That will not help unfortunately, as there is a crash (segfault?) which bypasses all of that. When I looked into these crashes previously, I found them to be unreproducible locally; but I managed to reproduce on CI/master at the time, so I'm not too worried about a regression.

[edit: Unreproducible may be too strong a claim. I've never gotten […]]

After seeing the comments in pytest-dev/pytest#3216, I think it would be close to impossible to track down the root cause here. @Zac-HD, is there a place to move the test to where it would run without […]?
We have some tests that invoke a new pytest process to run entirely independently, and we could do that for this one too. If that doesn't work either for some reason, skipping sounds good to me.
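A minimal sketch of that approach; the helper and test id are illustrative, not from this PR:

```python
import subprocess
import sys

def run_test_in_fresh_pytest(test_id: str) -> None:
    # Run a single test in its own pytest process, with the xdist plugin
    # disabled, so a worker crash can't take other tests down with it.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-p", "no:xdist", test_id],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stdout + result.stderr

# Usage (hypothetical test id):
# run_test_in_fresh_pytest("tests/cover/test_observability.py::test_observability")
```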
Looks good to me, let's merge as-is and fix the flake in a separate PR 🚀
Part of #3921.
For instance, we now deduplicate the following perfectly:

- `st.integers(n, m)` with `m - n <= 127` (due to integer weights drawing another int...)
- `st.lists(st.integers())`
In fact, our deduplication tracking may now be so good that we need to artificially duplicate inputs in order to coax out flaky errors!
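To illustrate why dedup can now be exact: once inputs are identified by their typed choice sequence rather than raw bytes, duplicate detection reduces to a set lookup. A minimal sketch, not Hypothesis's actual implementation:

```python
class DedupTracker:
    def __init__(self) -> None:
        self.seen: set = set()

    def is_duplicate(self, choices) -> bool:
        # Two inputs are duplicates iff they decode to the same typed choice
        # sequence, even if their underlying byte streams differ (e.g.
        # because integer weights drew an extra int along the way).
        key = tuple(choices)
        if key in self.seen:
            return True
        self.seen.add(key)
        return False

tracker = DedupTracker()
assert not tracker.is_duplicate([5, "abc"])
assert tracker.is_duplicate([5, "abc"])  # recognized exactly the second time
```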
Accordingly, this fixes our duplication of `timezone_keys` in #3932.