Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sub-ir examples #4007

Merged
merged 18 commits into from
Jun 14, 2024
Merged

Conversation

tybug
Copy link
Member

@tybug tybug commented May 29, 2024

Part of #3921.

For instance, we now deduplicate the following perfectly:

  • st.integers(n, m) with m - n <= 127 (due to integer weights drawing another int...)
  • st.lists(st.integers())
  • probably many more

In fact, our deduplication tracking may now be so good that we need to artificially duplicate inputs in order to coax out flaky errors!

Accordingly, this fixes our duplication of timezone_keys in #3932.

@given(st.integers(0, 99), st.integers(0, 99))
@settings(database=None, max_examples=100000)
def f(n1, n2):
    seen.append((n1, n2))
f()
# 10000, 10000
print(len(set(seen)), len(seen)) 

@tybug tybug requested a review from Zac-HD as a code owner May 29, 2024 19:12
Comment on lines 645 to 647
def start_example(self, i: int, label_index: int) -> None:
depth = len(self.example_stack)
self.groups[label_index, depth].append(i)
key = (self.examples[i].ir_start, self.examples[i].ir_end)
self.groups[label_index].add(key)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LazyStrategy causes duplicate example boundaries, since best I can tell we wrap all of our core strategies in a lazy strategy. I've worked around that here by making groups unique up to (ir_start, ir_end). The ideal is "don't create examples for LazyStrategy", but a naive fix there breaks data.draw(label=...) overrides, since they don't get passed to the underlying strategy call. May be doable with more plumbing.

@tybug tybug force-pushed the remove-sub-ir-examples branch from e158a66 to de854e2 Compare May 29, 2024 19:45
@tybug
Copy link
Member Author

tybug commented May 29, 2024

OK, sorry, I shouldn't have tried to push through both sub-ir removal and generate_mutations from in the same pull. I'll split the latter to a dependent pull.

@tybug tybug force-pushed the remove-sub-ir-examples branch from de854e2 to 774b366 Compare May 29, 2024 19:55
@tybug tybug changed the title Remove sub-ir examples and migrate generate_mutations_from to the IR Remove sub-ir examples May 29, 2024
@tybug
Copy link
Member Author

tybug commented May 29, 2024

...actually, removing just sub-ir examples is flaky, but isn't flaky when we also migrate mutations from. I'm not even going to look into why. Let's bundle both changes together. Sorry for all the back and forth lol.

@tybug
Copy link
Member Author

tybug commented May 31, 2024

test_inquisitor_comments_basic_fail_if_either flaked, though I don't know how given that we set derandomize=True.

@tybug
Copy link
Member Author

tybug commented Jun 1, 2024

similarly I don't see how test_observability is flaky with a seed 🤔

time for some ci-driven development...

@tybug
Copy link
Member Author

tybug commented Jun 2, 2024

test_observability failure reproduces locally with ./build.sh check-coverage(and not when running tests normally), just like CI. I'm suspecting Interactions with sys.settrace, since running with coverage sets a tracer.

@tybug
Copy link
Member Author

tybug commented Jun 13, 2024

Here's the test_observability culprit. The failing example was found somewhere in 20 < n <= 100 calls and so only reproduced when not running under coverage:

settings.register_profile(
"default",
settings(
max_examples=20 if IN_COVERAGE_TESTS else not_set,
phases=list(Phase), # Dogfooding the explain phase
),
)

(I don't want to talk about how long I spent tracking this down.) Arguably a determinism footgun, but also clearly good for performance. Possibly we should use max_examples=not_set when @seed is set on tests? But I don't think profiles can express that.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the performance hit might not be that big since we started using xdist under coverage - we could try it and see?

@tybug
Copy link
Member Author

tybug commented Jun 14, 2024

seeing about a 50% performance hit locally for ./build.sh check-coverage when removing the max_examples override (112s -> 163s). Probably not worth it since check-coverage is already our longest CI job? 3.12 coverage support can't come fast enough 🙂

I've worked around this locally by explicitly setting max_examples, btw, so we can revisit this later if we want.

@tybug
Copy link
Member Author

tybug commented Jun 14, 2024

@jobh I'm going to toss this test_gc_hooks_do_not_cause_unraisable_recursionerror flake over to you, I think (should we just increase max_runs?): https://github.com/HypothesisWorks/hypothesis/actions/runs/9509679886/job/26212997475?pr=4007

@jobh
Copy link
Contributor

jobh commented Jun 14, 2024

should we just increase max_runs?

That will not help unfortunately, as there is a crash (segfault?) which bypasses all of that.

When I looked into these crashes previously, I found them to be unreproducible locally; but managed to reproduce on CI/master at the time so I'm not too worried about a regression.

[edit: Unreproducible may be too strong claim. I've never gotten build.sh working locally, so "unreproducible locally using plain pytest" would be correct.]

After seeing comments in pytest-dev/pytest#3216 I think it would be close to impossible to track down the root cause here. @Zac-HD, is there a place to move the test to where it would run without xdist? Otherwise I think we should just skip the test unconditionally.

@Zac-HD
Copy link
Member

Zac-HD commented Jun 14, 2024

After seeing comments in pytest-dev/pytest#3216 I think it would be close to impossible to track down the root cause here. @Zac-HD, is there a place to move the test to where it would run without xdist? Otherwise I think we should just skip the test unconditionally.

We have some tests that invoke a new pytest process to run entirely independently, and could do that for this one too. If that doesn't work either for some reason, skipping sounds good to me.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, let's merge as-is and fix the flake in a separate PR 🚀

@Zac-HD Zac-HD merged commit 6c155a4 into HypothesisWorks:master Jun 14, 2024
54 checks passed
@tybug tybug deleted the remove-sub-ir-examples branch June 14, 2024 21:06
@jobh jobh mentioned this pull request Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants