Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Multiple playback harnesses when multiple crashes exist in a single harness. #2496

Merged

Conversation

YoshikiTakashima
Copy link
Contributor

Description of changes:

Allow multiple harnesses to be generated.

Resolved issues:

Resolves #2461

Call-outs:

Note: We should also update the warning, since concrete playback now supports different property type.

Not sure what is requested here.

Testing:

  • How is this change tested? tests/ui/concrete-playback/single-harness-multi-trace

  • Is this a refactor change? No

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@YoshikiTakashima YoshikiTakashima requested a review from a team as a code owner May 31, 2023 19:36
@YoshikiTakashima YoshikiTakashima changed the title Generate Multiple harnesses when multiple crashes exist in a single harness. Generate Multiple playback harnesses when multiple crashes exist in a single harness. May 31, 2023
Copy link
Contributor

@jaisnan jaisnan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions regarding the change -

  1. Do all of the tests have unique identifiers as suffixes? I'm assuming yes, but can you make that more clear in the regression tests that you've added?
  2. Would it make sense to hide this behavior behind a flag?

I'm asking because the vscode extension runs the playback command and in cases where there's more than one unit test being generated, I'm wondering how the behavior might change and if there's something we need to change within the extension as well, to accommodate these cases.

@YoshikiTakashima
Copy link
Contributor Author

@jaisnan

  1. I think they are unique, but I don't think they are deterministic. See tests/ui/concrete-playback/mult-harnesses/expected because those are also unique, but the concrete playback expected is also identical.
  2. I don't immediately have an answer to this one. I would prefer to show all of them, but I understand it may not be ideal if 9000 test cases come out and hammer the customer dev laptop without warning.

@YoshikiTakashima
Copy link
Contributor Author

I'll mark this as draft since it is not immidiately clear what ought to be done. Thus fixing the test fails at this point does not make sense.

@YoshikiTakashima YoshikiTakashima marked this pull request as draft May 31, 2023 20:36
Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I have 2 requests:

  1. Can you please add a test for inplace where Kani will add more than one test for more than one harness at the same time? I'm a bit worried about updating the file multiple times in a row since we use original line numbers to calculate the test position.
  2. The current implementation is generating a modified version of the file N times. Can we add all the tests in one pass instead?

kani-driver/src/concrete_playback/test_generator.rs Outdated Show resolved Hide resolved
kani-driver/src/concrete_playback/test_generator.rs Outdated Show resolved Hide resolved
@YoshikiTakashima YoshikiTakashima force-pushed the yoshi-2461-multi-test branch from fb41650 to 0cae000 Compare May 31, 2023 21:58
@YoshikiTakashima YoshikiTakashima marked this pull request as ready for review June 2, 2023 01:36
@YoshikiTakashima
Copy link
Contributor Author

@celinval I think I resolved your comments with the exception of the inplace testing, which I think is matter of placing the right files in the right directories. I will flip the PR over to "ready for review"

@jaisnan
Copy link
Contributor

jaisnan commented Jun 2, 2023

I'm a bit worried about a potential proof having a lot of crashes as @YoshikiTakashima has mentioned, and the user clicking the concrete playback button and discovering that they have a lot of test cases pasted into their source. It might be annoying to deal with. Does it make sense to be worried about this corner case? Do we have any plans to deal with that when that happens?

@YoshikiTakashima
Copy link
Contributor Author

@jaisnan I think it's plausible if the harness is large. One possibility is that we inject first N harnesses where N is a user-set parameter in the extension. Not sure if there is a good way to prioritize them.

As for dealing with it when it happens, I think most IDEs have a history beyond last save, so ctrl-z should still work. Not too familiar except for Emacs and VS Code though.

@celinval
Copy link
Contributor

celinval commented Jun 5, 2023

@jaisnan I think it's plausible if the harness is large. One possibility is that we inject first N harnesses where N is a user-set parameter in the extension. Not sure if there is a good way to prioritize them.

As for dealing with it when it happens, I think most IDEs have a history beyond last save, so ctrl-z should still work. Not too familiar except for Emacs and VS Code though.

We could add a configurable cap, and even set its default to 1 to keep the same behavior as today.

Regarding priorities, we could have different modes, like user assertions, cover statements and UB checks (the last doesn't work yet with concrete playback). I was wondering if by default we should only generate tests for assertion failures.

Another possibility is to allow users to specify which property they want to generate the testcase for. We don't have any stable way of doing so though.

@adpaco-aws, any thoughts?

Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks @YoshikiTakashima

@celinval celinval dismissed their stale review June 5, 2023 20:23

blocking changes have been addressed

@adpaco-aws
Copy link
Contributor

We could add a configurable cap, and even set its default to 1 to keep the same behavior as today.

This is a good starting point: we can remove the technical limitation first and decide on what properties to target later.

Regarding priorities, we could have different modes, like user assertions, cover statements and UB checks (the last doesn't work yet with concrete playback). I was wondering if by default we should only generate tests for assertion failures.

Still, tests generated for cover statement could be useful for proof debugging. I don't think test cases for UB checks would be that useful though.

@celinval
Copy link
Contributor

celinval commented Jun 6, 2023

We could add a configurable cap, and even set its default to 1 to keep the same behavior as today.

This is a good starting point: we can remove the technical limitation first and decide on what properties to target later.

Regarding priorities, we could have different modes, like user assertions, cover statements and UB checks (the last doesn't work yet with concrete playback). I was wondering if by default we should only generate tests for assertion failures.

Still, tests generated for cover statement could be useful for proof debugging. I don't think test cases for UB checks would be that useful though.

I think UB tests would be useful. For example, memory checks could be debugged using valgrind, and other UB checks could be debugged using MIRI.

@adpaco-aws
Copy link
Contributor

I think UB tests would be useful. For example, memory checks could be debugged using valgrind, and other UB checks could be debugged using MIRI.

Right, I hadn't thought of using them with other tools.

Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments. Thanks!

kani-driver/src/concrete_playback/test_generator.rs Outdated Show resolved Hide resolved
kani-driver/src/concrete_playback/test_generator.rs Outdated Show resolved Hide resolved
kani-driver/src/concrete_playback/test_generator.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just please prune the info. Thanks!

@YoshikiTakashima YoshikiTakashima enabled auto-merge (squash) June 8, 2023 00:43
@YoshikiTakashima YoshikiTakashima merged commit 95e7161 into model-checking:main Jun 8, 2023
@YoshikiTakashima YoshikiTakashima deleted the yoshi-2461-multi-test branch June 8, 2023 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concrete playback should be able to generate multiple tests at once
4 participants