-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update tests to work with Z3 4.12.1 #3400
Conversation
Better fixes are in order before merging this.
This should be updated to choose the option based on Z3 version.
The output still doesn’t match expected, though
This makes some predicates opaque and reveals them in relevant locations. It makes Dafny able to prove more about the program than it originally could. Run past @RustanLeino to ensure that this still embodies the spirit of the original test.
One explicit instantiation of a postcondition did the trick.
It seems that 4.11.2 has a bug in model generation that causes one of the language server tests to fail.
Newer Z3 versions don't seem to cause the problematic situation this test was testing.
This test no longer tests what it used to when run with a recent version of Z3 (e.g., 4.12.1), but it's not obvious how to write a new test that would have triggered the previous crash. So this test isn't as useful as it was in the past.
This test isn't actually testing what Dafny can prove, so changes in what it succeeds in proving aren't problematic.
Counterexamples that differ from the ones previously expected by these tests are still reasonable, and now accepted.
With the new automated conversions between int/nat and small bit vectors in shift expressions, more errors are possible, and this test previously didn’t have a high enough limit to catch them all. It still passed only because the error ordering hadn’t changed, so the set under the limit was the same.
This should allow Z3 binaries from GitHub to work. May not be a good situation long-term. We may want to figure out how to get binaries compiled to work on Ubuntu 20.04.
BitvectorsMore.dfy(168,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0 | ||
BitvectorsMore.dfy(169,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0 | ||
BitvectorsMore.dfy(170,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0 | ||
BitvectorsMore.dfy(171,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0 | ||
BitvectorsMore.dfy(193,26): Error: shift amount must not exceed the width of the result (5) | ||
BitvectorsMore.dfy(194,26): Error: shift amount must not exceed the width of the result (5) | ||
|
||
Dafny program verifier finished with 9 verified, 37 errors | ||
Dafny program verifier finished with 9 verified, 41 errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The larger number of errors here is due to /errorLimit:0
.
@@ -36,7 +36,7 @@ jobs: | |||
run: | | |||
sudo apt-get install -qq libarchive-tools | |||
mkdir -p dafny/Binaries/z3/bin | |||
wget -qO- https://github.com/dafny-lang/solver-builds/releases/download/snapshot-2023-02-17/z3-4.8.5-ubuntu-20.04-bin.zip | bsdtar -xf - | |||
wget -qO- https://github.com/dafny-lang/solver-builds/releases/download/snapshot-2023-02-17/z3-4.12.1-ubuntu-20.04-bin.zip | bsdtar -xf - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI (to fix before the next time we change the version number :), it's not too hard to extract out the value from source code to avoid duplicating it all over the place in these files. For example: https://github.com/dafny-lang/dafny/blob/master/.github/workflows/msbuild.yml#L35
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be a nice improvement.
This is assertion #2 of 4 in method Abs | ||
Resource usage: 9K RU" | ||
This is assertion #1 of 4 in method Abs | ||
Resource usage: ??? RU" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why we lost this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ???
matches anything, so that change just makes the test less fragile as the solver evolves.
@@ -1,3 +1,3 @@ | |||
// RUN: %dafny /verifyAllModules /allocated:1 /compile:0 /print:"%t.print" /dprint:"%t.dprint" "%s" > "%t" | |||
// RUN: %dafny /verifyAllModules /allocated:1 /proverOpt:O:smt.qi.eager_threshold=25 /compile:0 /print:"%t.print" /dprint:"%t.dprint" "%s" > "%t" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an option we should (right now) suggest as a verification technique in the manual and (slightly later) add a Dafny-level option for of some kind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've considered that. I'd like to experiment with it a bit more on production code and see if it makes any difference there.
AutoContracts.dfy(79,21): Error: A postcondition might not hold on this return path. | ||
AutoContracts.dfy(60,16): Related location: This is the postcondition that might not hold. | ||
AutoContracts.dfy[N1](65,20): Related location | ||
AutoContracts.dfy(79,21): Error: A postcondition might not hold on this return path. | ||
AutoContracts.dfy(60,16): Related location: This is the postcondition that might not hold. | ||
AutoContracts.dfy[N1](65,20): Related location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but why do we have "N1" as a line number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's to indicate that it refers the version of the Valid
predicate, declared on line 65, as it exists in the refining module N1
, which appears later in the source. Since N1
doesn't say anything about that particular member, there's no line number within the text of N1
that the message could refer to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I actually missed that it was in addition to the actual source location. It's the first time I've encountered that.
@@ -1,4 +1,4 @@ | |||
// RUN: %exits-with 4 %dafny /compile:4 /dprint:"%t.dprint" "%s" > "%t" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup :)
@@ -48,7 +48,7 @@ class Twostate { | |||
} | |||
|
|||
twostate predicate GoodVariations(c: Twostate, d: Twostate, e: Twostate, f: Twostate) | |||
reads [this], c, {d, e}, multiset{f} | |||
reads this, c, {d, e}, multiset{f} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this have to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WHEW. I do want to discuss my questions but merging this to get a RC for Dafny 4.0 earlier. I cut #3651 to make sure we come back to them before releasing.
Looks to be addressed, but see #3651 if not
The updates the default version of Z3 to 4.12.1 and updates superficial aspects of the tests (error order, counterexample values, occasional command-line option tweaks) to work with it.
Those test changes have some minor caveats:
DafnyTestGeneration.Test
check very specific things about the solver results that I don't think are required of valid results. I've commented out what I think are the overly-restrictive checks, but they should really be replaced with similar but less restrictive ones. @Dargones, do you have any thoughts?smt.qi.eager_threshold
to succeed, but there isn't a single value that allows them all to succeed. There's even one test I had to split into two files to give two values of this parameter. Let's investigate whether the current value is a good one, in general. Tracked as Investigate best setting for Z3 parametersmt.qi.eager_threshold
#3558.Fuel.dfy
that it shouldn't be able to, according to comments in the file. I don't think this is technically either a soundness or incompleteness issue, but it's a strange behavior. Edit: it turns out that the latest nightly of Z3 goes back to being unable to prove that assertion.github-issues/github-issue-2174.dfy
) requires a special setting of the arithmetic solver. This is something we should investigate further on other code bases. Issue Investigate Z3 arithmetic mode configuration #3501 tracks that investigation.BitvectorsMore
test was missing errors since there were too many in one method. It had/errorLimit:10
, but that wasn't even high enough. A newer solver version resulted in the order of errors changing, causing known errors to be hidden and previously-untracked errors to be revealed. A higher error limit solves this.By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.