Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tests to work with Z3 4.12.1 #3400

Merged
merged 110 commits into from
Feb 28, 2023

Conversation

atomb
Copy link
Member

@atomb atomb commented Jan 24, 2023

The updates the default version of Z3 to 4.12.1 and updates superficial aspects of the tests (error order, counterexample values, occasional command-line option tweaks) to work with it.

Those test changes have some minor caveats:

  • Some of the tests in DafnyTestGeneration.Test check very specific things about the solver results that I don't think are required of valid results. I've commented out what I think are the overly-restrictive checks, but they should really be replaced with similar but less restrictive ones. @Dargones, do you have any thoughts?
  • Some tests require specific settings of the Z3 parameter smt.qi.eager_threshold to succeed, but there isn't a single value that allows them all to succeed. There's even one test I had to split into two files to give two values of this parameter. Let's investigate whether the current value is a good one, in general. Tracked as Investigate best setting for Z3 parameter smt.qi.eager_threshold #3558.
  • Dafny is now able to prove an assertion in Fuel.dfy that it shouldn't be able to, according to comments in the file. I don't think this is technically either a soundness or incompleteness issue, but it's a strange behavior. Edit: it turns out that the latest nightly of Z3 goes back to being unable to prove that assertion.
  • One test (github-issues/github-issue-2174.dfy) requires a special setting of the arithmetic solver. This is something we should investigate further on other code bases. Issue Investigate Z3 arithmetic mode configuration #3501 tracks that investigation.
  • The BitvectorsMore test was missing errors since there were too many in one method. It had /errorLimit:10, but that wasn't even high enough. A newer solver version resulted in the order of errors changing, causing known errors to be hidden and previously-untracked errors to be revealed. A higher error limit solves this.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

atomb added 30 commits January 24, 2023 11:26
Better fixes are in order before merging this.
This should be updated to choose the option based on Z3 version.
The output still doesn’t match expected, though
This makes some predicates opaque and reveals them in relevant
locations. It makes Dafny able to prove more about the program than it
originally could. Run past @RustanLeino to ensure that this still
embodies the spirit of the original test.
One explicit instantiation of a postcondition did the trick.
It seems that 4.11.2 has a bug in model generation that causes one of
the language server tests to fail.
Newer Z3 versions don't seem to cause the problematic situation this
test was testing.
This test no longer tests what it used to when run with a recent version
of Z3 (e.g., 4.12.1), but it's not obvious how to write a new test that
would have triggered the previous crash. So this test isn't as useful as
it was in the past.
This test isn't actually testing what Dafny can prove, so changes in
what it succeeds in proving aren't problematic.
Counterexamples that differ from the ones previously expected by these
tests are still reasonable, and now accepted.
With the new automated conversions between int/nat and small bit
vectors in shift expressions, more errors are possible, and this
test previously didn’t have a high enough limit to catch them all.
It still passed only because the error ordering hadn’t changed, so
the set under the limit was the same.
This should allow Z3 binaries from GitHub to work.

May not be a good situation long-term. We may want to figure out how to
get binaries compiled to work on Ubuntu 20.04.
@atomb atomb added the run-deep-tests Tells CI to run all tests label Feb 27, 2023
@atomb atomb changed the base branch from master to main-4.0 February 28, 2023 02:41
BitvectorsMore.dfy(168,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0
BitvectorsMore.dfy(169,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0
BitvectorsMore.dfy(170,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0
BitvectorsMore.dfy(171,33): Error: when converting shift amount to a bit vector, the value to be converted might not fit in bv0
BitvectorsMore.dfy(193,26): Error: shift amount must not exceed the width of the result (5)
BitvectorsMore.dfy(194,26): Error: shift amount must not exceed the width of the result (5)

Dafny program verifier finished with 9 verified, 37 errors
Dafny program verifier finished with 9 verified, 41 errors
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The larger number of errors here is due to /errorLimit:0.

@atomb atomb enabled auto-merge (squash) February 28, 2023 03:34
The %run, %translate, and %verify macros don’t quite work yet.
@@ -36,7 +36,7 @@ jobs:
run: |
sudo apt-get install -qq libarchive-tools
mkdir -p dafny/Binaries/z3/bin
wget -qO- https://github.com/dafny-lang/solver-builds/releases/download/snapshot-2023-02-17/z3-4.8.5-ubuntu-20.04-bin.zip | bsdtar -xf -
wget -qO- https://github.com/dafny-lang/solver-builds/releases/download/snapshot-2023-02-17/z3-4.12.1-ubuntu-20.04-bin.zip | bsdtar -xf -
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI (to fix before the next time we change the version number :), it's not too hard to extract out the value from source code to avoid duplicating it all over the place in these files. For example: https://github.com/dafny-lang/dafny/blob/master/.github/workflows/msbuild.yml#L35

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be a nice improvement.

This is assertion #2 of 4 in method Abs
Resource usage: 9K RU"
This is assertion #1 of 4 in method Abs
Resource usage: ??? RU"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why we lost this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ??? matches anything, so that change just makes the test less fragile as the solver evolves.

@@ -1,3 +1,3 @@
// RUN: %dafny /verifyAllModules /allocated:1 /compile:0 /print:"%t.print" /dprint:"%t.dprint" "%s" > "%t"
// RUN: %dafny /verifyAllModules /allocated:1 /proverOpt:O:smt.qi.eager_threshold=25 /compile:0 /print:"%t.print" /dprint:"%t.dprint" "%s" > "%t"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an option we should (right now) suggest as a verification technique in the manual and (slightly later) add a Dafny-level option for of some kind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've considered that. I'd like to experiment with it a bit more on production code and see if it makes any difference there.

AutoContracts.dfy(79,21): Error: A postcondition might not hold on this return path.
AutoContracts.dfy(60,16): Related location: This is the postcondition that might not hold.
AutoContracts.dfy[N1](65,20): Related location
AutoContracts.dfy(79,21): Error: A postcondition might not hold on this return path.
AutoContracts.dfy(60,16): Related location: This is the postcondition that might not hold.
AutoContracts.dfy[N1](65,20): Related location
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but why do we have "N1" as a line number?

Copy link
Member Author

@atomb atomb Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's to indicate that it refers the version of the Valid predicate, declared on line 65, as it exists in the refining module N1, which appears later in the source. Since N1 doesn't say anything about that particular member, there's no line number within the text of N1 that the message could refer to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I actually missed that it was in addition to the actual source location. It's the first time I've encountered that.

@@ -1,4 +1,4 @@
// RUN: %exits-with 4 %dafny /compile:4 /dprint:"%t.dprint" "%s" > "%t"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup :)

@@ -48,7 +48,7 @@ class Twostate {
}

twostate predicate GoodVariations(c: Twostate, d: Twostate, e: Twostate, f: Twostate)
reads [this], c, {d, e}, multiset{f}
reads this, c, {d, e}, multiset{f}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this have to change?

Test/vstte2012/BreadthFirstSearch.dfy Show resolved Hide resolved
Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHEW. I do want to discuss my questions but merging this to get a RC for Dafny 4.0 earlier. I cut #3651 to make sure we come back to them before releasing.

@robin-aws robin-aws dismissed davidcok’s stale review February 28, 2023 15:33

Looks to be addressed, but see #3651 if not

@atomb atomb merged commit dbba173 into dafny-lang:main-4.0 Feb 28, 2023
@atomb atomb deleted the update-tests-z3-4.11.2 branch January 4, 2024 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-deep-tests Tells CI to run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants