-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variable hashes have too many collisions #1594
Comments
Nice catch! @sim642: I think for For the hashing in |
That seems possible. All uses of
This is wrong because some of the int domains (e.g. def_exc, enums) contain OCaml
I just tried (e.g. in base's let i1 = ID.top_of IInt in
let i2 = ID.of_excl_list IInt [Z.zero] in
Logs.error "%d %d" (ID.hash i1) (ID.hash i2) and this prints out different hash values |
Maybe I also misremembered some of the details. @arkocal do you still have the example at hand? |
I am not sure if the suggested hashes above are correct or performant. They are not meant to be solution suggestions, just illustrative examples for the source of collisions, sorry for the confusion. Testing with
Coming from: let interesting = 4529258937874703681 in
let colliding = HM.keys rho |> List.of_enum |> List.filter (fun k -> S.Var.hash k = interesting) in
match colliding with
| a::b::[] ->
if tracing then trace "coll" "Colliding variables:\n %a \nand\n %a" S.Var.pretty_trace a S.Var.pretty_trace b;
if tracing then trace "coll" "Hash of a: %d,\n Hash of b: %d" (S.Var.hash a) (S.Var.hash b);
if tracing then trace "coll" "Values are equal: %b" (S.Var.equal a b);
| _ -> ();
; Since the string representations are the same, this might also be an issue with |
Enabling |
This is very strange, trying: let i1 = ID.top_of IInt in
let i2 = ID.of_excl_list IInt [Z.zero] in
Logs.error "%d %d" (ID.hash i1) (ID.hash i2) Outputs
on my machine on 0deedb9 |
With For completeness:
|
Do you maybe have different versions of |
This would explain it. I am getting the same values from a docker build as well. Looking at the Dockerfile it looks like my local state should not somehow creep into the container. |
ppx_deriving_hash is not at fault because Goblint has a lower bound on the latest version. You couldn't even compile with a different version. The problem is our fork of Zarith where we have version 1.12 with custom patches: https://github.com/goblint/Zarith/tree/goblint-1.12. That patch probably needs to be adapted if we want to benefit from Zarith improvements and keep Gobview. |
With Zarith 1.14 I get the same values, and the few remaining conflicts have no obvious similarities. |
Yeah, potentially GobView can be retired at some point --- currently no one seems interested in it (correct me if I'm wrong). Though it feels like a waste to have invested so much effort to then not get a paper out of it. |
It's not necessarily a blocker for now: we could use Zarith 1.14 by default, but use our Zarith 1.12 fork with Gobview. One just needs to be careful when benchmarking to not use Gobview opam pins. There might be other places where we should also use |
This is more robust against changes to the possible choices of domain. It would have avoided issue #1594.
@arkocal feel free to reopen this issue if you encounter any more instances of this. |
Thanks, I think this should be enough for all practical purposes for now. And of course, will do :) |
CHANGES: Functionally equivalent to Goblint in SV-COMP 2025. * Add 32bit vs 64bit architecture support (goblint/analyzer#54, goblint/analyzer#1574). * Add per-function context gas analysis (goblint/analyzer#1569, goblint/analyzer#1570, goblint/analyzer#1598). * Adapt automatic static loop unrolling (goblint/analyzer#1516, goblint/analyzer#1582, goblint/analyzer#1583, goblint/analyzer#1584, goblint/analyzer#1590, goblint/analyzer#1595, goblint/analyzer#1599). * Adapt automatic configuration tuning (goblint/analyzer#1450, goblint/analyzer#1612, goblint/analyzer#1181, goblint/analyzer#1604). * Simplify non-relational integer invariants in witnesses (goblint/analyzer#1517). * Fix excessive hash collisions (goblint/analyzer#1594, goblint/analyzer#1602). * Clean up various code (goblint/analyzer#1095, goblint/analyzer#1523, goblint/analyzer#1554, goblint/analyzer#1575, goblint/analyzer#1588, goblint/analyzer#1597, goblint/analyzer#1614).
Summary: With about 60k variables encountered, we have about 1.04 entries per hash. With an ideal hash function the chances for even a single collision would have been very low.
The experiment can be repeated using the
topdown
solver onarkocal/hash_collision_demo
branch, which includes a minimal module for counting how often each hash value appears.In particular, variables that differ by a single string are mapped to the same hash.
The following changes already significantly reduce collisions (1.0008 keys per hash, no obvious similarities between colliding keys):
In
cdomains/stringDomain.ml
(https://github.com/goblint/analyzer/blob/master/src/cdomain/value/cdomains/stringDomain.ml) :In
cdomains/intDomain.ml
, add to line 3680, afterlet to_yojson ...
The text was updated successfully, but these errors were encountered: