-
-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ContainerGenerator Performance Improvement #227
Conversation
Very interesting optimisation. And so obvious, once you think of it. :-/ |
I believe there is one more change I should make: Instead of: Collection<T> existingValues = new LinkedHashSet<>(); It should be: Collection<T> existingValues = noDuplicates ? new LinkedHashSet<>() : new ArrayList<>(); Right? |
Since a hash collission does not exclude a value from being added, I think everything should work alright since uniquenessExtractor works on values that are not equal (using equals()). I'm wondering, though, why I haven't used a Set for collecting existing values in the first place. Maybe it was just an oversight. Have you tried using a plain HashSet? It should also work and maybe even have better performance. I think I'll do a bit more "due diligence" before merging this PR just to make sure. |
My understanding of the jqwik source code is very limited, so I was trying to break as little as possible. With
Could uniquenessExtractors that use a class CaseInsensitive
{
public final String string;
public CaseInsensitive( String _string ) { string =_string; }
@Override public int hashCode() { return string.toLowerCase().hashCode(); }
@Override public boolean equals( Object s ) {
return s instanceof CaseInsensitive && ((CaseInsensitive) s).string.equalsIgnoreCase(string);
}
}
Arbitraries.strings().map(CaseInsensitive::new).array(CaseInsensitive[].class).uniqueElements( ci -> ci.string ); The two values Maybe I'm just overthinking this 😟 |
You are not. My imagination wasn't big enough to come up with an evil example like yours. So maybe:
would suffice? |
That looks like the safest solution. I've also added two tests including the evil case. Feel free to make any change or change suggestions. |
Details
Generating 1000 tries of
Arbitraries.integers().array(int[].class).ofMaxSize(1_000_000);
takes over 7 hours on a fairly beefy machine. My profiler tells me that jqwik is spending large amounts of time inContainerGenerator.next
which I believe to requireO(n²)
time. This proposed solution should only requireO(n)
using aLinkedHashSet
instead ofArrayList
.This may be a breaking change in the sense that the
LinkedHashSet
relies onhashCode
which was not used before. But according to Java conventionshashCode
andequals
have to be consistent with one another anyways. So this should not be an issue.I hereby agree to the terms of the jqwik Contributor Agreement.