Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ContainerGenerator Performance Improvement #227

Merged
merged 11 commits into from
Sep 15, 2021
Merged

Conversation

DirkToewe
Copy link
Contributor

Details

Generating 1000 tries of Arbitraries.integers().array(int[].class).ofMaxSize(1_000_000); takes over 7 hours on a fairly beefy machine. My profiler tells me that jqwik is spending large amounts of time in ContainerGenerator.next which I believe to require O(n²) time. This proposed solution should only require O(n) using a LinkedHashSet instead of ArrayList.

This may be a breaking change in the sense that the LinkedHashSet relies on hashCode which was not used before. But according to Java conventions hashCode and equals have to be consistent with one another anyways. So this should not be an issue.

I hereby agree to the terms of the jqwik Contributor Agreement.

@jlink
Copy link
Collaborator

jlink commented Sep 13, 2021

Very interesting optimisation. And so obvious, once you think of it. :-/

@DirkToewe
Copy link
Contributor Author

DirkToewe commented Sep 13, 2021

I believe there is one more change I should make: Instead of:

Collection<T> existingValues = new LinkedHashSet<>();

It should be:

Collection<T> existingValues = noDuplicates ? new LinkedHashSet<>() : new ArrayList<>();

Right?
Otherwise the uniquenessExtractors might not see all the generated values.
Maybe even better would be something like a LinkedHashMultiset but that would require an external library like Guava...

@jlink
Copy link
Collaborator

jlink commented Sep 13, 2021

I believe there is one more change I should make: Instead of:

Collection<T> existingValues = new LinkedHashSet<>();

It should be:

Collection<T> existingValues = noDuplicates ? new LinkedHashSet<>() : new ArrayList<>();

Right?
Otherwise the uniquenessExtractors might not see all the generated values.
Maybe even better would be something like a LinkedHashMultiset but that would require an external library like Guava...

Since a hash collission does not exclude a value from being added, I think everything should work alright since uniquenessExtractor works on values that are not equal (using equals()).

I'm wondering, though, why I haven't used a Set for collecting existing values in the first place. Maybe it was just an oversight. Have you tried using a plain HashSet? It should also work and maybe even have better performance.

I think I'll do a bit more "due diligence" before merging this PR just to make sure.

@DirkToewe
Copy link
Contributor Author

DirkToewe commented Sep 13, 2021

My understanding of the jqwik source code is very limited, so I was trying to break as little as possible. With LinkedHashSet the order of elements is maintained. If the order is not relevenat for the uniquenessExtractors, I can change it.

uniquenessExtractor works on values that are not equal (using equals())

Could uniquenessExtractors that use a by function cause any difficulties? Something like:

class CaseInsensitive
{
  public final String string;
  public CaseInsensitive( String _string ) { string =_string; }
  @Override public int hashCode() { return string.toLowerCase().hashCode(); }
  @Override public boolean equals( Object s ) {
    return s instanceof CaseInsensitive && ((CaseInsensitive) s).string.equalsIgnoreCase(string);
  }
}

Arbitraries.strings().map(CaseInsensitive::new).array(CaseInsensitive[].class).uniqueElements( ci -> ci.string );

The two values a = new CaseInsensitive("JQWIK") and b = new CaseInsensitive("jqwik") are equal according to a.equals(b) but not according to .uniqueElements( ci -> ci.string ).

Maybe I'm just overthinking this 😟

@jlink
Copy link
Collaborator

jlink commented Sep 14, 2021

Maybe I'm just overthinking this 😟

You are not. My imagination wasn't big enough to come up with an evil example like yours.

So maybe:

Collection<T> existingValues = uniquenessExtractors.isEmpty() ? new HashSet<>() : new ArrayList<>();

would suffice?

@DirkToewe
Copy link
Contributor Author

Collection<T> existingValues = uniquenessExtractors.isEmpty() ? new HashSet<>() : new ArrayList<>();

That looks like the safest solution. I've also added two tests including the evil case. Feel free to make any change or change suggestions.

@jlink jlink merged commit da4a521 into jqwik-team:main Sep 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants