
Minimal proof for Bulk DocIdSetIterator for Lucene PR 13149 #257

Open
antonha wants to merge 1 commit into master


@antonha (Contributor) commented Mar 13, 2024

This PR is the smallest change I could make (apart from the number of LongNrq queries, which could probably be reduced) to demonstrate that the changes in apache/lucene/pull/13149 work.

I aimed to reproduce the effect on wikimediumall. This needs to be run with optimize = True for indexing and commitPoint = 'single' for the competition; otherwise the performance difference is hard to see. The reason is that when each segment holds too few documents, the BkdTree IntsWriter picks an encoding that compresses too well.
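To illustrate why segment size matters, here is a deliberately simplified model in Python. It assumes that the leaf doc-ID encoder picks the narrowest fixed width that fits a leaf's doc IDs; the function name and the 16/24/32-bit thresholds are hypothetical and do not reproduce Lucene's actual encoder.

```python
def doc_id_bits(doc_ids: list[int]) -> int:
    """Return the per-doc bit width a simplified, hypothetical encoder picks for one leaf.

    Assumed model: delta-encode in 16 bits when the leaf's ID span fits,
    otherwise use 24 bits if the largest ID fits, else fall back to 32 bits.
    """
    lo, hi = min(doc_ids), max(doc_ids)
    if hi - lo <= 0xFFFF:
        return 16
    if hi <= 0xFFFFFF:
        return 24
    return 32


# A leaf from a small segment: all doc IDs stay within a 16-bit span.
small_segment_leaf = [12, 4_000, 60_000]
# A leaf from one large force-merged segment: IDs spread over tens of millions.
single_segment_leaf = [12, 9_000_000, 30_000_000]

print(doc_id_bits(small_segment_leaf))   # 16
print(doc_id_bits(single_segment_leaf))  # 32
```

Under this model, many small segments make leaf encoding look cheaper than it would be in one large segment, which is consistent with needing a single force-merged segment to see the difference.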

I'm not sure if this should be merged - the PR is mostly here for reference.

```diff
@@ -422,7 +422,7 @@ def __init__(self, cold=False,
 # Pass fixed randomSeed so separate runs are comparable (pick the same tasks):
 randomSeed=None,
 benchSearch=True,
-taskCountPerCat = 1,
+taskCountPerCat = 20,
```
antonha (Contributor, Author) commented on this change:

This is important: it causes multiple implementations of IntersectVisitor to be used in PointRangeQuery.
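For readers less familiar with the BKD visitor shape, the following Python sketch mimics it: a leaf walker calls visit(doc_id) through one shared call site, and different query types plug in different visitor classes. The class names and visit_leaf are hypothetical stand-ins, not Lucene's PointValues.IntersectVisitor API.

```python
from abc import ABC, abstractmethod


class Visitor(ABC):
    """Hypothetical stand-in for a per-document intersect visitor."""

    @abstractmethod
    def visit(self, doc_id: int) -> None: ...


class CollectingVisitor(Visitor):
    """Records every matching doc ID (e.g. to build a result set)."""

    def __init__(self) -> None:
        self.docs: list[int] = []

    def visit(self, doc_id: int) -> None:
        self.docs.append(doc_id)


class CountingVisitor(Visitor):
    """Only counts matches (e.g. for a count-only query)."""

    def __init__(self) -> None:
        self.count = 0

    def visit(self, doc_id: int) -> None:
        self.count += 1


def visit_leaf(leaf_doc_ids: list[int], visitor: Visitor) -> None:
    # One shared call site that sees a different visitor class per query type.
    for doc_id in leaf_doc_ids:
        visitor.visit(doc_id)


leaf = [3, 7, 42]
collector, counter = CollectingVisitor(), CountingVisitor()
visit_leaf(leaf, collector)
visit_leaf(leaf, counter)
print(collector.docs, counter.count)  # [3, 7, 42] 3
```

If a benchmark run only ever drives one visitor class through such a call site, it hides the cost that appears once several classes share it.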


I should also note that this might make the benchmark slower: with more IntersectVisitor implementations in use, calls to the visitor can no longer be devirtualized and become true virtual calls, which may drag down performance.

In real Lucene applications, though, multiple implementations are probably the norm, so this makes the benchmark more representative. apache/lucene#13149 should reduce the resulting slowdown.
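The idea behind a bulk visit can be sketched in Python by counting dispatches: the per-document path makes one visitor call per matching doc, while a bulk path hands the visitor a whole leaf in one call, so the per-doc loop runs inside the visitor's own code. This is a minimal sketch assuming apache/lucene#13149 works along these lines; DispatchCountingVisitor, visit_all, and the list standing in for a DocIdSetIterator are hypothetical and do not reproduce Lucene's API.

```python
class DispatchCountingVisitor:
    """Hypothetical visitor that counts how often it is dispatched to."""

    def __init__(self) -> None:
        self.dispatches = 0
        self.matches = 0

    def visit(self, doc_id: int) -> None:
        # Per-document entry point: one dispatch per doc.
        self.dispatches += 1
        self.matches += 1

    def visit_all(self, doc_ids: list[int]) -> None:
        # Bulk entry point: one dispatch per leaf; the loop runs inside the visitor.
        self.dispatches += 1
        for _ in doc_ids:
            self.matches += 1


leaves = [[1, 2, 3], [10, 11], [20, 21, 22, 23]]

per_doc = DispatchCountingVisitor()
for leaf in leaves:
    for doc_id in leaf:
        per_doc.visit(doc_id)

bulk = DispatchCountingVisitor()
for leaf in leaves:
    bulk.visit_all(leaf)

print(per_doc.dispatches, per_doc.matches)  # 9 9
print(bulk.dispatches, bulk.matches)        # 3 9
```

Both paths find the same matches; only the number of calls through the polymorphic entry point drops, from one per document to one per leaf. Whether that translates into a measurable speedup on the JVM is what the benchmark is meant to show.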
