LogMergePolicy set_max_merge_size not being respected #1035

Closed
andrebsguedes opened this issue May 10, 2021 · 3 comments
@andrebsguedes

Describe the bug

  • What did you do?
    Configured the LogMergePolicy with a max merge size and inserted 38M documents, with a single commit at the end.
let writer = index
    .writer(1024 * 1024 * 1024)
    .expect("failed to create writer");

let mut policy = tantivy::merge_policy::LogMergePolicy::default();
policy.set_max_merge_size(3_000_000);

writer.set_merge_policy(Box::new(policy));
  • What happened?

Multiple 1M-document segments were merged into 9M-document segments, and two 9M-document segments were then merged into a ~18M-document segment.

  • What was expected?

The expected behavior was that no segment larger than 3M documents would be merged. As an aside, the desired behavior for my use case was to limit the maximum size of a segment to 3M documents in order to have enough segments to optimize parallelism.
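
To make the expectation concrete, here is a minimal standalone sketch of a size filter applied to merge candidates. It does not use tantivy; `eligible_for_merge` and `merged_doc_count` are hypothetical helpers, and the reading that the setting caps merge inputs (not merge outputs) is an assumption about its semantics.

```rust
/// Hypothetical helper (not tantivy API): returns the positions, in the
/// original slice, of segments small enough to be considered for merging.
fn eligible_for_merge(doc_counts: &[u32], max_merge_size: u32) -> Vec<usize> {
    doc_counts
        .iter()
        .enumerate()
        // Keep only segments whose document count is within the cap.
        .filter(|&(_, &num_docs)| num_docs <= max_merge_size)
        .map(|(idx, _)| idx)
        .collect()
}

/// Hypothetical helper: total document count of the segment produced by
/// merging the segments at `indices`.
fn merged_doc_count(doc_counts: &[u32], indices: &[usize]) -> u64 {
    indices.iter().map(|&i| u64::from(doc_counts[i])).sum()
}
```

With segments of 1M, 9M, 1M, 9M, and 2.5M documents and a cap of 3M, only positions 0, 2, and 4 pass the filter, so under this reading the two 9M segments should never be picked. Note that merging the three eligible segments would still yield a 4.5M-document segment: a cap on inputs alone does not bound the size of the output.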

Which version of tantivy are you using?
If "master", ideally give the specific sha1 revision.

Version 13.2

To Reproduce

If your bug is deterministic, can you give a minimal reproducing code?
Some bugs are not deterministic. Can you describe with precision in which context it happened?
If this is possible, can you share your code?

Reindexing all documents takes a non-trivial amount of time, so no attempt was made to reproduce the bug.

@fulmicoton
Collaborator

@andrebsguedes Thank you for the bug report... We will investigate.

@fulmicoton
Collaborator

@PSeitz can you have a look at this?

@fulmicoton fulmicoton added the bug label May 12, 2021
@PSeitz PSeitz added the wip label May 14, 2021
PSeitz added a commit to PSeitz/tantivy that referenced this issue May 17, 2021
fixes a bug in log merge policy where an index was wrongly referenced by its index
@PSeitz
Contributor

PSeitz commented May 17, 2021

Thanks for the report, should be fixed with #1035

PSeitz added a commit to PSeitz/tantivy that referenced this issue May 18, 2021
fixes a bug in log merge policy where an index was wrongly referenced by its index
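
The commit message is terse. One plausible reading, offered as an assumption and not as a description of the actual patch, is that indices computed over a size-filtered list were later used to look up segments in the original, unfiltered list. The sketch below is standalone Rust, not tantivy code, and both function names are hypothetical; it only shows how such a mix-up can let oversized segments slip through.

```rust
/// Buggy variant (illustration only): filters first and enumerates second,
/// so the returned indices are positions in the *filtered* list.
fn eligible_indices_buggy(doc_counts: &[u32], max_merge_size: u32) -> Vec<usize> {
    doc_counts
        .iter()
        .filter(|&&n| n <= max_merge_size)
        .enumerate()
        .map(|(idx, _)| idx)
        .collect()
}

/// Fixed variant: enumerates first and filters second, so each returned
/// index still refers to a position in the original slice.
fn eligible_indices_fixed(doc_counts: &[u32], max_merge_size: u32) -> Vec<usize> {
    doc_counts
        .iter()
        .enumerate()
        .filter(|&(_, &n)| n <= max_merge_size)
        .map(|(idx, _)| idx)
        .collect()
}
```

For segments of 9M, 1M, 9M, and 1M documents with a cap of 3M, the buggy variant returns positions 0 and 1; used against the original list, position 0 is a 9M segment. The fixed variant returns positions 1 and 3, which are the two 1M segments.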
This was referenced Feb 18, 2022