-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write MSB VLong for better outputs sharing in block tree index #12631
Conversation
With this change, The total size of
|
WHOA, wow! This is a massive gain for such a tiny change :) I'll try to review soon! Nice to revisit ancient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome! What a small change yielding such a big win!
We should watch the nightly luceneutil
benchy PKLookup
task to see any CPU performance impact from this change, or run the benchmark before pushing (I can help run it).
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java
Outdated
Show resolved
Hide resolved
.../core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
Outdated
Show resolved
Hide resolved
.../core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
Outdated
Show resolved
Hide resolved
/** Current terms format. */ | ||
public static final int VERSION_CURRENT = VERSION_START; | ||
public static final int VERSION_CURRENT = VERSION_MSB_VLONG_OUTPUT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining what changed, and link to the GitHub issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote some comments on VERSION_MSB_VLONG_OUTPUT
. I prefer to leave comments there, as VERSION_CURRENT
could probably link to other version in feature. WDYT?
Issue added :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, yes - VERSION_MSB_VLONG_OUTPUT
is the right place for the comment! Thanks.
int LSBVLongBytes = (Long.SIZE - Long.numberOfLeadingZeros(i) - 1) / 7 + 1; | ||
i <<= Long.SIZE - LSBVLongBytes * 7; | ||
for (int j = 1; j < LSBVLongBytes; j++) { | ||
scratchBytes.writeByte((byte) (((i >>> 57) & 0x7FL) | 0x80)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @uschindler can help review this heavy math :)
I kicked off a |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small comments! Thanks @gf2121!
@@ -460,6 +460,19 @@ public String toString() { | |||
return "BLOCK: prefix=" + brToString(prefix); | |||
} | |||
|
|||
private static void writeMSBVLong(long l, DataOutput scratchBytes) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please add a randomized test that asserts that randomly encoding & decoding (non-negative?) long values always returns the same long / does not fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could you add a comment pointing to the corresponding read method matching this write method? E.g. /** Encodes long value to variable length byte[], in MSB order. Use FieldReader#readMSBVlong to decode. */
} | ||
} | ||
|
||
private static long readMSBVLong(DataInput in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a private javadoc linking to the write method? E.g.
/** Decodes a variable length byte[] in MSB order back to long, as written by Lucene90BlockTreeTermsWriter.writeMSBVLong. Use FieldReader#readMSBVlong to decode. */
lucene/core/src/test/org/apache/lucene/codecs/lucene90/blocktree/TestMSBVLong.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Adrien Grand <jpountz@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gf2121! Looks great :)
@jpountz @mikemccand Thanks a lot for the great suggestions and benchmark ! |
closes #12620