-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose FlatVectorsFormat #13469
Expose FlatVectorsFormat #13469
Changes from 4 commits
02e9b07
b01800b
c5389c9
1c87c5d
66e8b34
6262835
6167d84
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -79,7 +79,8 @@ public final class Lucene99FlatVectorsFormat extends FlatVectorsFormat { | |
private final FlatVectorsScorer vectorsScorer; | ||
|
||
/** Constructs a format */ | ||
public Lucene99FlatVectorsFormat(FlatVectorsScorer vectorsScorer) { | ||
public Lucene99FlatVectorsFormat(String name, FlatVectorsScorer vectorsScorer) { | ||
super(name); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder why for Seems like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. TBH I kind of just filled out the blanks with this one without thinking much about it, but I agree the name is really only useful for the SPI interface. |
||
this.vectorsScorer = vectorsScorer; | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,11 +36,13 @@ | |
import org.apache.lucene.index.SegmentReadState; | ||
import org.apache.lucene.index.VectorEncoding; | ||
import org.apache.lucene.index.VectorSimilarityFunction; | ||
import org.apache.lucene.search.KnnCollector; | ||
import org.apache.lucene.search.VectorScorer; | ||
import org.apache.lucene.store.ChecksumIndexInput; | ||
import org.apache.lucene.store.IOContext; | ||
import org.apache.lucene.store.IndexInput; | ||
import org.apache.lucene.store.ReadAdvice; | ||
import org.apache.lucene.util.Bits; | ||
import org.apache.lucene.util.IOUtils; | ||
import org.apache.lucene.util.RamUsageEstimator; | ||
import org.apache.lucene.util.hnsw.RandomVectorScorer; | ||
|
@@ -189,6 +191,18 @@ public ByteVectorValues getByteVectorValues(String field) throws IOException { | |
return rawVectorsReader.getByteVectorValues(field); | ||
} | ||
|
||
@Override | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, doing this means the Traditionally, query's were fairly ignorant of the underlying codec used to store the field. It does seem weird to have a codec that the Knn queries could be executed against throw an exception. I think either:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am not sure how to do this other than having something that says "Supports approximate search" to the leaf readers. Or a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did toy with adding a "supportsSearch" flag when this problem surfaced due to BaseKnnVectorsFormatTestCase. Potentially we could return no results. I feel like this is the moral equivalent of a stored Field with IndexOptions.NONE (Like StoredField.TYPE)? But we have tended to not want to make such vector-representation choices a part of the Fields API, so it looks different. |
||
public void search(String field, float[] target, KnnCollector knnCollector, Bits acceptDocs) | ||
throws IOException { | ||
throw new UnsupportedOperationException(); | ||
} | ||
|
||
@Override | ||
public void search(String field, byte[] target, KnnCollector knnCollector, Bits acceptDocs) | ||
throws IOException { | ||
throw new UnsupportedOperationException(); | ||
} | ||
|
||
private static IndexInput openDataInput( | ||
SegmentReadState state, | ||
int versionMeta, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why a unique name here is required.
The top level format name is
HnswBitVectorsFormat
, and it doesn't really do anything to the inner format, it simply overrides the scoring mechanism.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the subject of the brute-force query I started to look at adding one and found we already have
o.a.l.queries.function.valuesource.VectorSimilarityFunction
-- I think maybe all we need here is a static convenience method that produces aFunctionQuery
wrapping one of these things although I'll confess I'm not entirely conversant with this API. It seems like it might want to be rewritten in terms ofVectorScorer
.