-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration with base OpenSearch 2.0 #328
Integration with base OpenSearch 2.0 #328
Conversation
67b8c0f
to
7711d48
Compare
Codecov Report
@@ Coverage Diff @@
## main #328 +/- ##
============================================
+ Coverage 83.77% 83.81% +0.04%
- Complexity 889 900 +11
============================================
Files 126 130 +4
Lines 3809 3844 +35
Branches 361 360 -1
============================================
+ Hits 3191 3222 +31
- Misses 455 459 +4
Partials 163 163
Continue to review full report at Codecov.
|
@martin-gaievski Does this take care of opensearch-project/opensearch-build#303? |
I may fix some compilation/failing tests issues while working on this change, but I didn't target opensearch-project/opensearch-build#303 specifically. In scope of my PR I've fixed all integTests and BWC tests. |
src/main/java/org/opensearch/knn/index/codec/KNN91Codec/KNN91Codec.java
Outdated
Show resolved
Hide resolved
public final class KNN91Codec extends FilterCodec { | ||
|
||
public static final String KNN_91 = "KNN91Codec"; | ||
private KNNDocFormatFacade docFormatFacade; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I think a different name would be good other than KNNDocFormatFacade - maybe KNNFormatFacade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, KNNFormatFacade is more reasonable
* No arg constructor that uses Lucene91 as the delegate | ||
*/ | ||
public KNN91Codec() { | ||
this(new Lucene91Codec()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it too much for the KNNDocFormatFactory to also have a createKNN91DefaultDelegate()
static method? Or createDefaultDelegate(string version)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think about that, my concern is that construction of delegate doesn't sound like a function of DocFormat factory. Maybe it's more suitable for a CodecFactory, maybe some inner factory class, like CodecFactory.DelegateFactory. Overall I would love to pull it out of the Codec class by we must have default constructor to be compliant with SPI spec
import org.apache.lucene.codecs.DocValuesFormat; | ||
import org.opensearch.knn.index.codec.KNNDocFormatFacade; | ||
|
||
public class KNN91DocFormat implements KNNDocFormatFacade { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I KNN91DocFormat a Format might be confusing. I generally think of formats as returnable classes from Codecs.
Further, I am wondering if KNN91DocFormat needs to exist. Why can't the Facade class just be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The facade class is an interface, my understanding is that for each lucene version we may have a different implementation. So that's why I created a FormatFactory that should return facade implementation for exact version of lucene. I was thinking of decoupling Codec class from format related classes. Ideally if we need to change the format slightly we'll change Format class, and in case of next lucene version we add new implementation of FormatFacade and also a method to a FormatFactory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the Facade takes a docValuesFormat and a compoundFormat. Why cant it be a class instead of an interface? KNN91DocFormat has no information in it specific to any version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely KNN91DocFormat or KNN91Format class can be used directly. I want to insert interface in between Format class and the codec/factory to have an extra level of abstraction and flexibility that polymorphism gives. For instance new format class in future has a clean contract to implement, it's easier to setup mocking for testing etc. Is the performance overhead the main concern or you have something else in mind, maybe class hierarchy became unclear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my concern is more around the class hierarchy. The name KNN91Format implies that it is tightly coupled with a particular knncodec/lucene version. However, it is not - it can be composed of any docvaluesformats or compoundformats.
So I think it should either strictly return docvaluesformats and compoundformats for this version or we should just use KNNDocFormatFacade as a class and handle composition in the factory. Are there any downsides with these approaches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that is valid concern. My vote is for making the Facade interface a class, deleting KNN91Format and moving the implementation logic from KNN91Format to KNNFormatFacade.
*/ | ||
// TODO: I think this can be refactored to avoid this copy and then write | ||
// https://github.com/opendistro-for-elasticsearch/k-NN/issues/330 | ||
try ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to pull in changes on this.
@@ -27,6 +26,6 @@ public KNNCodecService(CodecServiceConfig codecServiceConfig) { | |||
*/ | |||
@Override | |||
public Codec codec(String name) { | |||
return new KNN87Codec(super.codec(name)); | |||
return KNNCodecFactory.createKNN91Codec(super.codec(name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider making 2 methods for KNNCodecFactory:
KNNCodecFactory.createKNNCodec(string knnCodecVersion, codec delegate)
// Just returns the latest
KNNCodecFactory.createKNNCodec(codec delegate)
That way, we could avoid having to change this class during upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will add this in next commit
|
||
CreateIndexRequest request = new CreateIndexRequest(MODEL_INDEX_NAME).mapping("_doc", getMapping(), XContentType.JSON) | ||
String mapping = Strings.toString( | ||
JsonXContent.contentBuilder().startObject().startObject(MapperService.SINGLE_MAPPING_NAME).endObject().endObject() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getMapping contains the logic to load our model-index mapping. Why did this get deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, will investigate and restore the original logic
ae53d0d
to
af79c68
Compare
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
af79c68
to
e4c55fd
Compare
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
e4c55fd
to
ee411d5
Compare
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
d390b57
to
98f1264
Compare
src/main/java/org/opensearch/knn/index/codec/KNN910Codec/KNN910Codec.java
Show resolved
Hide resolved
throw new RuntimeException(e); | ||
} finally { | ||
indexAllocation.readUnlock(); | ||
SegmentReader reader = (SegmentReader) FilterLeafReader.unwrap(context.reader()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instance type check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an old code, probably formatted by spotless. But I agree that the check is missing. Let me add it in next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say dont add it in this PR. We can fix later. This was changed due to spotless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-gaievski I agree to @jmazanec15's comment. Shall we make one huge ( with caution ) spotless apply PR, so that it will not be a problem every-time we touch the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave it and address in later PR. Please approve @VijayanB if that was the only comment from your side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VijayanB I set https://github.com/opensearch-project/k-NN/blob/main/gradle/formatting.gradle#L10 the ratchet from setting to apply spotless incrementally to avoid messing us the git history with a huge commit that changes everything. I have been thinking about removing this and doing a big sptless apply. We can probably discuss in an issue I can create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
* Initial integration with OS 2.0 alpha1 snapshot Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski gaievski@amazon.com
Description
Initial integration with base OpenSearch 2.0. Using Alpha1 version of OpenSearch 2.0 as it's the only version with support of Lucene 9.1, also building same "alpha1" artifacts for plugin.
Main changes:
KNN needs new versioning schema.
Classes tied directly to Lucene version:
Other classes use the delegate pattern so no direct tie to Lucene version:
Proposal is to use 2 position X.X format, "LuceneVersion.FormatVersion".
With change in Lucene codec we change the first position in the version as it corresponds to Lucene version schema. E.g. with upgrade to OpenSearch 2.0 and Lucene 91 version changes from 870 to 910.
If any of four non-Lucene dependents changes (it can be something related to format change, like making the footprint on disk smaller) we increment the last position in the version, e.g. 910 to 911.
Proposal is to apply new versioning starting from Lucene 9.1, older existing classes remain as they are now. Version change reflected in KNNXXXCodec and KNNFormatFactory.createKNNXXXFormat classes, format related classes may remain with older/lower versions, e.g. KNNFormatFactory.createKNN910Format internally refer to KNN80DocValuesFormat and KNN80CompoundFormat.
Issues Resolved
No master issue yet, aiming 2.0 release
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.