Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with base OpenSearch 2.0 #328

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Mar 21, 2022

Signed-off-by: Martin Gaievski gaievski@amazon.com

Description

Initial integration with base OpenSearch 2.0. Using Alpha1 version of OpenSearch 2.0 as it's the only version with support of Lucene 9.1, also building same "alpha1" artifacts for plugin.

Main changes:

  • switched to lucene 9.1
  • update mapping related logic
  • new codec versioning, please see details below

KNN needs new versioning schema.

Classes tied directly to Lucene version:

  • KNNXXCodec
  • KNNFormatFactory.createKNNXXFormat

Other classes use the delegate pattern so no direct tie to Lucene version:

  • BinaryDocValues
  • CompoundFormat
  • DocValuesConsumer
  • DocValuesReader

Proposal is to use 2 position X.X format, "LuceneVersion.FormatVersion".

With change in Lucene codec we change the first position in the version as it corresponds to Lucene version schema. E.g. with upgrade to OpenSearch 2.0 and Lucene 91 version changes from 870 to 910.

If any of four non-Lucene dependents changes (it can be something related to format change, like making the footprint on disk smaller) we increment the last position in the version, e.g. 910 to 911.

Proposal is to apply new versioning starting from Lucene 9.1, older existing classes remain as they are now. Version change reflected in KNNXXXCodec and KNNFormatFactory.createKNNXXXFormat classes, format related classes may remain with older/lower versions, e.g. KNNFormatFactory.createKNN910Format internally refer to KNN80DocValuesFormat and KNN80CompoundFormat.

Issues Resolved

No master issue yet, aiming 2.0 release

Check List

  • New functionality includes testing.
    • All tests pass
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski force-pushed the integrate/opensearch20-lucene91 branch from 67b8c0f to 7711d48 Compare March 21, 2022 21:45
@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2022

Codecov Report

Merging #328 (98f1264) into main (8203524) will increase coverage by 0.04%.
The diff coverage is 74.54%.

@@             Coverage Diff              @@
##               main     #328      +/-   ##
============================================
+ Coverage     83.77%   83.81%   +0.04%     
- Complexity      889      900      +11     
============================================
  Files           126      130       +4     
  Lines          3809     3844      +35     
  Branches        361      360       -1     
============================================
+ Hits           3191     3222      +31     
- Misses          455      459       +4     
  Partials        163      163              
Impacted Files Coverage Δ
...nsearch/knn/index/codec/KNN80Codec/KNN80Codec.java 30.00% <0.00%> (ø)
...nn/index/codec/KNN80Codec/KNN80CompoundFormat.java 100.00% <ø> (ø)
...n/index/codec/KNN80Codec/KNN80DocValuesFormat.java 100.00% <ø> (ø)
...nsearch/knn/index/codec/KNN87Codec/KNN87Codec.java 75.00% <ø> (-25.00%) ⬇️
...nsearch/knn/index/codec/KNN84Codec/KNN84Codec.java 30.00% <33.33%> (ø)
...nsearch/knn/index/codec/KNN86Codec/KNN86Codec.java 28.00% <33.33%> (ø)
...c/main/java/org/opensearch/knn/index/KNNQuery.java 83.33% <66.66%> (+5.55%) ⬆️
...rg/opensearch/knn/index/codec/KNNCodecFactory.java 71.42% <71.42%> (ø)
.../main/java/org/opensearch/knn/index/KNNWeight.java 78.94% <75.38%> (+3.60%) ⬆️
...g/opensearch/knn/index/codec/KNNFormatFactory.java 80.00% <80.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8203524...98f1264. Read the comment docs.

@jmazanec15
Copy link
Member

@martin-gaievski Does this take care of opensearch-project/opensearch-build#303?

@martin-gaievski
Copy link
Member Author

@martin-gaievski Does this take care of opensearch-project/opensearch-build#303?

I may fix some compilation/failing tests issues while working on this change, but I didn't target opensearch-project/opensearch-build#303 specifically. In scope of my PR I've fixed all integTests and BWC tests.

public final class KNN91Codec extends FilterCodec {

public static final String KNN_91 = "KNN91Codec";
private KNNDocFormatFacade docFormatFacade;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I think a different name would be good other than KNNDocFormatFacade - maybe KNNFormatFacade?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, KNNFormatFacade is more reasonable

* No arg constructor that uses Lucene91 as the delegate
*/
public KNN91Codec() {
this(new Lucene91Codec());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it too much for the KNNDocFormatFactory to also have a createKNN91DefaultDelegate() static method? Or createDefaultDelegate(string version)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about that, my concern is that construction of delegate doesn't sound like a function of DocFormat factory. Maybe it's more suitable for a CodecFactory, maybe some inner factory class, like CodecFactory.DelegateFactory. Overall I would love to pull it out of the Codec class by we must have default constructor to be compliant with SPI spec

import org.apache.lucene.codecs.DocValuesFormat;
import org.opensearch.knn.index.codec.KNNDocFormatFacade;

public class KNN91DocFormat implements KNNDocFormatFacade {
Copy link
Member

@jmazanec15 jmazanec15 Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I KNN91DocFormat a Format might be confusing. I generally think of formats as returnable classes from Codecs.

Further, I am wondering if KNN91DocFormat needs to exist. Why can't the Facade class just be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The facade class is an interface, my understanding is that for each lucene version we may have a different implementation. So that's why I created a FormatFactory that should return facade implementation for exact version of lucene. I was thinking of decoupling Codec class from format related classes. Ideally if we need to change the format slightly we'll change Format class, and in case of next lucene version we add new implementation of FormatFacade and also a method to a FormatFactory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the Facade takes a docValuesFormat and a compoundFormat. Why cant it be a class instead of an interface? KNN91DocFormat has no information in it specific to any version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely KNN91DocFormat or KNN91Format class can be used directly. I want to insert interface in between Format class and the codec/factory to have an extra level of abstraction and flexibility that polymorphism gives. For instance new format class in future has a clean contract to implement, it's easier to setup mocking for testing etc. Is the performance overhead the main concern or you have something else in mind, maybe class hierarchy became unclear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my concern is more around the class hierarchy. The name KNN91Format implies that it is tightly coupled with a particular knncodec/lucene version. However, it is not - it can be composed of any docvaluesformats or compoundformats.

So I think it should either strictly return docvaluesformats and compoundformats for this version or we should just use KNNDocFormatFacade as a class and handle composition in the factory. Are there any downsides with these approaches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that is valid concern. My vote is for making the Facade interface a class, deleting KNN91Format and moving the implementation logic from KNN91Format to KNNFormatFacade.

*/
// TODO: I think this can be refactored to avoid this copy and then write
// https://github.com/opendistro-for-elasticsearch/k-NN/issues/330
try (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to pull in changes on this.

@@ -27,6 +26,6 @@ public KNNCodecService(CodecServiceConfig codecServiceConfig) {
*/
@Override
public Codec codec(String name) {
return new KNN87Codec(super.codec(name));
return KNNCodecFactory.createKNN91Codec(super.codec(name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider making 2 methods for KNNCodecFactory:

KNNCodecFactory.createKNNCodec(string knnCodecVersion, codec delegate)

// Just returns the latest
KNNCodecFactory.createKNNCodec(codec delegate)

That way, we could avoid having to change this class during upgrade.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will add this in next commit


CreateIndexRequest request = new CreateIndexRequest(MODEL_INDEX_NAME).mapping("_doc", getMapping(), XContentType.JSON)
String mapping = Strings.toString(
JsonXContent.contentBuilder().startObject().startObject(MapperService.SINGLE_MAPPING_NAME).endObject().endObject()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMapping contains the logic to load our model-index mapping. Why did this get deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, will investigate and restore the original logic

@martin-gaievski martin-gaievski force-pushed the integrate/opensearch20-lucene91 branch from ae53d0d to af79c68 Compare March 22, 2022 21:57
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the integrate/opensearch20-lucene91 branch from af79c68 to e4c55fd Compare March 23, 2022 17:17
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the integrate/opensearch20-lucene91 branch from e4c55fd to ee411d5 Compare March 23, 2022 17:58
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski
Copy link
Member Author

martin-gaievski commented Mar 23, 2022

Created issue for core build project to address failing build #1817.
Closed issue above in favor of new core issue #2582

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the integrate/opensearch20-lucene91 branch from d390b57 to 98f1264 Compare March 24, 2022 19:29
@martin-gaievski martin-gaievski marked this pull request as ready for review March 28, 2022 16:01
@martin-gaievski martin-gaievski requested a review from a team March 28, 2022 16:01
@martin-gaievski martin-gaievski changed the title Integration with base OpenSearch 2.0 [PREVIEW] Integration with base OpenSearch 2.0 Mar 28, 2022
@martin-gaievski martin-gaievski added the Enhancements Increases software capabilities beyond original client specifications label Mar 28, 2022
throw new RuntimeException(e);
} finally {
indexAllocation.readUnlock();
SegmentReader reader = (SegmentReader) FilterLeafReader.unwrap(context.reader());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instance type check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an old code, probably formatted by spotless. But I agree that the check is missing. Let me add it in next commit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say dont add it in this PR. We can fix later. This was changed due to spotless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski I agree to @jmazanec15's comment. Shall we make one huge ( with caution ) spotless apply PR, so that it will not be a problem every-time we touch the code.

Copy link
Member Author

@martin-gaievski martin-gaievski Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's leave it and address in later PR. Please approve @VijayanB if that was the only comment from your side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VijayanB I set https://github.com/opensearch-project/k-NN/blob/main/gradle/formatting.gradle#L10 the ratchet from setting to apply spotless incrementally to avoid messing us the git history with a huge commit that changes everything. I have been thinking about removing this and doing a big sptless apply. We can probably discuss in an issue I can create.

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@martin-gaievski martin-gaievski merged commit 6121c27 into opensearch-project:main Mar 28, 2022
@martin-gaievski martin-gaievski deleted the integrate/opensearch20-lucene91 branch March 28, 2022 21:44
@martin-gaievski martin-gaievski self-assigned this Mar 29, 2022
martin-gaievski added a commit to martin-gaievski/k-NN that referenced this pull request Mar 30, 2022
* Initial integration with OS 2.0 alpha1 snapshot

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancements Increases software capabilities beyond original client specifications v2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants