Add multi modal default preprocess function #2500
Add an IT to cover this?
@@ -46,6 +47,10 @@ public void validateTextDocsInput(MLInput mlInput) {
        if (!(mlInput.getInputDataset() instanceof TextDocsInputDataSet)) {
            throw new IllegalArgumentException("This pre_process_function can only support TextDocsInputDataSet");
Can you make the error message more meaningful to the user? If possible, please add details of how/what they need to change to fix it.
In addition, we can log more details at warn or info level, such as the actual class of the input dataset object.
It makes sense to log the input dataset type. But putting a TextDocsInputDataSet example in the error message doesn't seem right, because the format of the error message would be odd. I'd prefer to give a link to the source code of TextDocsInputDataSet so users can learn the data structure from the code. WDYT?
Is there any tutorial / website link we can refer to instead of source code link?
We don't have a good document describing this. I think it's fine to add the key data structure of this request to the error message. Will make the change soon.
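One way the outcome of this thread could look is sketched below: the exception message names the expected dataset type and its key field, and the actual class of the input is both logged at warn level and included in the message. This is a minimal sketch, not the code merged in this PR. `InputDataset`, `TextDocsDataset`, `ImageDataset`, and `TextDocsValidator` are hypothetical stand-ins so the block compiles without ml-commons on the classpath, the `text_docs` key in the message is an assumption about the request shape, and `java.util.logging` stands in for whatever logger the project uses.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-ins for the ml-commons dataset types discussed above.
interface InputDataset {}

final class TextDocsDataset implements InputDataset {
    final List<String> docs;

    TextDocsDataset(List<String> docs) {
        this.docs = docs;
    }
}

final class ImageDataset implements InputDataset {}

final class TextDocsValidator {
    private static final Logger log = Logger.getLogger(TextDocsValidator.class.getName());

    /**
     * Validates that the given dataset is a text-docs dataset.
     *
     * @param dataset the dataset to check; may be null
     * @throws IllegalArgumentException if the dataset is not a TextDocsDataset
     */
    static void validateTextDocsInput(InputDataset dataset) {
        if (!(dataset instanceof TextDocsDataset)) {
            // Record the actual type so the mismatch is easy to diagnose from logs.
            String actual = dataset == null ? "null" : dataset.getClass().getName();
            log.warning("Unsupported input dataset type: " + actual);
            // Assumed wording: tell the user what shape is expected, not only what failed.
            throw new IllegalArgumentException(
                "This pre_process_function only supports TextDocsInputDataSet, "
                    + "a list of strings under the key 'text_docs', but got: " + actual);
        }
    }
}
```

Describing the expected structure in the message itself avoids pointing users at source code, which was the concern raised in the thread.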
@@ -46,6 +47,10 @@ public void validateTextDocsInput(MLInput mlInput) {
        if (!(mlInput.getInputDataset() instanceof TextDocsInputDataSet)) {
            throw new IllegalArgumentException("This pre_process_function can only support TextDocsInputDataSet");
If you have context, can you please add class- and method-level comments? They are missing for this class.
Done
import static org.opensearch.ml.common.utils.StringUtils.convertScriptStringToJsonString;

public class MultiModalEmbeddingPreProcessFunction extends ConnectorPreProcessFunction {
please add class level comments
Also, can you think of a better name for this class? We only work with text and image embeddings in a certain format, and the parent class is about connectors while this child class is for an embedding processor. Maybe MultiModalModelConnectorPreProcessFunction, or something similar.
The word "connector" in the parent class name is a little redundant, since the pre-process functions are specifically for connector use only. It's also possible that a future pre-process function for a multi-modal connector won't be for embedding, but since the chance is low, I'll change this to the name you suggested.
        validateTextDocsInput(mlInput);
    }

    // The input will must have inputText even it's null, input image is optional.
Please correct the wording of the comment and format the method-level comment as per Java conventions.
Done
...ensearch/ml/common/connector/functions/preprocess/MultiModalEmbeddingPreProcessFunction.java
        TextDocsInputDataSet inputData = (TextDocsInputDataSet) mlInput.getInputDataset();
        if (inputData.getDocs().size() == 1) {
            return RemoteInferenceInputDataSet.builder().parameters(convertScriptStringToJsonString(Map.of("parameters", Map.of("inputText", inputData.getDocs().get(0))))).build();
        } else {
we need to check if docs collection is not empty
From the neural search code (https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java#L287), the collection will never be empty. But to make the code more robust to change, I will add a check for empty docs.
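The empty-docs check agreed on here could be sketched as follows. This is an illustration under stated assumptions rather than the merged implementation: `MultiModalPreProcessSketch` and `toParameters` are hypothetical names, the `inputImage` key is inferred from the code comment saying the input image is optional, and a plain `Map` stands in for `RemoteInferenceInputDataSet` so the block runs without ml-commons.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MultiModalPreProcessSketch {
    /**
     * Builds the parameter map for a multi-modal embedding request.
     * The first doc is treated as the text input and the optional second doc as the image.
     *
     * @param docs the input docs; must contain at least one element
     * @return a map with "inputText" and, if a second doc exists, "inputImage" (assumed key)
     * @throws IllegalArgumentException if docs is null or empty
     */
    static Map<String, String> toParameters(List<String> docs) {
        // Guard against empty input even if current callers never send it.
        if (docs == null || docs.isEmpty()) {
            throw new IllegalArgumentException("No input text or image provided");
        }
        // LinkedHashMap accepts null values, so a null inputText is preserved.
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("inputText", docs.get(0));
        if (docs.size() > 1) {
            parameters.put("inputImage", docs.get(1));
        }
        return parameters;
    }
}
```

A `LinkedHashMap` is used in the sketch because `Map.of` throws a `NullPointerException` on null values, which matters if the text input is allowed to be null.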
...ensearch/ml/common/connector/functions/preprocess/MultiModalEmbeddingPreProcessFunction.java
import static org.junit.Assert.assertEquals;

public class MultiModalEmbeddingPreProcessFunctionTest {
For this repo, all unit tests are derived from OpenSearchTestCase or one of its children. Any reason why it's different here?
Not all tests in this repo are derived from OpenSearchTestCase; tests in the common folder generally are not.
    }

    @Test
    public void process_NullInput() {
Please use camelCase for the method name, and use the following format when naming methods: "itemBeingTested_scenario_expectedOutcome".
same comment for all methods in this test class
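To illustrate the requested naming format, here is a small sketch in which each test method name reads as item, scenario, and expected outcome. `FirstDocExtractor` is a hypothetical unit under test, not a class from this PR, and plain static methods are used instead of JUnit `@Test` methods only so the block compiles without JUnit on the classpath.

```java
import java.util.List;

// Hypothetical unit under test, standing in for the pre-process function's process method.
final class FirstDocExtractor {
    static String process(List<String> docs) {
        if (docs == null) {
            throw new IllegalArgumentException("Input docs must not be null");
        }
        return docs.get(0);
    }
}

// Method names follow itemBeingTested_scenario_expectedOutcome, with camelCase in each part.
final class FirstDocExtractorTests {
    static void process_nullInput_throwsIllegalArgumentException() {
        try {
            FirstDocExtractor.process(null);
        } catch (IllegalArgumentException expected) {
            return; // the expected outcome named in the method
        }
        throw new AssertionError("expected IllegalArgumentException");
    }

    static void process_singleDoc_returnsThatDoc() {
        String result = FirstDocExtractor.process(List.of("hello"));
        if (!"hello".equals(result)) {
            throw new AssertionError("unexpected result: " + result);
        }
    }
}
```

Under this format, the `process_NullInput` method from the diff would gain a third part stating the expected result and a lowercase first letter in the scenario part.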
bd1722d to aee2a0b
Signed-off-by: zane-neo <zaniu@amazon.com>
…ons/preprocess/MultiModalConnectorPreProcessFunction.java Co-authored-by: Yaliang Wu <ylwu@amazon.com> Signed-off-by: zane-neo <zaniu@amazon.com>
d6277ce to 31fb324
* Add multi modal default preprocess function Signed-off-by: zane-neo <zaniu@amazon.com>
* Address comments Signed-off-by: zane-neo <zaniu@amazon.com>
* address comments Signed-off-by: zane-neo <zaniu@amazon.com>
* add IT Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix IT Signed-off-by: zane-neo <zaniu@amazon.com>
* Update common/src/main/java/org/opensearch/ml/common/connector/functions/preprocess/MultiModalConnectorPreProcessFunction.java Co-authored-by: Yaliang Wu <ylwu@amazon.com> Signed-off-by: zane-neo <zaniu@amazon.com>
* fix test Signed-off-by: Yaliang Wu <ylwu@amazon.com>
* Add more ITs Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix failure ITs Signed-off-by: zane-neo <zaniu@amazon.com>
* fix failure IT Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix failure ITs Signed-off-by: zane-neo <zaniu@amazon.com>
* format code Signed-off-by: zane-neo <zaniu@amazon.com>
* Add error response to make it esay to figure out the failure root cause Signed-off-by: zane-neo <zaniu@amazon.com>
* format code Signed-off-by: zane-neo <zaniu@amazon.com>
* rebase main Signed-off-by: zane-neo <zaniu@amazon.com>
---------
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: Yaliang Wu <ylwu@amazon.com>
Co-authored-by: Yaliang Wu <ylwu@amazon.com>
(cherry picked from commit 0e89c17)
common/src/main/java/org/opensearch/ml/common/connector/MLPreProcessFunction.java
* Add multi modal default preprocess function Signed-off-by: zane-neo <zaniu@amazon.com>
* Address comments Signed-off-by: zane-neo <zaniu@amazon.com>
* address comments Signed-off-by: zane-neo <zaniu@amazon.com>
* add IT Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix IT Signed-off-by: zane-neo <zaniu@amazon.com>
* Update common/src/main/java/org/opensearch/ml/common/connector/functions/preprocess/MultiModalConnectorPreProcessFunction.java Co-authored-by: Yaliang Wu <ylwu@amazon.com> Signed-off-by: zane-neo <zaniu@amazon.com>
* fix test Signed-off-by: Yaliang Wu <ylwu@amazon.com>
* Add more ITs Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix failure ITs Signed-off-by: zane-neo <zaniu@amazon.com>
* fix failure IT Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix failure ITs Signed-off-by: zane-neo <zaniu@amazon.com>
* format code Signed-off-by: zane-neo <zaniu@amazon.com>
* Add error response to make it esay to figure out the failure root cause Signed-off-by: zane-neo <zaniu@amazon.com>
* format code Signed-off-by: zane-neo <zaniu@amazon.com>
* rebase main Signed-off-by: zane-neo <zaniu@amazon.com>
---------
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: Yaliang Wu <ylwu@amazon.com>
Co-authored-by: Yaliang Wu <ylwu@amazon.com>
(cherry picked from commit 0e89c17)
Co-authored-by: zane-neo <zaniu@amazon.com>
Description
Add multi modal default preprocess function
Issues Resolved
#2364
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.