adding execute API #165

Merged: 9 commits, Jun 14, 2023
6 changes: 6 additions & 0 deletions docs/source/reference/api/ml_commons_execute_api.rst
@@ -0,0 +1,6 @@
Execute
==================

.. currentmodule:: opensearch_py_ml

.. autofunction:: opensearch_py_ml.ml_commons.MLCommonClient.execute
6 changes: 6 additions & 0 deletions docs/source/reference/mlcommons.rst
@@ -96,3 +96,9 @@ Delete Task

   api/ml_commons_delete_task_api

Execute
~~~~~~~~~~~~~~~~~
.. toctree::
   :maxdepth: 2

   api/ml_commons_execute_api
17 changes: 17 additions & 0 deletions opensearch_py_ml/ml_commons/ml_commons_client.py
@@ -13,6 +13,7 @@
from opensearchpy import OpenSearch

from opensearch_py_ml.ml_commons.ml_common_utils import ML_BASE_URI, TIMEOUT
from opensearch_py_ml.ml_commons.model_execute import ModelExecute
from opensearch_py_ml.ml_commons.model_uploader import ModelUploader


@@ -25,6 +26,22 @@ class MLCommonClient:
    def __init__(self, os_client: OpenSearch):
        self._client = os_client
        self._model_uploader = ModelUploader(os_client)
        self._model_execute = ModelExecute(os_client)

    def execute(self, algorithm_name: str, input_json: dict) -> dict:
        """
        This method executes ML algorithms that can only be executed directly (i.e. that do not support the
        train and predict APIs), such as anomaly localization and metrics correlation. The algorithm has to be
        supported by ML Commons.
        Refer to https://opensearch.org/docs/2.7/ml-commons-plugin/api/#execute

        :param algorithm_name: Name of the algorithm
        :type algorithm_name: string
        :param input_json: Dictionary of parameters
        :type input_json: dict
        :return: the API response
        :rtype: dict
        """
        return self._model_execute._execute(algorithm_name, input_json)

    @deprecated(
        reason="Since OpenSearch 2.7.0, you can use register_model instead",
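For orientation, here is a minimal usage sketch of the new execute method, mirroring the test added later in this PR. The connection details are hypothetical, and it assumes a running OpenSearch cluster with the ML Commons plugin and the local_sample_calculator algorithm available.

from opensearchpy import OpenSearch

from opensearch_py_ml.ml_commons import MLCommonClient

# Hypothetical connection details; adjust host/port/auth for your cluster.
os_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
ml_client = MLCommonClient(os_client)

# Both a dict and the equivalent JSON string are accepted as input_json.
result = ml_client.execute(
    algorithm_name="local_sample_calculator",
    input_json={"operation": "max", "input_data": [1.0, 2.0, 3.0]},
)
print(result["output"]["result"])  # 3.0 for this input (the test asserts == 3)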
48 changes: 48 additions & 0 deletions opensearch_py_ml/ml_commons/model_execute.py
@@ -0,0 +1,48 @@
# SPDX-License-Identifier: Apache-2.0
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.
# Any modifications Copyright OpenSearch Contributors. See
# GitHub history for details.

import json

from opensearchpy import OpenSearch

from opensearch_py_ml.ml_commons.ml_common_utils import ML_BASE_URI


class ModelExecute:
    """
    Class for executing algorithms using the ML Commons execute API.
    """

    API_ENDPOINT = "models/_execute"

    def __init__(self, os_client: OpenSearch):
        self._client = os_client

    def _execute(self, algorithm_name: str, input_json: dict) -> dict:
Collaborator:

First of all, I think we should expect a string for input_json, not a dict.

Contributor Author:

Are you sure? Wouldn’t that be inconvenient for the user?

Collaborator:

So, as you can see in the POST requests, customers are used to sending these values as strings to the API. I believe this will keep the experience the same.

For a dict, the customer would eventually need to build the dict anyway.

Maybe we can accept both (string and dict) if you think that would improve the customer experience?

Contributor Author:

Right now, it accepts both dict and string. Should I change anything?

"""
This method executes ML algorithms that can be only executed directly (i.e. do not support train and
predict APIs), like anomaly localization and metrics correlation.
The input json must be a dictionary or a deserializable Python object.
The algorithm has to be supported by ML Commons.
Refer to https://opensearch.org/docs/2.7/ml-commons-plugin/api/#execute

:param algorithm_name: Name of the algorithm
:type algorithm_name: string
:param input_json: Dictionary of parameters or a deserializable string, byte, or bytearray
:type input_json: dict
:return: returns the API response
:rtype: dict
"""

        if not isinstance(input_json, dict):
Collaborator:

import json

def is_valid_json(input_string):
    try:
        json.loads(input_string)
        return True
    except ValueError:
        return False

Can't we do something like this?

Contributor Author:

It will return False if the input is a dictionary.

Contributor Author:

Is that okay?

Collaborator:

You mean if the input is {"a":"abc"}, then it will fail?

Contributor Author:

Yeah, I think so. json.loads accepts only a str, bytes, or bytearray.

Collaborator:

I checked using this function:

Provided input string is valid:  {"a":"abc"}
Provided input string is valid:  {"operation": "max", "input_data": [1.0, 2.0, 3.0]}
Provided input string is not valid:  zaa
Provided input string is valid:  {"index_name": "rca-index","attribute_field_names": ["attribute"],"aggregations": [{"sum": {"sum": {"field": "value"}}}],"time_field_name": "timestamp","start_time": 1620630000000,"end_time": 1621234800000,"min_time_interval": 86400000,"num_outputs": 10}

Collaborator:

The code snippet you provided (if not isinstance(input_json, dict):) does not directly check whether a given input is a valid JSON string or not. The isinstance() function in Python checks if an object is an instance of a particular class or data type. In this case, it checks if the input_json variable is not an instance of a dictionary (dict).

While a valid JSON string can be represented as a dictionary object in Python, this check alone does not guarantee that the input is a valid JSON string. It only verifies that the input_json is not a dictionary.
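For reference, a small sketch that combines both points raised here: treat a dict as already parsed, and let json.loads be what actually validates (and parses) everything else. The helper name and error messages are illustrative only and not part of this PR.

import json


def _to_request_body(input_json):
    # Hypothetical helper, not part of this PR.
    # isinstance() only tells us the input is not a dict; json.loads()
    # is what actually validates (and parses) a JSON string.
    if isinstance(input_json, dict):
        return input_json
    try:
        parsed = json.loads(input_json)
    except (TypeError, ValueError) as err:
        raise ValueError(f"input_json is neither a dict nor valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError("input_json must decode to a JSON object")
    return parsed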

Contributor Author:

Hmm, I will double-check whether it accepts dictionaries.

Contributor Author:

If the input is not an instance of a dictionary, json.loads will be called. If the input is not deserializable JSON, it raises an exception; otherwise, it returns a dictionary that is used as the request body.

            input_json = json.loads(input_json)

        return self._client.transport.perform_request(
            method="POST",
            url=f"{ML_BASE_URI}/_execute/{algorithm_name}",
            body=input_json,
        )
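For clarity, the method above boils down to a single REST call. A hedged sketch of the equivalent raw request follows, assuming ML_BASE_URI resolves to /_plugins/_ml (the base path used in the linked ML Commons docs) and reusing the sample payload from the tests below; the connection details are hypothetical.

from opensearchpy import OpenSearch

# Hypothetical connection details.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Equivalent to MLCommonClient.execute("local_sample_calculator", {...}),
# assuming ML_BASE_URI == "/_plugins/_ml".
response = client.transport.perform_request(
    method="POST",
    url="/_plugins/_ml/_execute/local_sample_calculator",
    body={"operation": "max", "input_data": [1.0, 2.0, 3.0]},
)
print(response["output"]["result"])  # expected 3.0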
31 changes: 31 additions & 0 deletions tests/ml_commons/test_ml_commons_client.py
@@ -72,6 +72,34 @@ def test_init():
    assert type(ml_client._model_uploader) == ModelUploader


def test_execute():
Collaborator:

Can you please add another test to show that it also works with a string?

    raised = False
    try:
        input_json = {"operation": "max", "input_data": [1.0, 2.0, 3.0]}
        result = ml_client.execute(
            algorithm_name="local_sample_calculator", input_json=input_json
        )
        assert result["output"]["result"] == 3
    except:  # noqa: E722
        raised = True
    assert (
        raised == False
    ), "Raised Exception during execute API testing with dictionary"

    raised = False
    try:
        input_json = '{"operation": "max", "input_data": [1.0, 2.0, 3.0]}'
Collaborator:

I just realized we should get a result from this input as well. In the try block, can we also match the value? In both cases.

Collaborator:

Also, can we try to cover the except condition? I'm planning to start an initiative to cover the except cases, and this could be a good start. Please let me know if you have any questions.

Contributor Author:

Can you elaborate more on covering the except part? What do you mean by cover?

Collaborator:

Never mind, ignore my last comment. Approving the PR.

        result = ml_client.execute(
            algorithm_name="local_sample_calculator", input_json=input_json
        )
        assert result["output"]["result"] == 3
    except:  # noqa: E722
        raised = True
    assert (
        raised == False
    ), "Raised Exception during execute API testing with JSON string"


def test_DEPRECATED_integration_pretrained_model_upload_unload_delete():
    raised = False
    try:
@@ -381,3 +409,6 @@ def test_integration_model_train_register_full_cycle():
    except:  # noqa: E722
        raised = True
    assert raised == False, "Raised Exception in deleting model"


test_integration_model_train_register_full_cycle()