support train ML model in either sync or async way #124

Merged
merged 1 commit into from
Jan 21, 2022

Conversation

ylwu-amzn
Collaborator

@ylwu-amzn ylwu-amzn commented Jan 19, 2022

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

Description

Support training ML models in either a sync or an async way.
For example, PPL users should be able to train a model synchronously, while other users can choose async training when model training is expected to be time-consuming.

Main changes:

  1. Support an async URL parameter in the train API. Add ?async=true to train the model asynchronously; by default, the model is trained synchronously.
  2. When training synchronously, the task is not persisted in the index.
  3. Sync training returns both the model id and the task id. Async training returns only the task id, and the user needs to poll the task to learn its state/progress.
  4. Use GetRequest to get the model in the predict task runner.
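
Items 1 and 3 above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the class name, method names, and the response field names model_id and task_id are assumptions made for illustration only. It shows an async parameter that defaults to false when absent, and a response that carries the model id only for sync training.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; these names are not taken from the ml-commons code base.
class TrainParamSketch {

    // Reads the "async" URL parameter. A missing parameter means sync training.
    static boolean isAsync(Map<String, String> urlParams) {
        return Boolean.parseBoolean(urlParams.getOrDefault("async", "false"));
    }

    // Builds the response fields: sync returns model id and task id,
    // async returns only the task id (field names are assumed).
    static Map<String, String> buildResponse(boolean async, String taskId, String modelId) {
        Map<String, String> response = new LinkedHashMap<>();
        if (!async) {
            response.put("model_id", modelId);
        }
        response.put("task_id", taskId);
        return response;
    }

    public static void main(String[] args) {
        boolean async = isAsync(Map.of("async", "true"));
        System.out.println(buildResponse(async, "task-1", null));
        System.out.println(buildResponse(false, "task-2", "model-2"));
    }
}
```

In this sketch, a caller that receives only a task id would need to poll the task separately to find out when training has finished.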

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ylwu-amzn ylwu-amzn requested a review from a team January 19, 2022 10:01
Signed-off-by: Yaliang Wu <ylwu@amazon.com>
@@ -63,6 +65,7 @@
    @Setter
    private String error;
    private User user; // TODO: support document level access control later
    private boolean async;

    @Builder
    public MLTask(
Collaborator

We can consider moving this class to the ml-common in the next PR.

Collaborator Author

Yes, will send out a separate refactoring PR

Comment on lines +119 to +123
if (request.isAsync()) {
mlTaskManager.createMLTask(mlTask, ActionListener.wrap(r -> {
String taskId = r.getId();
mlTask.setTaskId(taskId);
if (mlTask.isAsync()) {
Collaborator

Is there any use case where the request is sync but the MLTask is async? I am just wondering what the difference between the two is.

Collaborator Author

The async property of MLTask is consistent with the request; see line 116 of this class.
For an async task, we cache the task in memory, persist it in the index, and return the task id directly.
For a sync task, we only cache the task in memory, then train the model and return the model id.
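
The difference described in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the two maps stand in for the in-memory task cache and the task index, the state strings RUNNING and COMPLETED are made up, and removing the task from the cache once training finishes is also an assumption.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the PR's code.
class TaskFlowSketch {
    // In-memory task cache, used for both sync and async training.
    final Map<String, String> taskCache = new ConcurrentHashMap<>();
    // Stand-in for the task index; only async tasks are written here.
    final Map<String, String> taskIndex = new ConcurrentHashMap<>();

    static final class Result {
        final String taskId;
        final String modelId;                    // null for async requests
        final CompletableFuture<String> pending; // null for sync requests

        Result(String taskId, String modelId, CompletableFuture<String> pending) {
            this.taskId = taskId;
            this.modelId = modelId;
            this.pending = pending;
        }
    }

    // Placeholder for the real training work; returns a made-up model id.
    String trainModel() {
        return "model-" + UUID.randomUUID();
    }

    Result train(boolean async) {
        String taskId = "task-" + UUID.randomUUID();
        taskCache.put(taskId, "RUNNING");
        if (async) {
            // Persist the task so the caller can poll it, then return at once.
            taskIndex.put(taskId, "RUNNING");
            CompletableFuture<String> pending = CompletableFuture.supplyAsync(() -> {
                String modelId = trainModel();
                taskIndex.put(taskId, "COMPLETED");
                taskCache.remove(taskId); // assumed cleanup after training
                return modelId;
            });
            return new Result(taskId, null, pending);
        }
        // Sync: nothing is written to the index; train and return the model id.
        String modelId = trainModel();
        taskCache.remove(taskId); // assumed cleanup after training
        return new Result(taskId, modelId, null);
    }
}
```

Because the async flag is set once from the request, the two branches never mix: a sync request never touches the index stand-in, and an async request never returns a model id directly.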

Collaborator

@Zhangxunmt Zhangxunmt left a comment

Approved with a few questions in the comments.

@ylwu-amzn ylwu-amzn merged commit 1d5da1d into opensearch-project:main Jan 21, 2022
@ylwu-amzn ylwu-amzn mentioned this pull request Mar 9, 2022
@ylwu-amzn ylwu-amzn added the enhancement and feature labels and removed the enhancement label Mar 11, 2022