Support dedicated ml node #79

Closed
spbjss opened this issue Sep 30, 2021 · 0 comments · Fixed by #346
Labels
enhancement New feature or request v2.1.0

spbjss commented Sep 30, 2021

Is your feature request related to a problem?
We released the ml-commons plugin in OpenSearch 1.3. It supports model training and prediction. ML models generally consume more resources, especially during training. The community wants to support bigger ML models, which might require more resources and special hardware such as GPUs.

Because OpenSearch doesn’t support ML nodes, we dispatch ML tasks to data nodes only. That means that if a user wants to train a large model, they need to scale up all data nodes, which can be costly. ML tasks also use shared resources on data nodes, which may impact the core search and indexing functions.

What solution would you like?
Support a dedicated ML node, so users don’t need to scale up their data nodes at all. Instead, they configure a new ML node (with different settings and a more powerful instance type) and add it to the cluster via the YAML file (requires a cluster restart). This lets users separate resource usage better by running ML tasks on dedicated nodes, which reduces the impact on other critical tasks such as search and ingestion.

OpenSearch core checks the node roles when a node starts. If a role is not one of the built-in roles, such as the data role, it throws an exception and the node can't start. To support a dedicated ML node, we have to remove this limitation in OpenSearch core. That is done by the PR opensearch-project/OpenSearch#3436, which adds support for dynamic node roles in OpenSearch.
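To make the startup check concrete, here is a rough Python model of the behavior described above. It is only an illustration, not OpenSearch's actual code: `BUILT_IN_ROLES` is a deliberately partial set, and `resolve_roles`, `UnknownRoleError`, and the `allow_dynamic` flag are hypothetical names invented for this sketch.

```python
# Illustrative model only; names and the role set are assumptions, not OpenSearch code.
BUILT_IN_ROLES = frozenset({"data", "ingest"})  # partial, for illustration


class UnknownRoleError(ValueError):
    """Raised when a node declares a role the strict check does not recognize."""


def resolve_roles(configured, allow_dynamic):
    """Return the configured roles, rejecting unknown ones unless dynamic roles are allowed."""
    unknown = [role for role in configured if role not in BUILT_IN_ROLES]
    if unknown and not allow_dynamic:
        # Strict behavior: an unrecognized role prevents the node from starting.
        raise UnknownRoleError(f"unknown role(s): {unknown}")
    # Dynamic behavior: unrecognized roles such as "ml" are accepted as-is.
    return list(configured)
```

With `allow_dynamic=False`, a node configured with an `ml` role fails the check; with `allow_dynamic=True`, the same configuration is accepted, which is the behavior a dedicated ML node depends on.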

With that, we can enhance the ml-commons code to dispatch tasks to ML nodes first. If there are no ML nodes, we can fall back to data nodes.
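The dispatch rule could look roughly like the following Python sketch. It assumes the dedicated role is named `ml` and that each node exposes its set of roles; the `Node` class and `eligible_nodes` function are hypothetical and do not reflect the ml-commons code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a cluster node and its configured roles."""
    name: str
    roles: FrozenSet[str]


def eligible_nodes(nodes: List[Node]) -> List[Node]:
    """Prefer nodes with the (assumed) "ml" role; fall back to data nodes if none exist."""
    ml_nodes = [node for node in nodes if "ml" in node.roles]
    if ml_nodes:
        return ml_nodes
    return [node for node in nodes if "data" in node.roles]
```

In a cluster with at least one ML node, only ML nodes are returned, so data nodes stay free for search and indexing. In a cluster without ML nodes, the result matches the current behavior of dispatching to data nodes.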

Do you have any additional context?
Original Proposal
