
[ML] Warn the user when the model memory limit is higher than the memory available in the ML node #63942

Closed
romain-chanu opened this issue Apr 20, 2020 · 2 comments · Fixed by #65652
Assignees
Labels
enhancement New value added to drive a business result :ml v7.8.0

Comments

@romain-chanu

Describe the feature: As a user, I can configure an anomaly detection job with a model memory limit (model_memory_limit) higher than the memory available on the ML node. The available memory is currently governed by max_machine_memory_percent (by default 30% of the machine's total memory). The model memory limit may also be bounded by max_model_memory_limit.

For example: given an ML node with 16 GB of memory and max_machine_memory_percent set to 30%, the available memory is 4.8 GB. Saving an anomaly detection job with model_memory_limit set to 6 GB results in no warning.

Describe a specific use case for the feature: users should be warned or informed when an anomaly detection job configuration (e.g. the model memory limit) exceeds the ML node's memory capacity or configuration.
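The requested check could be sketched as follows. This is a hypothetical illustration of the validation logic, not Elasticsearch or Kibana code; the function names (parse_byte_size, check_model_memory_limit) and the byte-size parsing are assumptions for the example.

```python
# Hypothetical sketch of the proposed warning, not actual Elasticsearch code.

UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_byte_size(value: str) -> int:
    """Parse an Elasticsearch-style byte size string such as '6gb' or '512mb'."""
    s = value.strip().lower()
    for suffix in sorted(UNITS, key=len, reverse=True):
        if s.endswith(suffix):
            return int(float(s[: -len(suffix)]) * UNITS[suffix])
    return int(s)  # bare number is taken as bytes

def check_model_memory_limit(model_memory_limit: str,
                             node_memory_bytes: int,
                             max_machine_memory_percent: int = 30):
    """Return a warning string if the job's limit exceeds the usable ML memory."""
    available = node_memory_bytes * max_machine_memory_percent // 100
    requested = parse_byte_size(model_memory_limit)
    if requested > available:
        return (f"model_memory_limit ({model_memory_limit}) exceeds the "
                f"memory usable for ML on this node ({available} bytes)")
    return None

# The scenario from the issue: 16 GB node, 30% usable -> 4.8 GB; a 6 GB job warns.
warning = check_model_memory_limit("6gb", 16 * 1024**3)
```

With these assumptions, a 6 GB limit on a 16 GB node (4.8 GB usable) produces a warning, while a 4 GB limit does not.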

@romain-chanu romain-chanu added enhancement New value added to drive a business result :ml labels Apr 20, 2020
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Apr 21, 2020
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.current_effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Relates elastic/kibana#63942
@droberts195
Contributor

The backend support for this change is elastic/elasticsearch#55529

droberts195 added a commit to elastic/elasticsearch that referenced this issue Apr 22, 2020
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Relates elastic/kibana#63942
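Based on the commit message above, a UI could compare the job's requested limit against limits.effective_max_model_memory_limit from the ML info response. The sketch below assumes that response shape; the helper names and the abridged response dict are illustrative, not the actual Kibana implementation.

```python
# Hypothetical sketch of how a UI might consume the new field.

UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def to_bytes(size: str) -> int:
    """Parse an Elasticsearch-style byte size string such as '4gb'."""
    s = size.strip().lower()
    for suffix in sorted(UNITS, key=len, reverse=True):
        if s.endswith(suffix):
            return int(float(s[: -len(suffix)]) * UNITS[suffix])
    return int(s)

def warn_if_unschedulable(model_memory_limit: str, ml_info: dict):
    """Warn when no node in the cluster could run a job with this limit."""
    effective = ml_info.get("limits", {}).get("effective_max_model_memory_limit")
    if effective and to_bytes(model_memory_limit) > to_bytes(effective):
        return (f"This job requests {model_memory_limit}, but no ML node can "
                f"currently provide more than {effective}; the job will not "
                "start unless a bigger ML node joins the cluster.")
    return None

# Abridged example of a GET _ml/info response (field names per the commit message).
ml_info = {"limits": {"effective_max_model_memory_limit": "4gb"}}
warning = warn_if_unschedulable("6gb", ml_info)
```

A job asking for 6 GB against an effective maximum of 4 GB yields the warning; anything at or below 4 GB passes silently.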