RHOAIENG-4198: Initial commit for kserve perf metrics #396
Conversation
I suggested some changes, but it's hard for me to fully understand them from the code alone. Can I see the generated output?
You can monitor the following metrics for a specific model that is deployed on the single-model serving platform:

* *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
Suggested change:
- * *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
+ * *Number of requests* - The number of requests that have failed or succeeded for a specific model.
* *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
* *Average response time (ms)* - The average time it takes a specific model to respond to requests.
* *CPU utilization (%)* - The percentage of the CPU's capacity that is currently being used by a specific model.
Suggested change:
- * *CPU utilization (%)* - The percentage of the CPU's capacity that is currently being used by a specific model.
+ * *CPU utilization (%)* - The percentage of the model deployment's CPU limit that is currently being used by a specific model.
* *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
* *Average response time (ms)* - The average time it takes a specific model to respond to requests.
* *CPU utilization (%)* - The percentage of the CPU's capacity that is currently being used by a specific model.
* *Memory utilization (%)* - The percentage of the system's memory that is currently being used by a specific model.
Suggested change:
- * *Memory utilization (%)* - The percentage of the system's memory that is currently being used by a specific model.
+ * *Memory utilization (%)* - The percentage of the model deployment's memory limit that is currently being used by a specific model.
@VedantMahabaleshwarkar What does model deployment mean here?
Pod if there's only 1 replica, pods if there are multiple replicas. But what I specifically mean is the OpenShift Deployment that represents the model. The Deployment will have resource requests and limits. The % we show is the % relative to the limit that is set in the Deployment.
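To illustrate the point above, here is a hypothetical `resources` stanza on the Deployment that backs a model (the values are made up, not taken from this PR). With these limits, a reported CPU utilization of 50% would mean the model is using 500m of its 1-CPU limit, not half of the node's capacity:

```yaml
# Hypothetical resources stanza for the Deployment that backs the model.
# The dashboard's utilization percentages are computed against the limits,
# not against the node's total capacity.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"      # CPU utilization (%) is measured against this value
    memory: 2Gi   # Memory utilization (%) is measured against this value
```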
@VedantMahabaleshwarkar I think that model deployment might sound confusing - proposing an alternative, let me know your thoughts:
- CPU utilization (%): The percentage of the CPU limit per model replica that is currently utilized by a specific model.
- Memory utilization (%): The percentage of the memory limit per model replica that is currently utilized by a specific model.
I based my assumption on this setting:
Previews look good to me; just some minor suggestions, as noted previously.
@@ -10,7 +10,19 @@ You can view a graph that illustrates the HTTP requests that have failed or succ
.Prerequisites
* You have installed {productname-long}.
* On the OpenShift cluster where {productname-short} is installed, user workload monitoring is enabled.
* Your cluster administrator has _not_ edited the {productname-short} dashboard configuration to hide the *Endpoint Performance* tab on the *Model Serving* page. For more information, see link:{rhoaidocshome}/html/managing_resources/customizing-the-dashboard#ref-dashboard-configuration-options_dashboard[Dashboard configuration options].
* The following dashboard configuration options are set to their default values as shown:
the default values?
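For reference, a sketch of how such defaults might appear in an `OdhDashboardConfig` resource. The field name and API version below are assumptions based on the odh-dashboard configuration, not confirmed by this PR, and should be checked against the product documentation:

```yaml
# Hypothetical excerpt of an OdhDashboardConfig; the exact field names
# are assumptions and should be verified before publishing.
apiVersion: opendatahub.io/v1alpha
kind: OdhDashboardConfig
metadata:
  name: odh-dashboard-config
spec:
  dashboardConfig:
    disablePerformanceMetrics: false   # default: performance metrics tab is visible
```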
@@ -0,0 +1,72 @@
:_module-type: PROCEDURE

[id="viewing-performance-metrics-for-model-server_{context}"]
Should the title and id be the same?
ifdef::upstream[]
* If you are using specialized {productname-short} groups, you are part of the user group or admin group (for example, {odh-user-group} or {odh-admin-group}) in OpenShift.
endif::[]
* The following dashboard configuration options are set to their default values as shown:
the default values
A few minor comments, but otherwise LGTM.
Commits updated from b314463 to 9625679.
Description
Initial commit for KServe performance metrics feature.
How Has This Been Tested?
Local build
Previews:
Monitoring Model Performance (Multi-model serving platform)
Monitoring Model Performance (Single-model serving platform)