RHOAIENG-4198: Initial commit for kserve perf metrics #396
Conversation
I suggested some changes, but it's hard for me to fully understand them from the code alone. Can I see the generated output?
You can monitor the following metrics for a specific model that is deployed on the single-model serving platform:

* *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
Suggested change:
- * *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
+ * *Number of requests* - The number of requests that have failed or succeeded for a specific model.
* *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
* *Average response time (ms)* - The average time it takes a specific model to respond to requests.
* *CPU utilization (%)* - The percentage of the CPU's capacity that is currently being used by a specific model.
Suggested change:
- * *CPU utilization (%)* - The percentage of the CPU's capacity that is currently being used by a specific model.
+ * *CPU utilization (%)* - The percentage of the model deployment's CPU limit that is currently being used by a specific model.
* *Number of requests* - The number of HTTP requests that have failed or succeeded for a specific model.
* *Average response time (ms)* - The average time it takes a specific model to respond to requests.
* *CPU utilization (%)* - The percentage of the CPU's capacity that is currently being used by a specific model.
* *Memory utilization (%)* - The percentage of the system's memory that is currently being used by a specific model.
Suggested change:
- * *Memory utilization (%)* - The percentage of the system's memory that is currently being used by a specific model.
+ * *Memory utilization (%)* - The percentage of the model deployment's memory limit that is currently being used by a specific model.
@VedantMahabaleshwarkar What does model deployment mean here?
Pod if there's only 1 replica, pods if there are multiple replicas. But what I specifically mean is the OpenShift Deployment that represents the model. The Deployment will have resource requests and limits. The % we show is the % relative to the limit that is set in the Deployment.
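To illustrate the point above, here is a hypothetical `resources` stanza on the Deployment that backs a model (the values are made up, not taken from this PR). With these limits, a reported CPU utilization of 50% would mean the model is using 500m of its 1-CPU limit, not half of the node's capacity:

```yaml
# Hypothetical resources stanza for the Deployment that backs the model.
# The dashboard's utilization percentages are computed against the limits,
# not against the node's total capacity.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"      # CPU utilization (%) is measured against this value
    memory: 2Gi   # Memory utilization (%) is measured against this value
```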
@VedantMahabaleshwarkar I think that model deployment might sound confusing - proposing an alternative, let me know your thoughts:
- CPU utilization (%): The percentage of the CPU limit per model replica that is currently utilized by a specific model.
- Memory utilization (%): The percentage of the memory limit per model replica that is currently utilized by a specific model.
I based my assumption on this setting:
Previews look good to me; just some minor suggestions, as noted previously.
@@ -10,7 +10,19 @@ You can view a graph that illustrates the HTTP requests that have failed or succ
.Prerequisites
* You have installed {productname-long}.
* On the OpenShift cluster where {productname-short} is installed, user workload monitoring is enabled.
* Your cluster administrator has _not_ edited the {productname-short} dashboard configuration to hide the *Endpoint Performance* tab on the *Model Serving* page. For more information, see link:{rhoaidocshome}/html/managing_resources/customizing-the-dashboard#ref-dashboard-configuration-options_dashboard[Dashboard configuration options].
* The following dashboard configuration options are set to their default values as shown:
the default values?
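For reference, a sketch of how such defaults might appear in an `OdhDashboardConfig` resource. The field name and API version below are assumptions based on the odh-dashboard configuration, not confirmed by this PR, and should be checked against the product documentation:

```yaml
# Hypothetical excerpt of an OdhDashboardConfig; the exact field names
# are assumptions and should be verified before publishing.
apiVersion: opendatahub.io/v1alpha
kind: OdhDashboardConfig
metadata:
  name: odh-dashboard-config
spec:
  dashboardConfig:
    disablePerformanceMetrics: false   # default: performance metrics tab is visible
```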
@@ -0,0 +1,72 @@
:_module-type: PROCEDURE

[id="viewing-performance-metrics-for-model-server_{context}"]
Should the title and id be the same?
ifdef::upstream[]
* If you are using specialized {productname-short} groups, you are part of the user group or admin group (for example, {odh-user-group} or {odh-admin-group}) in OpenShift.
endif::[]
* The following dashboard configuration options are set to their default values as shown:
the default values
A few minor comments, but otherwise LGTM.
Commits updated from b314463 to 9625679.
Description
Initial commit for KServe performance metrics feature.
How Has This Been Tested?
Local build
Previews:
Monitoring Model Performance (Multi-model serving platform)
Monitoring Model Performance (Single-model serving platform)