Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add per model metrics #40

Conversation

VedantMahabaleshwarkar
Copy link

@VedantMahabaleshwarkar VedantMahabaleshwarkar commented Oct 11, 2023

for step 2 of opendatahub-io/modelmesh-serving#228

Not sure why the unit tests from Openshift CI are failing on the PR. I manually verified that all tests are successful. To replicate, check out the PR branch VedantMahabaleshwarkar:metrics_cherrypick and run mvn -B package --file pom.xml to verify all unit tests pass.

TESTING INSTRUCTIONS :

  • Install ODH Operator 1.10.1 from operatorhub
  • Create the following kfdef
  • deploy a sample modelmesh model in any Data Science Project
  • Run some inference requests against the model
  • Navigate to Openshift Console -> Observe -> Metrics and run the following query and verify it returns the expected data
    • sum(increase(modelmesh_api_request_milliseconds_count{vModelId='example-onnx-mnist'}[1m]))
    • Note : Change vModelId=<your-model-name>

Ensure the query is visible without elevated permissions :

  • Create a regular user with edit and monitoring-rules-view permissions over the model namespace
  • Log in to the cluster using the regular user
  • Openshift Console -> Developer view -> Observe -> Metrics -> Custom Query -> Run sum(increase(modelmesh_api_request_milliseconds_count{vModelId='example-onnx-mnist'}[1m])) and verify it returns the expected data

- Add `modelId` parameter to `logTimingMetricDuration` function in `Metrics.java`:
  - `modelmesh_cache_miss_milliseconds`
  - `modelmesh_loadmodel_milliseconds`
  - `modelmesh_unloadmodel_milliseconds`
  - `modelmesh_req_queue_delay_milliseconds`
  - `modelmesh_model_sizing_milliseconds`
  - `modelmesh_age_at_eviction_milliseconds`
- Add `modelId` parameter to `logSizeEventMetric` function in `Metrics.java`:
  - `modelmesh_loading_queue_delay_milliseconds`
  - `modelmesh_loaded_model_size_bytes`
- Add `modelId` and `vModelId` param to `logRequestMetrics` in `Metrics.java`:
  - `modelmesh_invoke_model_milliseconds`
  - `modelmesh_api_request_milliseconds`

Closes red-hat-data-services#60

Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Co-authored-by: Daniele Zonca <dzonca@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
@VedantMahabaleshwarkar
Copy link
Author

/retest

}

@Test
@BeforeAll
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is intended to be executed before the suite, it would be good to rename the method to something more accurate, e.g. prepareMetricsEnv

but the method itself, seems like a test scenario, not something to be executed before.

public void logTimingMetricDuration(Metric metric, long elapsed, boolean isNano, String modelId) {
Histogram histogram = (Histogram) metricsMap.get(metric);
if (perModelMetricsEnabled && modelId != null) {
histogram.labels(modelId, "").observe(isNano ? elapsed / M : elapsed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a description for this label.


String methodName = idx == -1 ? name : name.substring(idx + 1);
if (perModelMetricsEnabled) {
modelId = Strings.nullToEmpty(modelId);
Copy link
Member

@spolti spolti Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my 2 cents:
Here I would use Optional or Objects.toString(env, $desiredValue).
It will avoid using external libraries.

Copy link
Member

@spolti spolti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few comments, not mandatory though.
Thanks.

@Jooho
Copy link

Jooho commented Oct 11, 2023

/retest

Copy link

@Jooho Jooho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci
Copy link

openshift-ci bot commented Oct 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jooho, VedantMahabaleshwarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Jooho,VedantMahabaleshwarkar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot merged commit 2190b2e into opendatahub-io:release-0.11 Oct 11, 2023
@heyselbi heyselbi linked an issue Oct 12, 2023 that may be closed by this pull request
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade MM to v0.11.0 in RHODS + Metrics hotfix
3 participants