Add tests for validating recommendation json for accelerator values #1336

bharathappali · 2024-10-09T07:27:32Z

Description

This PR adds tests that validate the accelerator values in the list recommendations JSON.

Type of change

Bug fix
New feature
Docs update
Breaking change (What changes might users need to make in their application due to this PR?)
Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

New Test X
Functional testsuite

Test Configuration

Kubernetes clusters tested on:

Checklist 🎯

Additional information

Include any additional information such as links, test results, screenshots here

chandrams · 2024-10-11T06:26:28Z

tests/scripts/local_monitoring_tests/rest_apis/test_list_recommendations.py

+                             ("list_accelerator_recommendations", SUCCESS_STATUS_CODE, "v2.0", "human_eval_exp", "cluster-1", "resource-optimization-local-monitoring", "monitor", "local", "prometheus-1", "container", "statefulset", "human-eval-benchmark", "unpartitioned", None, None, "human-eval-benchmark", "15min", "0.1"),
+                         ]
+                         )


Please share the test results and update the test documentation with the test requirements.

This test is marked as sanity, so it will run along with the other Kruize local tests on minikube and OpenShift. What is the behavior of this test when run on a non-GPU cluster?

What is the behavior of this test when run on a non-GPU cluster?

This test should work fine for the non-GPU use case, as I'm checking for the presence of nvidia in the recommendations. I haven't checked it for a normal workload yet. I will check it and update the test results here.
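
As a rough illustration of the check described above, here is a minimal sketch that looks for NVIDIA accelerator entries in one section of a recommendation. The dictionary layout, the `nvidia.com/mig-3g.20gb` key, and the helper name `accelerator_keys` are assumptions made for this example; they are not taken from the test code in this PR.

```python
def accelerator_keys(resources: dict) -> list[str]:
    """Return the resource names that look like NVIDIA accelerator entries."""
    return sorted(name for name in resources if "nvidia" in name.lower())


# Hypothetical "limits" sections, shaped only for illustration.
gpu_limits = {
    "cpu": {"amount": 2.0, "format": "cores"},
    "memory": {"amount": 4096.0, "format": "MiB"},
    "nvidia.com/mig-3g.20gb": {"amount": 1.0, "format": "cores"},
}
cpu_only_limits = {
    "cpu": {"amount": 1.0, "format": "cores"},
    "memory": {"amount": 512.0, "format": "MiB"},
}

print(accelerator_keys(gpu_limits))
print(accelerator_keys(cpu_only_limits))
```

With a helper like this, a workload without accelerator data yields an empty list, so the same test can assert either outcome depending on the cluster it runs on.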

@chandrams I have added the documentation as well. Can I please request your review? Thanks in advance!

chandrams · 2024-10-11T06:52:57Z

tests/scripts/local_monitoring_tests/rest_apis/test_list_recommendations.py

+    exp_name = input_json[0]['experiment_name']
+
+    response = generate_recommendations(exp_name)


Shouldn't we run benchmark load here before generating the recommendations?

This test is intended to run against a workload (typically a training or inference job) that runs for a longer duration and has GPU usage.

bharathappali · 2024-10-15T06:36:02Z

report.txt

As GitHub doesn't allow uploading HTML files, I have renamed the report to .txt. Please download it and change the extension to .html to view the test report.
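
For anyone who prefers to script the rename, a minimal sketch with Python's standard `pathlib` could look like this. The helper name `restore_html_extension` is hypothetical, and `report.txt` is assumed to be in the current directory after download.

```python
from pathlib import Path


def restore_html_extension(report: Path) -> Path:
    """Rename a downloaded report such as report.txt to report.html."""
    target = report.with_suffix(".html")
    report.rename(target)
    return target


if __name__ == "__main__":
    downloaded = Path("report.txt")  # assumed download location
    if downloaded.exists():
        print(restore_html_extension(downloaded))
```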

bharathappali · 2024-10-15T09:35:31Z

@chandrams I have tested it on a non-MIG GPU (Tesla T4), and the recommendations are as expected: there are no GPU recommendations, only CPU and memory recommendations.

Uploading the report as a text file:
report-non-mig-gpu.txt

Signed-off-by: bharathappali <abharath@redhat.com>

chandrams · 2024-10-15T10:43:20Z

tests/scripts/local_monitoring_tests/Local_monitoring_tests.md

+
+#### Prerequisites to run the test:
+
+In addition to the pre-requisites mentioned above we need to make sure that a workload with name `human-eval-benchmark` is running in the namespace `unpartitioned` and has the accelerator usage data.


Please provide steps on how to run this workload or link to a doc that has the steps

It's one of our benchmarks, human-eval-benchmark. Can I add a link to it?

Added a link to the human-eval benchmark in the doc.
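
To make the prerequisite easier to verify before running the test, here is a minimal sketch that checks whether a pod for the workload is running in the namespace. The helper names are hypothetical, the sample output is made up, and the check assumes `kubectl` is on the PATH with access to the cluster. It only confirms the pod is running, not that accelerator usage data is available.

```python
import subprocess


def matching_pods(kubectl_output: str, workload: str) -> list[str]:
    """Pick pod names containing the workload name from `kubectl get pods -o name` output."""
    pods = [
        line.strip().removeprefix("pod/")
        for line in kubectl_output.splitlines()
        if line.strip()
    ]
    return [pod for pod in pods if workload in pod]


def list_running_pods(namespace: str) -> str:
    """Ask kubectl for pods in the Running phase (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace,
         "--field-selector=status.phase=Running", "-o", "name"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Offline example with hypothetical kubectl output.
sample_output = "pod/human-eval-benchmark-0\npod/other-job-abc12\n"
print(matching_pods(sample_output, "human-eval-benchmark"))
```

On a real cluster, `matching_pods(list_running_pods("unpartitioned"), "human-eval-benchmark")` should return a non-empty list before the test is started.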

Signed-off-by: bharathappali <abharath@redhat.com>

chandrams

LGTM

bharathappali added this to the Kruize 0.1 Release milestone Oct 9, 2024

bharathappali requested review from dinogun and chandrams October 9, 2024 07:27

bharathappali self-assigned this Oct 9, 2024

chandrams reviewed Oct 11, 2024

rbadagandi1 added the remote_monitoring label Oct 15, 2024

bharathappali added 3 commits October 15, 2024 15:15

Add tests for validating recommendation json for accelerator values

c8e6e9e

Signed-off-by: bharathappali <abharath@redhat.com>

Modify Schema variable

43e2b78

Signed-off-by: bharathappali <abharath@redhat.com>

Add docs for the accelerator test

5cb0c0c

Signed-off-by: bharathappali <abharath@redhat.com>

bharathappali force-pushed the gpu-support-pr-10 branch from 4c413e8 to 5cb0c0c Compare October 15, 2024 09:46

chandrams reviewed Oct 15, 2024

add link for running the benchmark

2c55e0a

Signed-off-by: bharathappali <abharath@redhat.com>

chandrams approved these changes Oct 15, 2024

dinogun merged commit 08e92cb into kruize:mvp_demo Oct 15, 2024
2 of 3 checks passed

bharathappali mentioned this pull request Oct 15, 2024

GPU MIG Right sizing recommendations by kruize #1312

Closed
