
Add tests for validating recommendation json for accelerator values #1336

Merged
merged 4 commits into from
Oct 15, 2024


bharathappali
Member

Description

This PR adds tests that validate the accelerator values in the list recommendations JSON.

Type of change

  • Bug fix
  • New feature
  • Docs update
  • Breaking change (What changes might users need to make in their application due to this PR?)
  • Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

  • New Test X
  • Functional testsuite

Test Configuration

  • Kubernetes clusters tested on:

Checklist 🎯

  • Followed coding guidelines
  • Comments added
  • Dependent changes merged
  • Documentation updated
  • Tests added or updated

Additional information

Include any additional information such as links, test results, screenshots here

bharathappali added this to the Kruize 0.1 Release milestone on Oct 9, 2024
bharathappali self-assigned this on Oct 9, 2024
("list_accelerator_recommendations", SUCCESS_STATUS_CODE, "v2.0", "human_eval_exp", "cluster-1", "resource-optimization-local-monitoring", "monitor", "local", "prometheus-1", "container", "statefulset", "human-eval-benchmark", "unpartitioned", None, None, "human-eval-benchmark", "15min", "0.1"),
]
)
Contributor

Please share the test results and update the test documentation with the test requirements.

Contributor

This test is marked as sanity, so it will run along with the other Kruize local tests on Minikube and OpenShift. What is the behavior of this test when run on a non-GPU cluster?

Member Author

bharathappali, Oct 11, 2024

> What is the behavior of this test when run on a non-GPU cluster?

This test should work fine for the non-GPU use case, as I'm checking for the presence of nvidia in the recommendations. I haven't checked it with a normal workload yet. I will check and update the test results here.
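A minimal sketch of the kind of presence check described above, assuming accelerator recommendations show up as resource keys containing `nvidia` (for example a MIG profile such as `nvidia.com/mig-3g.20gb`) somewhere in the recommendation JSON. The function name `find_nvidia_keys` and the sample fragment are hypothetical, not taken from the PR; an empty result would correspond to a workload with no accelerator recommendation.

```python
from typing import Any, List


def find_nvidia_keys(node: Any, path: str = "") -> List[str]:
    """Recursively collect the path of every dict key that mentions 'nvidia'.

    Assumption (hypothetical helper): accelerator recommendations appear as
    resource keys like 'nvidia.com/mig-3g.20gb' inside the recommendation JSON.
    """
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}" if path else str(key)
            # Case-insensitive match so 'NVIDIA.com/...' is also caught.
            if "nvidia" in str(key).lower():
                found.append(child)
            found.extend(find_nvidia_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_nvidia_keys(item, f"{path}[{index}]"))
    return found


if __name__ == "__main__":
    # Hypothetical fragment of a recommendation, not real Kruize output.
    sample = {"config": {"limits": {"cpu": {"amount": 2},
                                    "nvidia.com/mig-3g.20gb": {"amount": 1}}}}
    print(find_nvidia_keys(sample))  # ['config/limits/nvidia.com/mig-3g.20gb']
```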

Member Author

[Screenshot from 2024-10-15 12-04-01]

Member Author

@chandrams I have added the documentation as well. Can I please request your review? Thanks in advance!

exp_name = input_json[0]['experiment_name']

response = generate_recommendations(exp_name)
Contributor

Shouldn't we run the benchmark load here before generating the recommendations?

Member Author

This test is intended to run against a workload (typically a training or inference job) that has been running for a longer duration and has GPU usage.

bharathappali
Member Author

report.txt

As GitHub doesn't allow uploading HTML files, I have renamed the report to .txt. Please download it and change the extension to .html to view the test report.
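If you would rather script the rename than do it by hand, a small sketch using Python's `pathlib` could look like this. The helper name `restore_html_extension` is hypothetical; it only changes the file suffix and assumes the downloaded content is already HTML.

```python
import tempfile
from pathlib import Path


def restore_html_extension(report: Path) -> Path:
    """Rename a downloaded report (e.g. report.txt) so it ends in .html.

    Only the suffix changes; the content is assumed to already be HTML.
    """
    target = report.with_suffix(".html")
    report.rename(target)
    return target


if __name__ == "__main__":
    # Demo on a throwaway file; point this at the real download instead.
    with tempfile.TemporaryDirectory() as tmp:
        downloaded = Path(tmp) / "report.txt"
        downloaded.write_text("<html><body>placeholder</body></html>")
        print(restore_html_extension(downloaded).name)  # report.html
```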

bharathappali
Member Author

@chandrams I have tested it on a non-MIG GPU (Tesla T4), and the recommendations are as expected: there are no GPU recommendations, only CPU and memory recommendations.

Uploading the report as a text file:
report-non-mig-gpu.txt
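The expected non-MIG outcome (CPU and memory recommendations only, no GPU entries) could be expressed as a check like the one below. This is a sketch under the assumption that per-resource entries sit under `requests` and `limits` objects in the recommendation JSON; the helper names and sample fragments are hypothetical, not the PR's actual test code.

```python
from typing import Any, Set


def recommended_resources(node: Any) -> Set[str]:
    """Collect resource names found under any 'requests' or 'limits' object.

    Assumption: entries look like {'limits': {'cpu': {...}, 'memory': {...}}}.
    """
    names: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("requests", "limits") and isinstance(value, dict):
                names.update(value.keys())
            names |= recommended_resources(value)
    elif isinstance(node, list):
        for item in node:
            names |= recommended_resources(item)
    return names


def only_cpu_and_memory(recommendation: Any) -> bool:
    """True when at least one resource is recommended and none is an accelerator."""
    names = recommended_resources(recommendation)
    return bool(names) and names <= {"cpu", "memory"}
```

For a non-MIG run, `only_cpu_and_memory` on each recommendation should return `True`; a fragment that also carries an `nvidia.com/...` entry makes it return `False`.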

Signed-off-by: bharathappali <abharath@redhat.com>

#### Prerequisites to run the test:

In addition to the prerequisites mentioned above, make sure that a workload named `human-eval-benchmark` is running in the `unpartitioned` namespace and has accelerator usage data.
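A quick pre-flight check for the first half of this prerequisite could parse the output of `kubectl get pods -n unpartitioned -o json`. This sketch assumes `kubectl` is on PATH and logged into the target cluster, and that the workload's pod names start with `human-eval-benchmark` (StatefulSet pods, for example, are named `<name>-<ordinal>`). It does not verify that accelerator usage data has been collected, and the helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def has_running_pod(pods: Dict[str, Any], name_prefix: str) -> bool:
    """True if any pod in a `kubectl get pods -o json` result is Running
    and its name starts with name_prefix."""
    for pod in pods.get("items", []):
        name = pod.get("metadata", {}).get("name", "")
        phase = pod.get("status", {}).get("phase")
        if name.startswith(name_prefix) and phase == "Running":
            return True
    return False


def workload_is_running(name: str = "human-eval-benchmark",
                        namespace: str = "unpartitioned") -> bool:
    """Query the live cluster via kubectl (assumed installed and configured)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return has_running_pod(json.loads(out), name)
```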
Contributor

Please provide steps on how to run this workload, or link to a doc that has the steps.

Member Author

It's one of our benchmarks, human-eval-benchmark. Can I add a link to it?

Member Author

Added a link to the human-eval-benchmark in the doc.

Signed-off-by: bharathappali <abharath@redhat.com>
Contributor

chandrams left a comment

LGTM

dinogun merged commit 08e92cb into kruize:mvp_demo on Oct 15, 2024
2 of 3 checks passed
Projects
Status: Done
4 participants