Fix flaky test of visualization tool #2416

Merged: 4 commits, May 9, 2024

Hailong-am
Contributor

Description

This PR adds a wait for the model to be undeployed, which fixes the flaky test introduced by #2363:

https://github.com/opensearch-project/ml-commons/actions/runs/8933312245/job/24539185907?pr=2402

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Hailong Cui <ihailong@amazon.com>
@Hailong-am Hailong-am changed the title Fix flaky test of visualization tool [Test] Fix flaky test of visualization tool May 8, 2024
@Hailong-am Hailong-am changed the title [Test] Fix flaky test of visualization tool Fix flaky test of visualization tool May 8, 2024
Signed-off-by: Hailong Cui <ihailong@amazon.com>

@SneakyThrows
protected Response waitResponseMeetingCondition(String method, String endpoint, String jsonEntity, Predicate<Response> condition) {
for (int i = 0; i < MAX_TASK_RESULT_QUERY_TIME_IN_SECOND; i++) {
Collaborator

@ylwu-amzn ylwu-amzn May 8, 2024

In the worst case this loop runs MAX_TASK_RESULT_QUERY_TIME_IN_SECOND times. Is that 300? Each retry sleeps one second, so it could wait 300 seconds, which is a long time. Do we really need that long? We should fail quickly, within a reasonable time.

Contributor Author

Yeah, reduced the total wait to 30s; that should be enough for the model to be undeployed.
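
For reference, the bounded-wait idea discussed in this thread can be sketched as follows. This is a minimal, self-contained illustration and not the PR's actual `waitResponseMeetingCondition` implementation: `PollingSketch`, `pollUntil`, and its parameters are hypothetical names, and a plain `Supplier` stands in for the REST call. With 30 attempts and a 1000 ms sleep, the worst-case wait is roughly 30 seconds rather than 300.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of ml-commons: polls until a condition holds
// or the attempt budget is exhausted.
final class PollingSketch {
    private PollingSketch() {}

    /**
     * Calls fetch up to maxAttempts times, sleeping sleepMillis between attempts.
     * Returns the first result that satisfies condition, or Optional.empty()
     * if no attempt did (or the thread was interrupted while sleeping).
     */
    static <T> Optional<T> pollUntil(Supplier<T> fetch, Predicate<T> condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            T result = fetch.get();
            if (condition.test(result)) {
                return Optional.of(result);
            }
            // Do not sleep after the final attempt; there is nothing left to wait for.
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

A test could call something like `PollingSketch.pollUntil(() -> fetchModelState(), "UNDEPLOYED"::equals, 30, 1000L)` and fail when the result is empty, where `fetchModelState` is an assumed helper that reads the model state from the get model response.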

Signed-off-by: Hailong Cui <ihailong@amazon.com>
@rbhavna
Collaborator

rbhavna commented May 9, 2024

Thanks @Hailong-am for fixing the flaky tests. @ylwu-amzn This issue is not just about flaky tests. The undeploy response returns UNDEPLOYED immediately, but we still have to rely on get model to see whether the model is actually undeployed. That could be confusing to users. We should either return a task_id or say something like UNDEPLOY INITIATED in the undeploy response if the model is not completely undeployed.

@dhrubo-os
Collaborator

Thanks @Hailong-am for fixing the flaky tests. @ylwu-amzn This issue is not just about flaky tests. The undeploy response returns UNDEPLOYED immediately, but we still have to rely on get model to see whether the model is actually undeployed. That could be confusing to users. We should either return a task_id or say something like UNDEPLOY INITIATED in the undeploy response if the model is not completely undeployed.

Seems like @ylwu-amzn created an issue for this.

@dhrubo-os
Collaborator

Do we need to backport this to 2.14 branch? I don't see any backport label to this PR.

@ylwu-amzn
Collaborator

ylwu-amzn commented May 9, 2024

Do we need to backport this to 2.14 branch? I don't see any backport label to this PR.

This test case does not seem to be flaky in the Release CI workflow; we haven't seen an IT failure report yet. To avoid blocking the whole release, I think it's fine not to backport to 2.14.

@ylwu-amzn ylwu-amzn merged commit aa09014 into opensearch-project:main May 9, 2024
10 of 13 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2416-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 aa09014abdd0a890c9ea3a55bf472ecc7eb7e480
# Push it to GitHub
git push --set-upstream origin backport/backport-2416-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2416-to-2.x.

Hailong-am added a commit to Hailong-am/ml-commons that referenced this pull request May 10, 2024
* add wait for model to be undeployed

Signed-off-by: Hailong Cui <ihailong@amazon.com>

* spotless

Signed-off-by: Hailong Cui <ihailong@amazon.com>

* update model undeploy status

Signed-off-by: Hailong Cui <ihailong@amazon.com>

* reduce total wait time

Signed-off-by: Hailong Cui <ihailong@amazon.com>

---------

Signed-off-by: Hailong Cui <ihailong@amazon.com>
(cherry picked from commit aa09014)
b4sjoo added a commit to b4sjoo/ml-commons that referenced this pull request Jun 6, 2024
Hailong-am added a commit to Hailong-am/ml-commons that referenced this pull request Jun 12, 2024
@mingshl mingshl added the flaky-test Flaky build or test issue label Jul 11, 2024
Labels
backport 2.x flaky-test Flaky build or test issue
6 participants