Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Re-enable flaky kubectl plugin e2e test in kubectl_ray_job_submit_test.go #3124

Conversation

owenowenisme
Copy link
Contributor

@owenowenisme owenowenisme commented Feb 27, 2025

Why are these changes needed?

There are 2 reason make the test fail.

  • There are some kubectl process occupied the 8265 port that is required by the kubectl job submit. But for the independence of tests, we should not expect the test cause this problem to kill the process properly every time, so added the pkill kubectl before running kubectl job submit test.
  • The ray is not installed on build kite env, so modified the setup-env.sh to install.

Related issue number

Closes #2801

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@owenowenisme owenowenisme force-pushed the re-enable-kubectl-plugin-ray-job-submit-test branch 3 times, most recently from bcef3f7 to a2256b1 Compare February 28, 2025 06:35
Signed-off-by: You-Cheng <mses010108@gmail.com>
Signed-off-by: You-Cheng <mses010108@gmail.com>
Signed-off-by: You-Cheng <mses010108@gmail.com>
Signed-off-by: You-Cheng <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the re-enable-kubectl-plugin-ray-job-submit-test branch from 372bf59 to 5b9ae41 Compare February 28, 2025 06:53
Signed-off-by: You-Cheng <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the re-enable-kubectl-plugin-ray-job-submit-test branch from 707f82b to 4108c51 Compare February 28, 2025 09:04
Signed-off-by: You-Cheng <mses010108@gmail.com>
Signed-off-by: You-Cheng <mses010108@gmail.com>
@owenowenisme owenowenisme marked this pull request as ready for review February 28, 2025 09:42
@owenowenisme
Copy link
Contributor Author

@MortalHappiness
PTAL🙏

apt-get install -y python3.11 python3-pip

# Install requirements
pip install --break-system-packages ray==2.41.0 ray\[default\]==2.41.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pip install --break-system-packages ray==2.41.0 ray\[default\]==2.41.0
pip install --break-system-packages ray[default]==2.41.0

Comment on lines +34 to +35
killKubectlCmd := exec.Command("pkill", "-9", "kubectl")
_ = killKubectlCmd.Run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these changes needed?

There are 2 reason make the test fail.

  • There are some kubectl process occupied the 8265 port that is required by the kubectl job submit. But for the independence of tests, we should not expect the test cause this problem to kill the process properly every time, so added the pkill kubectl before running kubectl job submit test.
  • The ray is not installed on build kite env, so modified the setup-env.sh to install.

Copy link
Member

@MortalHappiness MortalHappiness Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also example which tests occupy this port. This can be done in follow-up PRs. Once we find the test, we need to kill the kubectl process at the end of the test.

@MortalHappiness
Copy link
Member

Could you also create another "[DO NOT MERGE]" draft PR and run it more than 5 times and make sure all tests pass? Just like what I did in #3116 (review)

Signed-off-by: You-Cheng <mses010108@gmail.com>
@owenowenisme
Copy link
Contributor Author

owenowenisme commented Feb 28, 2025

https://buildkite.com/ray-project/ray-ecosystem-ci-kuberay-ci/builds?branch=owenowenisme%3Apr%2F3124%2Fci-test
Ran 5 tests. Please ignore the oldest one, it is not related to this.

Signed-off-by: You-Cheng <mses010108@gmail.com>
@kevin85421 kevin85421 merged commit 79c7c87 into ray-project:master Feb 28, 2025
21 checks passed
@owenowenisme owenowenisme deleted the re-enable-kubectl-plugin-ray-job-submit-test branch March 1, 2025 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Re-enable flaky kubectl plugin e2e test in kubectl_ray_job_submit_test.go
3 participants