Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uninstall OpenSearch after testing on dashboards and modify the exception handle to skip irrelevant exception #3823

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

zelinh
Copy link
Member

@zelinh zelinh commented Jul 27, 2023

Description

Uninstall OpenSearch after testing on dashboards and modify the exception handle to skip irrelevant exception.

I noticed our previous fix on the integ tests would be logging multiple empty errors just like this.

2023-07-26 22:49:42 ERROR    ()
2023-07-26 22:49:42 ERROR    ()
2023-07-26 22:49:42 INFO     Sending SIGKILL to PID 6410
2023-07-26 22:49:42 INFO     Process killed with exit code None
2023-07-26 22:49:42 ERROR    ()
2023-07-26 22:49:42 ERROR    ()
2023-07-26 22:49:42 ERROR    ()
2023-07-26 22:49:42 ERROR    ()

This is because when we terminate the process. We are iterating all the running process and see if the file is successfully closed before we remove it.
However, we are catching multiple irrelevant exception and print out them with no more context. Some of this exception come from the race condition(e.g. the process ends before we go through its opened files) or some process we have no access on. We don't need to worry about these. Therefore according to my research, it's no harm to skip them regarding of our testing, and we would only need to log what we need under if statement self.stderr.name == item.path.

one of a use case on using Process.open_files
https://stackoverflow.com/questions/64599502/how-can-i-check-if-a-file-is-actively-open-in-another-process-outside-of-my-pyth

Issues Resolved

closes #3524

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Zelin Hao <zelinhao@amazon.com>
@codecov
Copy link

codecov bot commented Jul 27, 2023

Codecov Report

Merging #3823 (4fd5967) into main (e5e214d) will increase coverage by 0.48%.
Report is 128 commits behind head on main.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #3823      +/-   ##
==========================================
+ Coverage   91.61%   92.10%   +0.48%     
==========================================
  Files         182      187       +5     
  Lines        5438     5674     +236     
==========================================
+ Hits         4982     5226     +244     
+ Misses        456      448       -8     
Files Changed Coverage Δ
src/system/process.py 95.29% <100.00%> (+2.35%) ⬆️
src/test_workflow/test_cluster.py 85.71% <100.00%> (+0.18%) ⬆️

... and 14 files with indirect coverage changes

@@ -87,6 +87,7 @@ def terminate(self) -> None:
for service in self.dependencies:
termination_result = service.terminate()
self.__save_test_result_data(termination_result)
service.uninstall()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make sure the uninstallation of dependencies wont affect __save_test_result_data?

Signed-off-by: Zelin Hao <zelinhao@amazon.com>
@zelinh zelinh force-pushed the win-test-more-fix branch from 707fd13 to a885946 Compare July 27, 2023 23:47
@zelinh zelinh marked this pull request as draft July 27, 2023 23:54
@@ -66,9 +66,9 @@ def terminate(self) -> int:
try:
for item in proc.open_files():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this happened because we are checking all the process that is running on the host. The host has a number of processes running that may or may not be related to testing. Can you narrow down what process and open files we are checking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the properties of psutil to iterate all the running process though some of them are not related to our testing at all. I don't think there is a way to narrow down to specific process across platforms. Also some of the exception may be caused by race condition and we think we could pass on them since they are already outdated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get the process id from line 48. See if the process is subprocess of it? Rather than iterating over all the existing processes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaiksaya Basically in windows, according to my research, the python.exe was the process that was holding the file from removing which is not the OpenSearch/Dashboards process so that process id is not the one we want to check.

Copy link
Member

@peterzhuamazon peterzhuamazon Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pid thing needs another research and bug fix as it was initially only designed for tar, not any other dist that can be handled by external means.

As of now the pid is not accurate unless is tar.

Thanks.

logging.error(f"{err.args}")
logging.error(f"stdout {item} is being used by process {proc}")
except Exception:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching and exception and not doing anything is not making sense. Can you elaborate what you are trying to do here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These exceptions caught are mainly some process that were not related to our testing or they were already closed during the iteration so we may not have permission to go through it. I have seen quite few examples of using this regarding of whether certain file is closed/not used by any process. e.g. https://stackoverflow.com/questions/64599502/how-can-i-check-if-a-file-is-actively-open-in-another-process-outside-of-my-pyth. I think it's safe to do this.

Signed-off-by: Zelin Hao <zelinhao@amazon.com>
@zelinh zelinh force-pushed the win-test-more-fix branch from bc1475b to 85d5664 Compare July 28, 2023 21:39
@zelinh zelinh marked this pull request as ready for review July 28, 2023 21:49
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
@peterzhuamazon peterzhuamazon merged commit 43caf3d into opensearch-project:main Sep 14, 2023
@peterzhuamazon peterzhuamazon deleted the win-test-more-fix branch September 14, 2023 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] OpenSearch-Dashboards integTest does not uninstall OpenSearch after completion
3 participants