Improvements in error handling on end-to-end tests #2716
Conversation
Codecov Report
@@             Coverage Diff             @@
##           develop    #2716      +/-   ##
===========================================
+ Coverage    71.95%   71.96%    +0.01%
===========================================
  Files          104      104
  Lines        15765    15765
  Branches      2244     2244
===========================================
+ Hits         11343    11345        +2
- Misses        3906     3909        +3
+ Partials       516      511        -5
# The same orchestrator machine may be executing multiple suites on the same test VM, or
# the same suite on one or more test VMs; we use this file to mark the build is already done
build_done_path = self._working_directory/"build.done"
if build_done_path.exists():
Did we remove this case because we're no longer executing multiple suites on the same VM?
Not quite, I explained the motivation in the PR comment just a couple of lines above.
@@ -14,9 +14,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#

from collections.abc import Callable
The main change in this module is that I stopped using before_case and after_case for setup and cleanup.
The issue is that errors in before_case are not reported as failures; they simply make LISA skip test execution. Tests can be skipped for valid reasons (e.g. an unsupported OS), but skipping tests on setup failures without reporting an error is not correct, since those errors may go unnoticed on daily runs.
I replaced the use of before_case and after_case with custom code (mainly in the execute() method).
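A minimal sketch of that pattern, with illustrative names rather than the PR's actual code: setup runs as ordinary code inside execute(), so a failure there surfaces as a test error instead of a silent skip, and cleanup always runs but never masks the test result.

```python
from logging import Logger, getLogger


class AgentTestScenario(object):
    def __init__(self) -> None:
        self._log: Logger = getLogger(__name__)

    def execute(self) -> None:
        try:
            self._setup()  # an exception here fails the test instead of skipping it
            self._run()    # the actual test logic
        finally:
            try:
                self._clean_up()
            except Exception as exception:
                # a cleanup problem should not mask the test result
                self._log.warning(f"Cleanup failed: {exception}")

    def _setup(self) -> None:
        pass

    def _run(self) -> None:
        pass

    def _clean_up(self) -> None:
        pass
```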
)
# E0401: Unable to import 'lisa.sut_orchestrator.azure.common' (import-error)
from lisa.sut_orchestrator.azure.common import get_node_context  # pylint: disable=E0401

from azurelinuxagent.common.version import AGENT_VERSION


-class AgentTestSuite(TestSuite):
+class AgentTestScenario(object):
This class used to be a specialization of LISA's TestSuite in order to use the before_case/after_case protocol, but since I am no longer using that protocol, there is no need for that class relationship. Now this is a plain AgentTestScenario class (it is very similar in concept to a DCR scenario).
I renamed the class, but will wait until my next PR to rename the .py file; otherwise reviewing the actual code changes would be more difficult.
self._log.info(f"Test Node: {self._vm_name}")
This logic is now in the _setup() method.
        self._clean_up()

    def _initialize(self, node: Node, log: Logger) -> None:
        self._node = node
This is now done in init().
@@ -112,18 +108,10 @@ def _build_agent_package(self) -> Path:
        """
        Builds the agent package and returns the path to the package.
        """
        build_path = self._working_directory/"build"

        # The same orchestrator machine may be executing multiple suites on the same test VM, or
I removed the "*.done" files (they actually never worked, since I am removing the working directory on cleanup).
In the current DCR automation we use a new test VM to run a single scenario. I want to be able to run multiple scenarios on the same test VM in the new automation pipeline, and the intention of those "*.done" files was to avoid duplicated setup. However, not only was the implementation incorrect, but that approach was also cumbersome during test development. I will come up with a better approach when I start combining scenarios.
-def _execute_script_on_node(self, script_path: Path, parameters: str = "", sudo: bool = False) -> int:
+def execute_script_on_node(self, script_path: Path, parameters: str = "", sudo: bool = False) -> int:
    """
Made it public, added a docstring, and made minor formatting improvements to its logging.
@@ -15,5 +15,11 @@ tar --exclude='journal/*' --exclude='omsbundle' --exclude='omsagent' --exclude='
    /var/lib/waagent/ \
    /etc/waagent.conf

# tar exits with 1 on warnings; ignore those
We are hitting a warning about the logs changing while we are creating the tarball. It's just a warning, but tar exits with code 1 (fatal errors are 2; see the man page for details).
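Illustration only; the actual change is in the log-collection shell script. Tolerating tar's warning-level exit code from Python would look roughly like this (archive path and file list are made up):

```python
import subprocess

# GNU tar exit codes: 0 = success, 1 = a warning such as "file changed as we
# read it" (likely here, since the agent keeps writing its logs), 2 = fatal.
result = subprocess.run(["tar", "-czf", "/tmp/waagent-logs.tgz", "/var/log/waagent.log"])
if result.returncode > 1:
    raise RuntimeError(f"tar failed with exit code {result.returncode}")
```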
@@ -59,7 +74,7 @@ echo "Restarting service..."
service $service_name stop

# Rename the previous log to ensure the new log starts with the agent we just installed
-mv /var/log/waagent.log /var/log/waagent.pre-install.log
+mv /var/log/waagent.log /var/log/waagent."$(date --iso-8601=seconds)".log
Added a timestamp to the previous log. Useful if the scenario is run multiple times on the same VM (e.g. during test development) or if multiple scenarios run on the same VM (in the future).
@@ -0,0 +1,31 @@
# Microsoft Azure Linux Agent
Added a "sample" LISA test suite that runs on the local machine. Useful for quick tests/experiments with LISA.
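Not the PR's actual file, but a minimal LISA suite generally looks along these lines (patterned after LISA's documented hello-world example):

```python
from lisa import Logger, Node, TestCaseMetadata, TestSuite, TestSuiteMetadata


@TestSuiteMetadata(
    area="sample",
    category="functional",
    description="A trivial suite to experiment with LISA on the local machine",
)
class HelloWorld(TestSuite):
    @TestCaseMetadata(description="Runs a command and logs its output", priority=0)
    def hello(self, node: Node, log: Logger) -> None:
        result = node.execute("echo 'Hello World!'")
        log.info(f"stdout: {result.stdout}")
```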
            rmtree(self._context.working_directory.as_posix())
        except Exception as exception:
            self._log.warning(f"Failed to remove the working directory: {exception}")
        self._context.working_directory.mkdir()
So, do we continue the setup if we failed to remove? Or does mkdir raise an exception if the directory already exists?
Continue with a warning. This is mainly for the dev scenario; the automation runs work on fresh VMs.
I was reading that mkdir() raises an exception if the directory exists. So, in the dev scenario it won't work if the removal failed.
My point is that we could raise the exception if we failed to remove, because in the next step mkdir throws an exception anyway if the folder exists.
Yes, it will fail, and there will be a warning with the info about the delete failure. The dev will get an error, see the warning in the log, and fix the issue.
> My point is that we could raise the exception if we failed to remove, because in the next step mkdir throws an exception anyway if the folder exists.

Same difference, with the possibility that the working directory goes away in between the two calls, in which case there would be no error at all.
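A sketch of the pattern under discussion (names are illustrative): the removal failure is only logged as a warning, but if the directory is still present, mkdir() raises FileExistsError, so the developer gets an error plus a warning pointing at the root cause; if the directory disappeared between the two calls, mkdir() simply succeeds.

```python
from logging import Logger, getLogger
from pathlib import Path
from shutil import rmtree


def create_working_directory(working_directory: Path, log: Logger) -> None:
    if working_directory.exists():
        try:
            # best effort; mainly relevant during test development,
            # since the automation runs use fresh VMs
            rmtree(working_directory.as_posix())
        except Exception as exception:
            log.warning(f"Failed to remove the working directory: {exception}")
    # raises FileExistsError if the removal failed and the directory is still there
    working_directory.mkdir()


create_working_directory(Path("/tmp/agent-test"), getLogger(__name__))
```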
Implemented a few improvements in error handling; I'm pointing them out with comments within the PR.