-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix check for systemd-run failure when invoking extensions #1943
Conversation
@@ -99,7 +99,7 @@ def read_output(stdout, stderr): | |||
return format_stdout_stderr("", "Cannot read stdout/stderr: {0}".format(ustr(e))) | |||
|
|||
|
|||
def format_stdout_stderr(stdout, stderr, max_len=TELEMETRY_MESSAGE_MAX_LEN): | |||
def format_stdout_stderr(stdout, stderr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lesson to learn: don't assign constants in the function signature since this is loaded before the function body, so patching the constant will not work.
Codecov Report
@@ Coverage Diff @@
## develop #1943 +/- ##
========================================
Coverage 69.62% 69.62%
========================================
Files 85 85
Lines 11926 11929 +3
Branches 1667 1667
========================================
+ Hits 8303 8306 +3
Misses 3254 3254
Partials 369 369
Continue to review full report at Codecov.
|
@@ -164,29 +164,26 @@ def test_handle_process_completion_should_raise_on_nonzero_exit_code(self): | |||
self.assertEquals(context_manager.exception.code, error_code) | |||
self.assertIn("Non-zero exit code:", ustr(context_manager.exception)) | |||
|
|||
def test_read_output_it_should_return_no_content(self): | |||
def test_read_output_should_return_no_content(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I fixed the patching for TELEMETRY_MESSAGE_MAX_LEN
, these tests produced different (correct) results :)
tests/common/test_cgroupapi.py
Outdated
@@ -27,6 +27,8 @@ | |||
from azurelinuxagent.common.future import ustr | |||
from azurelinuxagent.common.utils import shellutil, fileutil | |||
from nose.plugins.attrib import attr | |||
|
|||
from azurelinuxagent.common.utils.extensionprocessutil import TELEMETRY_MESSAGE_MAX_LEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I think I clicked on "format file" so there's a lot of whitespace updates, I'll point out relevant updates to the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we need to review only the lines where you added comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this file, that's the case, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your difftool should allow you selectively undo the changes you don't want (meld does). In the future let's try to avoid global formatting changes, since it makes the review and reading the history of the code harder,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(you can even undo the changes after commiting, if you feel like doing it here)
tests/common/test_cgroupapi.py
Outdated
extension_calls = [args[0] for (args, _) in popen_patch.call_args_list if | ||
"echo 'success'" in args[0]] | ||
|
||
self.assertEquals(2, len(extension_calls), "The extension should have been invoked exactly twice") | ||
self.assertIn("systemd-run --unit=Microsoft.Compute.TestExtension_1.2.3", extension_calls[0], "The first call to the extension should have used systemd") | ||
self.assertEquals("echo 'success'", extension_calls[1], "The second call to the extension should not have used systemd") | ||
self.assertIn("systemd-run --unit=Microsoft.Compute.TestExtension_1.2.3", extension_calls[0], | ||
"The first call to the extension should have used systemd") | ||
self.assertEquals("echo 'success'", extension_calls[1], | ||
"The second call to the extension should not have used systemd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent several hours trying to figure out why the breakpoint in L755 was never hit... turns out we had a bug checked in where the indentation was such that the asserts were inside of the context manager, instead of outside it.
This is a nasty bug since it's so easy to miss. Could linters figure this one out? Thoughts, @kevinclark19a?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Interesting. I'm not sure a linter could actually help with this... I don't see how it would differentiate between the legitimate caught raise statement presiding (presumably) inside of the start_extension_command
call, and the erroneously caught one inside the assertEquals
call. As far as the AST is concerned, they're basically identical.
Still, there might be some way of using attributes to let the linter know what we want caught and what we want to error out... I'll have to look into it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you point out the change in indentation? I'm having a hard time finding it in the diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I was blind.. turns out I commented on the wrong method. The content of the comment is correct, but the method in question is test_start_extension_command_should_not_use_fallback_option_if_extension_fails
instead of this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment on the right method.
@patch("azurelinuxagent.common.cgroupapi.add_event") | ||
@patch('time.sleep', side_effect=lambda _: mock_sleep()) | ||
@patch("azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN", 5) | ||
def test_start_extension_command_should_not_use_fallback_option_if_extension_fails_with_long_output(self, *args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New test to cover long extension outputs. This would fail before the changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 small typo in a comment, but everything else LGTM.
stdout=stdout, | ||
stderr=stderr) | ||
|
||
extension_calls = [args[0] for (args, _) in popen_patch.call_args_list if command in args[0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the indentation was wrong. It was at the same level with the context manager, instead of outside, like now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this new code... you mentioned it was in existing code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i see it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the assert for root was hiding the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it should've failed in travis then right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't fail because these asserts were never executed. I ran the test both from sudo shell and from sudo PyCharm and could never hit the asserts. I'm guessing it has something to do with the context manager which just exits the current scope when the object is released?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh dang. Good catch though! Hopefully we can catch these moving on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing it has something to do with the context manager which just exits the current scope when the object is released?
The test case raises, so the call to start_extension_command will exit the with statement.
I seem to remember at some point I formatted a few tests in this file so that may have moved the assert to the with statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider undoing the formatting changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
stdout=stdout, | ||
stderr=stderr) | ||
|
||
extension_calls = [args[0] for (args, _) in popen_patch.call_args_list if command in args[0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing it has something to do with the context manager which just exits the current scope when the object is released?
The test case raises, so the call to start_extension_command will exit the with statement.
I seem to remember at some point I formatted a few tests in this file so that may have moved the assert to the with statement.
Description
We were erroneously classifying systemd-run failures when invoking extensions which produce a long stdout and/or stderr. The reason is because we check for the unit name in the beginning of stderr as a signal that systemd-run was invoked successfully. The change in this PR is to check for the unit name in stderr before truncating it for telemetry purposes.
Other fixes:
extensionprocessutils.py
where theTELEMETRY_MESSAGE_MAX_LEN
wasn't being patched since it was being assigned in the function signature instead of inside of the function, which produced misleading results in the unit tests.test_start_extension_command_should_invoke_the_command_directly_if_systemd_times_out
where because of the indentation the asserts weren't being executed since they were part of the context manager instead of being outside of its scope.PR information
Quality of Code and Contribution Guidelines