Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix check for systemd-run failure when invoking extensions #1943

Merged
merged 9 commits into from
Jul 16, 2020

Conversation

pgombar
Copy link
Contributor

@pgombar pgombar commented Jul 13, 2020

Description

We were erroneously classifying systemd-run failures when invoking extensions which produce a long stdout and/or stderr. The reason is because we check for the unit name in the beginning of stderr as a signal that systemd-run was invoked successfully. The change in this PR is to check for the unit name in stderr before truncating it for telemetry purposes.

Other fixes:

  1. We had a bug in extensionprocessutils.py where the TELEMETRY_MESSAGE_MAX_LEN wasn't being patched since it was being assigned in the function signature instead of inside of the function, which produced misleading results in the unit tests.
  2. Fixes a bug in test_start_extension_command_should_invoke_the_command_directly_if_systemd_times_out where because of the indentation the asserts weren't being executed since they were part of the context manager instead of being outside of its scope.

PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines

@pgombar pgombar changed the title Fix stderr Fix check for systemd-run failure when invoking extensions Jul 13, 2020
@@ -99,7 +99,7 @@ def read_output(stdout, stderr):
return format_stdout_stderr("", "Cannot read stdout/stderr: {0}".format(ustr(e)))


def format_stdout_stderr(stdout, stderr, max_len=TELEMETRY_MESSAGE_MAX_LEN):
def format_stdout_stderr(stdout, stderr):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lesson to learn: don't assign constants in the function signature since this is loaded before the function body, so patching the constant will not work.

@codecov
Copy link

codecov bot commented Jul 13, 2020

Codecov Report

Merging #1943 into develop will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #1943   +/-   ##
========================================
  Coverage    69.62%   69.62%           
========================================
  Files           85       85           
  Lines        11926    11929    +3     
  Branches      1667     1667           
========================================
+ Hits          8303     8306    +3     
  Misses        3254     3254           
  Partials       369      369           
Impacted Files Coverage Δ
azurelinuxagent/common/cgroupapi.py 79.65% <100.00%> (+0.10%) ⬆️
...urelinuxagent/common/utils/extensionprocessutil.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0edea53...182bf6d. Read the comment docs.

@@ -164,29 +164,26 @@ def test_handle_process_completion_should_raise_on_nonzero_exit_code(self):
self.assertEquals(context_manager.exception.code, error_code)
self.assertIn("Non-zero exit code:", ustr(context_manager.exception))

def test_read_output_it_should_return_no_content(self):
def test_read_output_should_return_no_content(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I fixed the patching for TELEMETRY_MESSAGE_MAX_LEN, these tests produced different (correct) results :)

@@ -27,6 +27,8 @@
from azurelinuxagent.common.future import ustr
from azurelinuxagent.common.utils import shellutil, fileutil
from nose.plugins.attrib import attr

from azurelinuxagent.common.utils.extensionprocessutil import TELEMETRY_MESSAGE_MAX_LEN
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point I think I clicked on "format file" so there's a lot of whitespace updates, I'll point out relevant updates to the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we need to review only the lines where you added comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this file, that's the case, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your difftool should allow you selectively undo the changes you don't want (meld does). In the future let's try to avoid global formatting changes, since it makes the review and reading the history of the code harder,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you can even undo the changes after commiting, if you feel like doing it here)

Comment on lines 755 to 762
extension_calls = [args[0] for (args, _) in popen_patch.call_args_list if
"echo 'success'" in args[0]]

self.assertEquals(2, len(extension_calls), "The extension should have been invoked exactly twice")
self.assertIn("systemd-run --unit=Microsoft.Compute.TestExtension_1.2.3", extension_calls[0], "The first call to the extension should have used systemd")
self.assertEquals("echo 'success'", extension_calls[1], "The second call to the extension should not have used systemd")
self.assertIn("systemd-run --unit=Microsoft.Compute.TestExtension_1.2.3", extension_calls[0],
"The first call to the extension should have used systemd")
self.assertEquals("echo 'success'", extension_calls[1],
"The second call to the extension should not have used systemd")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent several hours trying to figure out why the breakpoint in L755 was never hit... turns out we had a bug checked in where the indentation was such that the asserts were inside of the context manager, instead of outside it.

This is a nasty bug since it's so easy to miss. Could linters figure this one out? Thoughts, @kevinclark19a?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Interesting. I'm not sure a linter could actually help with this... I don't see how it would differentiate between the legitimate caught raise statement presiding (presumably) inside of the start_extension_command call, and the erroneously caught one inside the assertEquals call. As far as the AST is concerned, they're basically identical.

Still, there might be some way of using attributes to let the linter know what we want caught and what we want to error out... I'll have to look into it more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point out the change in indentation? I'm having a hard time finding it in the diff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I was blind.. turns out I commented on the wrong method. The content of the comment is correct, but the method in question is test_start_extension_command_should_not_use_fallback_option_if_extension_fails instead of this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment on the right method.

@patch("azurelinuxagent.common.cgroupapi.add_event")
@patch('time.sleep', side_effect=lambda _: mock_sleep())
@patch("azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN", 5)
def test_start_extension_command_should_not_use_fallback_option_if_extension_fails_with_long_output(self, *args):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test to cover long extension outputs. This would fail before the changes in this PR.

larohra
larohra previously approved these changes Jul 14, 2020
Copy link
Contributor

@larohra larohra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 small typo in a comment, but everything else LGTM.

tests/common/test_cgroupapi.py Outdated Show resolved Hide resolved
stdout=stdout,
stderr=stderr)

extension_calls = [args[0] for (args, _) in popen_patch.call_args_list if command in args[0]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the indentation was wrong. It was at the same level with the context manager, instead of outside, like now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this new code... you mentioned it was in existing code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i see it now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the assert for root was hiding the issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it should've failed in travis then right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't fail because these asserts were never executed. I ran the test both from sudo shell and from sudo PyCharm and could never hit the asserts. I'm guessing it has something to do with the context manager which just exits the current scope when the object is released?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh dang. Good catch though! Hopefully we can catch these moving on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it has something to do with the context manager which just exits the current scope when the object is released?

The test case raises, so the call to start_extension_command will exit the with statement.

I seem to remember at some point I formatted a few tests in this file so that may have moved the assert to the with statement.

narrieta
narrieta previously approved these changes Jul 15, 2020
Copy link
Member

@narrieta narrieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider undoing the formatting changes

larohra
larohra previously approved these changes Jul 15, 2020
Copy link
Contributor

@larohra larohra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pgombar pgombar dismissed stale reviews from larohra and narrieta via 5a68bdc July 15, 2020 18:29
stdout=stdout,
stderr=stderr)

extension_calls = [args[0] for (args, _) in popen_patch.call_args_list if command in args[0]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it has something to do with the context manager which just exits the current scope when the object is released?

The test case raises, so the call to start_extension_command will exit the with statement.

I seem to remember at some point I formatted a few tests in this file so that may have moved the assert to the with statement.

@pgombar pgombar merged commit 7e3eac1 into Azure:develop Jul 16, 2020
@pgombar pgombar deleted the fix_stderr branch July 16, 2020 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants