Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subp cleanup and typing #4555

Merged
merged 8 commits into from
Dec 5, 2023
Merged

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Oct 25, 2023

Improve code quality and reduce maintenance burden by removing boilerplate, old unused copy-pasted code, and redundant features. Update code to use current Python3 features. Add typing information to subp args. This PR reduces code complexity, improves static analysis via typing, and eliminates the need to test internal features that we didn't even use.

@TheRealFalcon
Copy link
Member

I know you said not ready for review yet, but broadly I'm -1 on many of the changes.

In particular, I don't think that subp args or ProcessExecutionError should be restricted to bytes. I see no reason why we should need to translate to bytes when calling it. That just puts more complexity at the call sites when the complexity can be encapsulated in the subp call or the exception class, and it leaks bytes too much into the inside of the application. If I were to go either way, I'd go for only accepting strings rather than only accepting bytes, but I think it's nice that we can take either and generally do the right thing.

Do you see broken/unhelpful typing because of the current state? What was the impetus for this change?

Also, looks like I let in a small typing "regression" during one of my rebases of the snap PR. Adding that back in will probably help dealing with subp responses.

@holmanb
Copy link
Member Author

holmanb commented Oct 25, 2023

Thanks for the early feedback on this one @TheRealFalcon. Much of this was early braindump solution to being dissatisfied with type handling in the subp module, since it makes the logic unnecessarily difficult to follow when debugging issues related to this.

I'll keep your feedback in mind, but respond when I'm actually ready for review.

@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from 5d07db0 to 248f5c9 Compare October 25, 2023 15:16
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch 2 times, most recently from 7254a02 to 9de4ab2 Compare November 3, 2023 21:00
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch 2 times, most recently from e15c9db to f18f5eb Compare November 15, 2023 17:11
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…nical#4555)

The only consumer of this kwarg was cc_snap, which was printing output
directly to stderr rather than using intended logging code, which it shouldn't
be doing.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…ical#4555)

This parameter has a single callsite. Configuration modules shouldn't
send output directly to stdout / stderr.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
Added in Python 3.3, we can safely use this Python feature.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from f18f5eb to 417bb33 Compare November 15, 2023 17:15
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…al#4555)

- Remove redundant code from the few callsites that used this parameter.
- Eliminate tests for behavior that was never used.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…ical#4555)

This parameter has a single callsite. Configuration modules shouldn't
send output directly to stdout / stderr.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
Added in Python 3.3, we can safely use this Python feature.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…al#4555)

- Remove redundant code from the few callsites that used this parameter.
- Eliminate tests for behavior that was never used.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from 417bb33 to b418e3d Compare November 15, 2023 17:33
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…al#4555)

- Remove redundant code from the few callsites that used this parameter.
- Eliminate tests for behavior that was never used.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from b418e3d to 95cb638 Compare November 15, 2023 19:11
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…nical#4555)

The only consumer of this kwarg was cc_snap, which was printing output
directly to stderr rather than using intended logging code, which it shouldn't
be doing.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
…ical#4555)

This parameter has a single callsite. Configuration modules shouldn't
send output directly to stdout / stderr.
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from 95cb638 to 70632f0 Compare November 15, 2023 19:16
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
Added in Python 3.3, we can safely use this Python feature.
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 15, 2023
@TheRealFalcon
Copy link
Member

Some really nice changes here, thanks!

First pass - most commits look good, but I'm uncomfortable with Add type information to args commit. It's not entirely correct:

(cloud-init37) cow:cloud-init (pr_subp) $ pyright 2>&1 | grep -e "args.*subp"
  /home/james/c/cloud-init/cloudinit/util.py:1901:29 - error: Argument of type "Literal['mount']" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_ca_certs.py:177:23 - error: Argument of type "tuple[Literal['debconf-set-selections'], Literal['-']]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_disk_setup.py:441:39 - error: Argument of type "list[str | Unknown | None]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_disk_setup.py:442:37 - error: Argument of type "list[str | Unknown | None]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_disk_setup.py:744:20 - error: Argument of type "list[str | Unknown | None]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_disk_setup.py:749:21 - error: Argument of type "list[str | Unknown | None]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_disk_setup.py:760:22 - error: Argument of type "list[str | Unknown | None]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_rsyslog.py:164:22 - error: Argument of type "str" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_wireguard.py:251:25 - error: Argument of type "Literal['lsmod']" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/config/cc_wireguard.py:254:23 - error: Argument of type "Literal['modprobe wireguard']" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/cloudinit/sources/helpers/azure.py:589:17 - error: Argument of type "str" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:41:14 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:83:14 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:84:14 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:95:25 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:98:17 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:134:25 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"
  /home/james/c/cloud-init/tests/integration_tests/datasources/test_lxd_hotplug.py:137:17 - error: Argument of type "list[LiteralString]" cannot be assigned to parameter "args" of type "List[str]" in function "subp"

but even if we do satisfy pyright, there's enough subp calls where the args can be combined from multiple different sources that I don't think I'd be comfortable with the changes until we have test coverage of all the call points. If we want to annotate it, I think we should use Union[str, bytes, List[str], List[bytes]]

@holmanb
Copy link
Member Author

holmanb commented Nov 29, 2023

but even if we do satisfy pyright, there's enough subp calls where the args can be combined from multiple different sources that I don't think I'd be comfortable with the changes until we have test coverage of all the call points.

Good point.

I'd like to see pyright / mypy benefit from knowing the expected types of subp()'s args, so I'll add your suggested annotation and replace the handling code and tests, with the goal that at some point hopefully we can get to a stricter set of inputs.

Future work:

Per the document you linked above, if we can get away from requiring support for byte inputs to subp someday (outside of the data argument, which itself may be the output of a subp() command), I would be happier with subp's scope and signature.

holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 29, 2023
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 29, 2023
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from 541e449 to 9d5fe5d Compare November 29, 2023 18:14
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some inline comments (mostly minor), but otherwise everything here looks really good!

@@ -174,7 +173,7 @@ def run_commands(commands):
for command in fixed_snap_commands:
shell = isinstance(command, str)
try:
subp.subp(command, shell=shell, status_cb=sys.stderr.write)
subp.subp([command], shell=shell)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look for history on this status_cb? There might've been a reason we're sending those logs to stderr. I'm guessing it's fine, but lets do our due diligence if we haven't yet.

Also, making this a list is in the wrong commit (remove env parameter), and I'd rather not change it if we're still supporting strings for now.

Copy link
Member Author

@holmanb holmanb Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look for history on this status_cb? There might've been a reason we're sending those logs to stderr. I'm guessing it's fine, but lets do our due diligence if we haven't yet.

I did, yes. This parameter came with the original addition of the cc_snap module, and no justification for this parameter was given in the commit message, nor clues left in the code as to why this was deemed necessary.

I don't see how this additional text or the output location would significantly change a user's debugging experience. With and without status_cb, successful calls end up in the logs (albeit without the extra Begin/End log line). With and without status_cb, failed commands end up with trackbacks in the logs.

Also, making this a list is in the wrong commit (remove env parameter), and I'd rather not change it if we're still supporting strings for now.

Oops, fixed.

devnull_fp.close()

# Just ensure blank instead of none.
if capture or combine_capture:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we still want this in the capture case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not necessary (it never was).

The docs:

Popen.communicate(input=None, timeout=None)

... If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.

communicate() returns a tuple (stdout_data, stderr_data). The data will be strings if streams were opened in text mode; otherwise, bytes.

Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.

Popen.stdout

If the stdout argument was PIPE, this attribute is a readable stream object as returned by open(). Reading from the stream provides output from the child process. If the encoding or errors arguments were specified or the text or universal_newlines argument was True, the stream is a text stream, otherwise it is a byte stream. If the stdout argument was not PIPE, this attribute is None.

Popen.stderr

If the stderr argument was PIPE, this attribute is a readable stream object as returned by open(). Reading from the stream provides error output from the child process. If the encoding or errors arguments were specified or the text or universal_newlines argument was True, the stream is a text stream, otherwise it is a byte stream. If the stderr argument was not PIPE, this attribute is None.

The relevant bits:

    if capture:
        stdout = subprocess.PIPE
        stderr = subprocess.PIPE
...
    try:
        sp = subprocess.Popen(
            bytes_args,
            stdout=stdout,
            stderr=stderr,
            stdin=stdin,
            env=env,
            shell=shell,
            cwd=cwd,
        )
        (out, err) = sp.communicate(data)

The encoding or errors arguments were not specified and text and universal_newlines argument are not True, therefore stdout and stderr are byte streams, and the data returned in the tuple will be bytes.

The None-check code that exists now, I believe, was due to combined_capture using stderr = subprocess.STDOUT, and therefore returning None in the first element of the tuple. That plus maybe the assumption that the same was possible for the first argument in the tuple for some reason and when using capture. tl;dr Not sure why the current code is this way, but it doesn't look right.

Copy link
Member Author

@holmanb holmanb Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheRealFalcon I suppose it would probably be better to include a note in the commit message summarizing this (or making it its own separate commit) so that this doesn't look like a mistake in the git log. I'll do that on merge, if you are fine with that.

cmd2 = "bogus command"
cmd3 = 'echo "MOM" >> %s' % outfile
commands = [cmd1, cmd2, cmd3]
@pytest.mark.allow_all_subp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice conversion to pytest, but our docs still say that all unit tests should live in a class. I'd be fine removing that (and making an exception here if we remove it right away), but I've tried to keep to that rule until now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with being flexible with that rule if no class grouping makes sense (otherwise it is just boilerplate), but for now I'll just toss them in a class.

holmanb added a commit to holmanb/cloud-init that referenced this pull request Dec 1, 2023
holmanb added a commit to holmanb/cloud-init that referenced this pull request Dec 1, 2023
@holmanb
Copy link
Member Author

holmanb commented Dec 1, 2023

@TheRealFalcon Thanks for the review! I think I've addressed everything.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and nice cleanup!

I suppose it would probably be better to include a note in the commit message summarizing this

Works for me.

Can we wait till after 23.4 to merge this one?

@holmanb
Copy link
Member Author

holmanb commented Dec 1, 2023

LGTM and nice cleanup!

Thanks!

Can we wait till after 23.4 to merge this one?

Agreed, no rush on this one.

The only consumer of this kwarg was cc_snap, which was printing output
directly to stderr rather than using intended logging code, which it shouldn't
be doing.
This parameter has a single callsite. Configuration modules shouldn't
send output directly to stdout / stderr.

Note: Regarding the code below the comment "Just ensure blank instead of none"

'None' output was only possible for the stderr tuple member output of
.communicate() when stderr was directed to STDOUT. This only occurred in the
case of the 'combine_capture' argument, and only for stderr, not stdout. This
removes the unnecessary assignments for the 'capture' argument, as well
as the no-longer-necessary assignments for 'capture_parameter'.

See the docs[1] for more details.

[1] https://docs.python.org/3/library/subprocess.html
Added in Python 3.3, we can safely use this Python feature.
…l#4555)

This parameter always receives the same argument in runtime code and adds no
value. Remove it.
- Remove redundant code from the few callsites that used this parameter.
- Eliminate tests for behavior that was never used.
@holmanb holmanb force-pushed the holmanb/subp-cleanup branch from a716209 to 85dc3b7 Compare December 5, 2023 00:30
@igalic
Copy link
Collaborator

igalic commented Dec 5, 2023

while you're refactoring this hot mess, is there any way we can add the exit status to the SubpResult?

it is in the ProcessExecutionError but it's not very easily accessible

@holmanb holmanb merged commit 3bad8b5 into canonical:main Dec 5, 2023
26 checks passed
holmanb added a commit that referenced this pull request Dec 5, 2023
The only consumer of this kwarg was cc_snap, which was printing output
directly to stderr rather than using intended logging code, which it shouldn't
be doing.
holmanb added a commit that referenced this pull request Dec 5, 2023
This parameter has a single callsite. Configuration modules shouldn't
send output directly to stdout / stderr.

Note: Regarding the code below the comment "Just ensure blank instead of none"

'None' output was only possible for the stderr tuple member output of
.communicate() when stderr was directed to STDOUT. This only occurred in the
case of the 'combine_capture' argument, and only for stderr, not stdout. This
removes the unnecessary assignments for the 'capture' argument, as well
as the no-longer-necessary assignments for 'capture_parameter'.

See the docs[1] for more details.

[1] https://docs.python.org/3/library/subprocess.html
holmanb added a commit that referenced this pull request Dec 5, 2023
Added in Python 3.3, we can safely use this Python feature.
holmanb added a commit that referenced this pull request Dec 5, 2023
Remove code for unused argument types.
holmanb added a commit that referenced this pull request Dec 5, 2023
This parameter always receives the same argument in runtime code and adds no
value. Remove it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants