Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(coverage): Register coverage.py to hermetic toolchains #977

Merged
merged 27 commits into from
Jan 31, 2023

Conversation

aignas
Copy link
Collaborator

@aignas aignas commented Jan 9, 2023

PR Checklist

Please check if your PR fulfills the following requirements:

  • Some examples have been modified to work with bazel coverage.
  • Docs have been added / updated (for bug fixes / features).

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

The coverage_tool cannot be included via the toolchian.

Issue Number: #43

What is the new behavior?

Summary:

  • A script to update URLs and sha256 values used to download platform specific coverage.py wheels.
  • Allow to specify coverage_tool attribute in python_repository.
  • Register coverage.py when running pip_install_dependencies function.
  • Ensure that use_repo includes coverage tool, so that bzlmod users can use it too.
  • Have a boolean flag, which disables setting coverage_tool in the toolchain code.

Does not work/bad/ugly:

  • Use coverage.py v6.5.0 because the latest has a file types.py in the package directory, which imports from Python's stdlib types. Somehow the Python interpreter is thinking that the from types import FrameType is referring to the currently interpreted file and everything breaks. I would have expected the package to use absolute imports and only attempt to import from coverage.types if we use coverage.types and not just a plain types import.
  • The multi_python_versions example cannot show coverage for the more complex tests that are using subprocess. I am wondering if this is related to the fact that we are including coverage.py via the toolchain and not through other mechanisms.
  • The __init__.py files in the root of the WORKSPACE in bzlmod is breaking, when running under bazel coverage //:test. However, it started working when I renamed __init__.py to lib.py. I am suspecting that this has to do with the fact that the layer of indirection that coverage introduces could be something to do with that.
  • The coverage collection script is written in bash so I removed support for the automatic registration of coverage toolchain for Windows for now. The issue tracking upstream support for coverage in Windows is here.

Work towards #43.

Does this PR introduce a breaking change?

  • Yes
  • No

We may potentially break setups where coverage tooling is already included via
COVERAGE_PYTHON variable setting. In that case the users should use the flag to
disable coverage_tool inclusion in the toolchain. Setting
register_coverage_tool = False when registering toolchains would be sufficient.

Other information

@aignas aignas marked this pull request as draft January 9, 2023 10:28
@aignas
Copy link
Collaborator Author

aignas commented Jan 9, 2023

This is mostly building on the support added in bazelbuild/bazel@9d01630

@aignas aignas marked this pull request as ready for review January 9, 2023 12:06
@aignas aignas marked this pull request as draft January 17, 2023 07:20
@aignas aignas force-pushed the coverage branch 7 times, most recently from a0a0700 to 8550b00 Compare January 18, 2023 00:23
@aignas aignas marked this pull request as ready for review January 18, 2023 01:15
http_archive,
name = name,
build_file_content = """
py_library(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this can be a py_library. Doesn't that lead to a dependency when you add it as a dependency for the python_runtime? Or is the python toolchain not actually a dependency of py_library targets (as opposed to py_binary)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverage_tool needs to be a single file from the toolchain point of view and that is why there is only a single src in there. I was following official docs from https://bazel.build/configure/coverage#python and since the manual tests with bazel coverage worked in some cases, I thought that was fine?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so! I must have been thinking about the prohibition on using py_binary there (because py_binary does depend on the python toolchain to assemble the wrapper script, and that of course depends on the coverage tool).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL, thanks for explaining. Should we have an extra line of docs somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest not using py_library for now -- it implies a set of functionality that simply isn't respected (e.g. imports attribute won't be respected, anything in its PyInfo is ignored, etc).

It's currently possible to put a py_library there, but is a bit risky. This is because py_library doesn't, today, depend on the toolchain. However, I don't see how it can avoid that forever (in particular, I'm thinking of for when pyc precompilation is added).

I do think a py_library-like-equivalent dependency is the correct conceptual type of dependency that should be there, though. Running with code coverage is, essentially, adding an extra dependency to the binary and wrapping the regular entry point in another entry point. Unfortunately, it's not immediately clear to me how to do make that work cleanly via (or with support from) the toolchain (I have a few ideas, but haven't thought them through).

To use the coverage settings of the toolchain, you need to:

  1. Point coverage_tool to a single file. Only this file is consumed. Any additional files, runfiles, dependencies, etc aren't used.
  2. Put any files coverage_tool needs in coverage_files, which will be added to the binary's runfiles.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean filegroup is clearly documented that srcs is in the files output for the rule (which we need) and data is in the runfiles. I don't think we need to make it more complicated than that (it is also true that bazel will treat such a filegroup as an executable target, however we don't think we actually depend on that behavior).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, sorry, I mixed up the py_runtime and PyRuntimeInfo apis. Disregard the "you have to specify two attributes" part. The splitting of the target into (entry point, files) is handled by py_runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, +1 to filegroup.

I don't think we need to make it more complicated than that

I think you're referring to my "it should be a py library-like-equivalent dependency" comment?

There's a big distinction between "bundle of pre-built, platform-specific files" and a "dependency" (for lack of a better term). The former carries no information other than "here's a bag of files". A dependency can carry additional information, such as modifications to make to the import path, additional linker flags, propagating a proper runfiles object (we currently omit symlinks, for example), or other necessary metadata.

But, lets not derail this PR with an extended discussion of that? Happy to discuss in more detail on slack or in an issue

it is also true that bazel will treat such a filegroup as an executable target, however we don't think we actually depend on that behavior

"depend" is a strong word, but we do allow the "executable" of the target to be an indicator as to what the entry point is. The basic logic for converting the Target to the (entry point, files) is:

  • If there is one output, use that
  • otherwise, use whatever the executable file is
  • Otherwise, error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to make it more complicated than that

I think you're referring to my "it should be a py library-like-equivalent dependency" comment?

There's a big distinction between "bundle of pre-built, platform-specific files" and a "dependency" (for lack of a better term). The former carries no information other than "here's a bag of files". A dependency can carry additional information, such as modifications to make to the import path, additional linker flags, propagating a proper runfiles object (we currently omit symlinks, for example), or other necessary metadata.

Yes, and in this particular case, so long as the target has its visibility restricted to the toolchain, a bag of pre-built platform-specific files is exactly what this is - the py_runtime rule won't pay any attention to additional providers from the target, and if it did honor e.g. imports from this input, I can't see a situation where that wouldn't do more harm than good. But, this is a very good reason to make sure the visibility is appropriately restricted.

"depend" is a strong word, but we do allow the "executable" of the target to be an indicator as to what the entry point is. The basic logic for converting the Target to the (entry point, files) is:

  • If there is one output, use that
  • otherwise, use whatever the executable file is
  • Otherwise, error

Yes, I do remember how I implemented that. But in this case I was referring to the fact that a filegroup with a single file in src will have a FilesToRun provider supplying that file as an executable, which would make the second option you list above there work, but we don't depend on that behavior (which is, admittedly, a little weird) because it'll stop at the first in this case.

At any rate, if a filegroup is what we want to use here, then the upstream docs in the bazel repo should maybe be updated? Except that there might be use cases for having it as a py_library in the general case; just in this particular case we don't need it to be and by using restricted visibility and a filegroup we can afford to not think about all of the ways that could go wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to use filegroup, so it should be hopefully an improvement over py_library and we can figure out something better as a separate PR.

@f0rmiga
Copy link
Collaborator

f0rmiga commented Jan 18, 2023

This is awesome! I was going to work on it this week. I did a pass on your code so far and will do a more thorough review today.

py_library(
name = "coverage",
srcs = ["coverage/__main__.py"],
data = glob(["coverage/*", "coverage/**/*.py", "coverage/*.so"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moot after removing py_library, but:

  • .py files don't belong in data. Instead, a separate py_library should be factored out with proper srcs
  • The "coverage/*" glob is probably too general and overlaps with "*.so". In general, a "*" or "**" glob should only be done when you need every file in the source tree (e.g. some sort of source distribution zip), or if you have a data directory and you know it only contains files that should be included.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please take a look again.

srcs = ["coverage/__main__.py"],
data = glob(["coverage/*", "coverage/**/*.py", "coverage/*.so"]),
exec_compatible_with = {compatible_with},
visibility = ["//visibility:public"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be public visibility? It's really only the toolchain that should be referencing it. It'd be rather hard to use it correctly because its just a group of platform-specific files.

Comment on lines 244 to 306
config_setting(
name = "coverage_enabled",
values = {{"collect_code_coverage": "true"}},
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh awesome, thank you for adding this!

Can you restrict visibility of this? It's not the py rule's responsibility to tell if coverage is enabled or not, so lets not expose a way for people to use it and think it is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -255,13 +267,15 @@ py_runtime_pair(
python_path = python_bin,
python_version = python_short_version,
python_version_nodot = python_short_version.replace(".", ""),
coverage_tool = rctx.attr.coverage_tool if rctx.attr.coverage_tool == None and "windows" not in rctx.os.name else "\"{}\"".format(rctx.attr.coverage_tool),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic has a bug?

If tool=None and os=windows, then it'll do format(None), resulting in "None", which isn't a valid value.

Is this supposed to be "or" instead of "and" ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, there was a bug, fix it now.

python/repositories.bzl Show resolved Hide resolved
http_archive,
name = name,
build_file_content = """
py_library(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, sorry, I mixed up the py_runtime and PyRuntimeInfo apis. Disregard the "you have to specify two attributes" part. The splitting of the target into (entry point, files) is handled by py_runtime.

python/repositories.bzl Show resolved Hide resolved
@@ -1 +1,3 @@
common --experimental_enable_bzlmod

coverage --java_runtime_version=remotejdk_11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess would be because they don't have java installed on their dev machine, or at least not in a way that bazel is able to find it, and including it in the PR was accidental.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove it? I installed bazel via bazelisk and it does not pull in any java dependencies by default. What is the official way to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So according to this: https://bazel.build/docs/bazel-and-java#hermetic-testing

What this basically means is instead of using a locally installed JDK, it uses one it downloads (it sounds similar to how we're doing things).

So this is probably a net positive for getting started at the cost of having to download a JDK. I think that's worth it, so go ahead and leave it in.


# Update with './tools/update_coverage_deps.py <version>'
#START: managed by update_coverage_deps.py script
_coverage_deps = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hm. Something I just realized is we run the risk of conflicting with a user installed coverage library. e.g.
the toolchain uses coverage X
The user, in their requirements, uses coverage Y

At runtime, we'll have 2 different coverage libraries available, and it's not well defined which will be used.

I'm not sure what we can do about this, though. I guess add an arg to toolchain creation and dependency setup that allows the user to specify the coverage target to use, and skip having one automatically installed? Or...IDK. The key attribute we're looking for is that the toolchain and user binary reference the same target (or equiv; that we only end up with 1 coverage library in runfiles).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be an argument in favor of using py_library as the target type in the remote for coverage_dep, so things other than the toolchain could use it.

I don't think there's a way around having the coverage tool used by the toolchain being potentially different from what's declared in requirements, because you can't actually do a pip install until the toolchain is configured. I don't think it's a dealbreaker, though; it might be a little surprising to find out that coverage is running with a version of coveragepy distinct from what was in your requirements.txt, but at the same time I don't think it'll actually interefere with anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, can you default register_coverage_deps=False? Otherwise we'll potentially cause issues with people who have coverage in their deps already, and we don't really have any solution or workaround available to them.

Yeah, we were kicking around ideas on how to best address this. I'm not sure there is much to be done for this in this PR. The only thing I can think of would be to change the register_coverage_tool arg from a bool to something that allows the caller to pass in their own target.

you can't actually do a pip install until the toolchain is configured.

Hmmm, well, what if you could? If we had a separate "pip tools" or "requirements resolution" toolchain, then maybe we'd be able to do that. Just thinking out loud.

The two more appealing ideas we came up with were:

  • Making sure the toolchains' coverage was last in the import order (probably by modifying the bootstrap, when the bootstrap_template arg becomes available). That would at least give well-defined order for which version gets used and lets the user have more control over the version used.
  • Have the toolchain only have the coverarge tool entry point, not the coverage library itself; each test is responsible for bringing the necessary deps for the coverage tool (and try to figure out how to make that as easy/transparent to test writers as possible)


@unittest.skipIf(%is_windows%, "not supported on Windows")
def test_coverage_toolchain(self):
subprocess.run("bazel coverage //...", shell=True, check=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another way to test this? I was actually looking to remove this run_acceptance_test thing because it prevents upstream bazel from running our tests with upcoming bazel versions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about that. I was using what was available, but if you have a suggestion on how to configure bazelci in order to run coverage, I would be happy to remove the changes in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just have to add coverage_targets: <targets> to the presubmit.yml?

https://github.com/bazelbuild/continuous-integration/blob/master/buildkite/README.md#running-bazel-coverage

But eh, I'm ok with doing that in a separate PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted modifications to this file and relying on the coverage_targets in the examples for now.

@@ -53,6 +53,15 @@ class TestPythonVersion(unittest.TestCase):

subprocess.run("bazel test //...", shell=True, check=True)

stream = os.popen("bazel info execution_root")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the subprocess module instead. Right now, this is leaking an opened file description, which spams the logs with a warning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@unittest.skipIf(%is_windows%, "not supported on Windows")
def test_coverage_toolchain(self):
subprocess.run("bazel coverage //...", shell=True, check=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just have to add coverage_targets: <targets> to the presubmit.yml?

https://github.com/bazelbuild/continuous-integration/blob/master/buildkite/README.md#running-bazel-coverage

But eh, I'm ok with doing that in a separate PR.

- Only set the tooling if it is actually needed.
- By default it is set to None.
The user can turn the coverage_tool setting by passing
`register_coverage_tool=(True|False)` to `python_register_toolchains` or
`python_register_multi_toolchains` call or specifying the
`coverage_tool` label as described in the `versions.bzl` file.

Update tests to:
- ensure that we can still use the toolchain as previously.
- ensure that we are not downloading extra deps if they are not needed.
This hopefully makes everything more watertight and will help
avoid bugs with misconfiguration.
The `__init__.py` files in the root of the WORKSPACE in `bzlmod` is
breaking, when running under `bazel coverage //:test`. However, it
started working when I renamed `__init__.py` to `lib.py`. I am
suspecting that this has to do with the fact that the layer of
indirection that `coverage` introduces could be something to do with
that.
The `multi_python_versions` example cannot show coverage for the more
complex tests that are using `subprocess`. I am wondering if this is
related to the fact that we are including `coverage.py` via the
toolchain and not through other mechanisms [2].

[2]: https://bazel.build/configure/coverage
I am not sure how I can translate the config constraint to a build
environment here.
The logic should be that we set the `coverage_tool` to None if it is
Windows or the tool has been not set.
This changes how we install the coverage dependencies and allows us to
set the visibility for the coverage tool correctly by passing the name
of the toolchain repo during the repository function execution.

bzlmod does not have this due to the fact that the dependency
registering is happening in a different way/order.
Now we store everything in a nested dict of tuples, which makes the
lookup of supported versions slightly cleaner.
We rely now on the bazelci config instead
@rickeylev rickeylev merged commit 488a037 into bazelbuild:main Jan 31, 2023
@rickeylev
Copy link
Contributor

btw, re: coverage of subprocesses not working: in the bootstrap, there's some code that unsets some coverage variables. I think keeping that set is necessary to propagate coverage things to subprocesses

@aignas aignas deleted the coverage branch February 13, 2023 08:12
ianpegg-bc pushed a commit to ianpegg-bc/rules_python that referenced this pull request May 12, 2023
…ld#977)

This allows including the coverage package as part of the toolchain dependencies, which is mixed into a test's dependencies when `bazel coverage` is run (if coverage is not enabled, no extra dependency is added)

For now, it's disabled by default because enabling it poses the risk of having two versions of coverage installed (one from the toolchain, one from the user's dependencies).

The user can turn the coverage_tool setting by passing
`register_coverage_tool=(True|False)` to `python_register_toolchains` or
`python_register_multi_toolchains` call or specifying the
`coverage_tool` label as described in the `versions.bzl` file.

Use coverage.py v6.5.0 because the latest has `types.py` in the package
directory, which imports from Python's stdlib `types` [1]. Somehow the
Python interpreter is thinking that the `from types import FrameType` is
referring to the currently interpreted file and everything breaks. I
would have expected the package to use absolute imports and only attempt
to import from `coverage.types` if we use `coverage.types` and not just
a plain `types` import.

NOTE: Coverage is only for non-windows platforms.

Update tests to:
- ensure that we can still use the toolchain as previously.
- ensure that we are not downloading extra deps if they are not needed.

* Also changes the projects bazelrc to use a remotejdk, which makes it easier for contributors because they don't have to locally install a jdk to get going.

[1]: https://github.com/nedbat/coveragepy/blob/master/coverage/types.py
[3]: bazelbuild/bazel#15835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants