
Retain and access unprocessed coverage data? #21

Open
ferdnyc opened this issue Jun 23, 2024 · 2 comments

Comments


ferdnyc commented Jun 23, 2024

Whenever unittest-parallel is called with --coverage (or any of the other coverage-related arguments it accepts), it writes coverage data for each of the parallel jobs into a temporary directory, then automatically combines the data, reports the coverage stats, and (optionally) generates a detailed report in the requested format(s).

But what if I just want the raw coverage data, unprocessed and un-combined?

For example, in one project I'm using unittest-parallel in combination with tox and tox-gh to execute tests under a range of Python versions and OSes in a GitHub Actions CI workflow.

To collect complete coverage, the data from all of the CI jobs in the workflow matrix needs to be combined in a separate workflow job, after all of the test runs are completed.

Coverage.py's coverage run command even has an argument (-p) that facilitates this: it changes the default name of the raw SQLite data file from .coverage to .coverage.$HOSTNAME.$PID.$RANDOM, so that the filenames won't collide when the data is aggregated.

It would be helpful if unittest-parallel provided an option that similarly disabled the implicit coverage combine and coverage report steps (or their moral equivalents via the Python API) and, instead of automatically deleting the raw coverage data files, made them available at the end of the run for further aggregation.

ferdnyc (Author) commented Jun 23, 2024

In fact, looking at the coverage.py source code...

  1. coverage.Coverage() can be passed a data_suffix=True argument, instead of the data_file argument, to switch on -p-style naming for the default output file (changing it from .coverage to .coverage.unique-identifier)
  2. Coverage.combine(), if not passed a list of filenames, will by default aggregate all of the data files matching its base filename
  3. Coverage.combine() also automatically deletes all of the files it combines, unless it's passed a keep=True argument to prevent that deletion
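Those three behaviors can be exercised directly with coverage.py's API. A minimal, self-contained sketch (run in a scratch directory; assumes coverage.py ≥ 5.5 for the keep= argument, and the exact suffix on the data file varies by machine, PID, and a random number):

```python
# Minimal sketch of the coverage.py API behaviors listed above;
# file and directory names here are illustrative.
import glob
import os
import pathlib
import runpy
import tempfile

import coverage

os.chdir(tempfile.mkdtemp())
pathlib.Path("job.py").write_text("total = sum(range(10))\n")

# 1. data_suffix=True turns on "-p"-style naming: the data file is
#    written as .coverage.<machine>.<pid>.<random>, not plain .coverage.
cov = coverage.Coverage(data_suffix=True)
cov.start()
runpy.run_path("job.py")  # stand-in for running one parallel job's tests
cov.stop()
cov.save()
print(glob.glob(".coverage.*"))  # e.g. ['.coverage.myhost.12345.678901']

# 2. combine() with no path list merges every matching data file in the
#    data file's directory; 3. keep=True stops it from deleting them.
combined = coverage.Coverage()
combined.combine(keep=True)
combined.save()  # writes the merged .coverage file
print(os.path.exists(".coverage"), bool(glob.glob(".coverage.*")))
```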

So, rather than creating a temporary directory and generating its own temporary filenames for the coverage data, it might be simpler and more flexible for unittest-parallel to pass coverage.Coverage() the data_suffix=True argument, giving each job a uniquely named data file in the current directory (which is coverage's standard behavior) or in another location specified by the user. It could then call cov.combine() without a filename list, letting coverage process (and delete) all of those default-named files.

A new command-line argument (--coverage-preserve, or something like it) could then add keep=True to the combine() call, so that the raw files wouldn't be deleted and would remain available after the run.
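As a hedged sketch of how that could be wired up (the --coverage-preserve flag name and all of the surrounding structure are hypothetical, not unittest-parallel's actual code):

```python
# Hypothetical wiring for the proposed option; "--coverage-preserve" is
# the suggested (not real) flag name, and this is not unittest-parallel code.
import argparse
import os
import pathlib
import runpy
import tempfile

import coverage

parser = argparse.ArgumentParser()
parser.add_argument("--coverage-preserve", action="store_true",
                    help="keep the raw per-job .coverage.* files after combining")
args = parser.parse_args(["--coverage-preserve"])  # simulate passing the flag

os.chdir(tempfile.mkdtemp())
pathlib.Path("job.py").write_text("total = sum(range(5))\n")

# Each parallel job writes its own suffixed data file...
job_cov = coverage.Coverage(data_suffix=True)
job_cov.start()
runpy.run_path("job.py")  # stand-in for one job's test run
job_cov.stop()
job_cov.save()

# ...and the parent process combines them, keeping the raw files
# only when the hypothetical flag was given.
combined = coverage.Coverage()
combined.combine(keep=args.coverage_preserve)
combined.save()
```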

craigahobbs (Owner) commented

Thanks for the suggestion. I think this will work and simplify things. I don't remember why I used the temporary directory anymore. I'll do some experimentation and, if it looks good, will push a branch with the new option so you can try it out.
