
move and enable visual tests #9961

Merged: 50 commits, Jul 11, 2023
Conversation

@AngeloDanducci (Contributor) commented Apr 13, 2023

Summary

Moved and enabled the visual tests

Details and comments

Need to verify that artifact publishing occurs on failure/missing references; may need to write some purposeful failures.

Fixes #9892

@qiskit-bot added the Community PR label (PRs from contributors that are not 'members' of the Qiskit repo) Apr 13, 2023
@qiskit-bot (Collaborator) commented:

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

  • @Qiskit/terra-core

@CLAassistant commented Apr 13, 2023

CLA assistant check
All committers have signed the CLA.

@1ucian0 self-assigned this Apr 13, 2023
@coveralls commented Apr 13, 2023

Pull Request Test Coverage Report for Build 5522051947

  • 0 of 1 (0.0%) changed or added relevant lines in 1 file are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.01%) to 85.994%

Changes missing coverage:
  qiskit/visualization/state_visualization.py: 0 of 1 changed/added lines covered (0.0%)
Files with coverage reduction:
  qiskit/transpiler/passes/synthesis/unitary_synthesis.py: 1 new missed line (90.39%)
  crates/qasm2/src/lex.rs: 4 new missed lines (91.14%)
Totals:
  Change from base build 5520034869: +0.01%
  Covered lines: 71659
  Relevant lines: 83330

💛 - Coveralls

@AngeloDanducci marked this pull request as ready for review April 25, 2023 03:00
@AngeloDanducci requested a review from a team as a code owner April 25, 2023 03:00
@1ucian0 added the type: qa (Issues and PRs that relate to testing and code quality) and Changelog: None (Do not include in changelog) labels May 11, 2023
test/visual/results.py (outdated review thread, resolved)
@AngeloDanducci (Contributor, Author) commented:
Moved all of the failures to one spot so there should be less friction for artifact creation and readability. Looks like it might need an approval to run the tests now?

Once the new artifact creation is tested in the pipeline we can revert and should be good to merge.

@jakelishman (Member) left a comment:

Thanks for looking at this. I think the Azure run actually just got lost in the webhooks aether - it happens sometimes. The "authorisation required" thing is about a couple of the tests that run on GitHub Actions, and won't actually be affected by these changes at all. Pushing a new commit to this branch should retrigger the webhook to get Azure going again.

It'll definitely be good to get these tests running again in CI. I haven't been paying attention through the process of developing this PR, but just to check: have you seen any flakiness at all in any of the tests? I don't know if there are any stochastic elements in any of the visualisations, but hopefully not.

.azure/test-linux.yml (outdated review thread, resolved)
.azure/test-linux.yml (outdated review thread, resolved)
.azure/test-visual.yml (outdated review thread, resolved)
azure-pipelines.yml (outdated review thread, resolved)
qiskit/visualization/state_visualization.py (outdated review thread, resolved)
test/visual/mpl/circuit/test_circuit_matplotlib_drawer.py (outdated review thread, resolved)
Comment on lines 90 to 96
@staticmethod
def _image_path(image_name):
return os.path.join(RESULTDIR, image_name)

@staticmethod
def _reference_path(image_name):
return os.path.join(TEST_REFERENCE_PATH, image_name)
Member:

Minor: I'd roughly prefer these just to be inlined into the call sites with the constants becoming pathlib.Path instances. It'd be the same or less typing, even at the point of call:

self._reference_path("my_image.png")  # current
TEST_REFERENCE_PATH / "my_image.png"  # alternate
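
A minimal sketch of the suggested pathlib form, for illustration (the constant values here are assumptions; the real ones are defined in the test module):

from pathlib import Path

# Assumed directory layout, for illustration only.
RESULTDIR = Path("test") / "visual" / "results"
TEST_REFERENCE_PATH = Path("test") / "visual" / "references"

# Path's / operator then replaces the helper methods at the call site:
image_path = RESULTDIR / "my_image.png"
reference_path = TEST_REFERENCE_PATH / "my_image.png"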

Contributor Author:

Chose not to inline these for the moment - but I can see the reasoning.

test/visual/mpl/circuit/test_circuit_matplotlib_drawer.py (outdated review thread, resolved)
Comment on lines 333 to 334
ratio = self._similarity_ratio(self._image_path(fname), self._reference_path(fname), fname)
assert ratio == 1
Member:

If we're using this exact same form everywhere (and it looks like we are), perhaps we could factor it out into a

def assertImageSimilarity(self, name: str, similarity: int)

method? Looking at what self._similarity_ratio does, I'd expect some of its work to be about handling the assertion. We already effectively require that our test images and the references have the same file name, so it doesn't seem too disruptive to factor that out into the assertion method.

At the least, here and everywhere, please can we use unittest assertions rather than bare assert, so we can see the full error messages in the logs if there's a failure.
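
A rough sketch of what such a helper could look like on the test class (the _save_diff name and the ratio semantics are assumptions about an eventual implementation, not the PR's actual code):

def assertImageSimilarity(self, name, min_similarity=1.0):
    """Assert that a generated image matches its reference.

    Hypothetical helper: assumes a method that saves a diff image as an
    artifact and returns a similarity ratio in [0.0, 1.0].
    """
    ratio = self._save_diff(RESULTDIR / name, TEST_REFERENCE_PATH / name, name)
    self.assertGreaterEqual(
        ratio, min_similarity, f"{name} differs from its reference (ratio {ratio:.4f})"
    )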

Member:

If you were aiming to have the assertions fire only after all the tests ran, I'd suggest using ddt to parametrise the multi-circuit tests so there's only one assertion, or using with self.subTest to make internally separate test contexts.
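
For illustration, a minimal subTest pattern (the image names and the assertion helper here are hypothetical):

def test_multiple_circuits(self):
    # Each image gets its own subtest context, so one mismatch is
    # reported without masking failures in the remaining images.
    for name in ("circuit_a.png", "circuit_b.png", "circuit_c.png"):
        with self.subTest(image=name):
            self.assertImageSimilarity(name)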

Contributor Author:

Changed to use the unittest assertEqual.

Left the comparison as an assertion after the fact: after changing _similarity_ratio to _save_diff, I think it makes sense that the function saves the diff and returns how different the images are.
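
So the call site ends up looking roughly like this (a sketch; the exact _save_diff signature is an assumption):

# _save_diff writes the visual diff artifact as a side effect and
# returns the similarity ratio; the assertion stays at the call site.
ratio = self._save_diff(image_path, reference_path, fname)
self.assertEqual(ratio, 1, f"{fname} does not match its reference image")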

test/visual/mpl/graph/test_graph_matplotlib_drawer.py (outdated review thread, resolved)
@jakelishman (Member) left a comment:

Thanks so much for sticking with this. The change from similarity_ratio to save_diff does help a lot with legibility, thanks. It's still maybe not ideal that the ratio is a magic number shared between two places, but it's not the end of the world.

I pushed up a couple of minor commits to tidy up a file permission and remove old documentation references to Binder (which were a bit useless anyway). Thanks for sticking through this to the end!

@jakelishman added this pull request to the merge queue Jul 11, 2023
Merged via the queue into Qiskit:main with commit fb9d5d8 Jul 11, 2023
13 checks passed
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 12, 2023
In the recently merged Qiskit#9961 the visual comparison tests were re-added
to run a visual diff of the visualization output against a known
reference image. These improved tests provide previously missing coverage
and have already found potential bugs in proposed changes, see:

Qiskit#10208 (comment)

However, the comparisons were looking for a 100% match between the
reference image and the generated output. In an ideal world we'd be able
to rely on this. But in practice an exact comparison of the images is
quite flaky. This is because the actual visualization output is a
function of not only our Python code in Qiskit, but also upstream Python
dependencies (such as matplotlib, seaborn, etc.) and, more importantly,
the many system libraries and packages those libraries leverage. This
includes the C libraries for image formats (e.g. libpng), but also
things like fonts. A version bump in our CI image's package versions can
cause subtle differences in the output of visualizations. The CI images
are controlled by Microsoft/GitHub and aren't something we can depend on
being constant forever (even if we used Docker for a base test image,
the upstream images we would build from aren't 100% fixed either and
could cause similar issues). We've had numerous issues with this in the
past with image comparison tests (especially when you start running them
cross-platform, which isn't the case here), which is why we've oscillated
between having them in the test suite and not throughout the development
history of Qiskit.

This commit adds a 1% tolerance to the comparison ratio, so instead of
needing a 100% match a 99% match is good enough. While this does
technically reduce the coverage, and a potentially invalid image could
slip into that 1% difference, the chance of that happening is fairly
low, especially weighed against the likelihood of a system change
causing a CI blockage (which happened within one day of merging
Qiskit#9961). The only other option is to make these tests non-voting in
the CI job, which would all but remove their utility because they would
likely go stale (as the testing harness before this did). We may still
end up making them non-voting anyway, despite the coverage gains, if we
can't reliably run the tests in CI.
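
As a sketch, the tolerance change amounts to relaxing the call-site assertion, assuming the similarity ratio is normalised to [0, 1]:

# Before: require an exact pixel-for-pixel match.
self.assertEqual(ratio, 1)

# After: allow up to 1% divergence from the reference, absorbing
# font and rendering drift across CI image versions.
self.assertGreaterEqual(ratio, 0.99)
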
github-merge-queue bot pushed a commit that referenced this pull request Jul 13, 2023
(Same commit message as above.)
Labels
  Changelog: None (Do not include in changelog)
  Community PR (PRs from contributors that are not 'members' of the Qiskit repo)
  type: qa (Issues and PRs that relate to testing and code quality)
Projects
  Status: Done
Development

Successfully merging this pull request may close these issues: image comparison to CI
6 participants