Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce flaky tests #1015

Merged
merged 15 commits into from
Dec 5, 2024
Merged

reduce flaky tests #1015

merged 15 commits into from
Dec 5, 2024

Conversation

josephleekl
Copy link
Contributor

@josephleekl josephleekl commented Dec 2, 2024

Before submitting

Please complete the following checklist when submitting a PR:

  • All new features must include a unit test.
    If you've fixed a bug or added code that should be tested, add a test to the
    tests directory!

  • All new functions and code must be clearly commented and documented.
    If you do make documentation changes, make sure that the docs build and
    render correctly by running make docs.

  • Ensure that the test suite passes, by running make test.

  • Add a new entry to the .github/CHANGELOG.md file, summarizing the
    change, and including a link back to the PR.

  • Ensure that code is properly formatted by running make format.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


Context:
The aim is to reduce the number of flaky tests, and deal with some of the stochastic tests failures observed in CI.

Description of the Change:
4 tests are updated:

  1. test_shots_single_measure_obs (TSSMO)
    Previously flaky test, flaky now removed and shot increased 10x. Previous failure due to low shots count.
    Previous observed failure rate (without flaky): LQ 4/1000
    Updated failure rate: LQ 0/1000

  2. test_controlled_qubit_gates (TCQG)
    Previously flaky test, flaky now removed. There is no non-determinism in test.
    No failure.

  3. test_cnot_controlled_qubit_unitary (TCCQU)
    Previously flaky test, flaky now removed. Only non-determinism in initial state preparation. Initial state now fixed with seed (already used in above test).
    No failure.

  4. test_sample_variations (TSV)
    Shots increased 10x and now reference compares with analytical probability calculation in default.qubit rather than with shots.
    Previous observed failure rate: 13/1000
    Updated failure rate: LQ 0/1000

Benefits:

  • Reduced flaky tests from 6 to 3
  • Reduce likelihood of TSV causing CI failure, which is frequently observed (1 2 3 4)

Possible Drawbacks:
Increased test runtime:

  1. TSSMO
    LQ x86: 5s -> 16s
    LG x86: 3s -> 6s
    LK x86 CPU: 1s -> 4s
    LK GPU: 1s -> 4s

  2. TSV
    LQ x86: 3s -> 5s
    LG x86: 6s -> 10s
    LK x86 CPU: 1s -> 4s
    LK GPU: 1s -> 4s

However, originally, for TSSMO the time increases if flaky is required to be triggered, and TSV failure would require re-running the tests.

Related GitHub Issues:

[sc-78212]

Copy link

codecov bot commented Dec 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.90%. Comparing base (705fe89) to head (8e46a9e).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (705fe89) and HEAD (8e46a9e). Click for more details.

HEAD has 24 uploads less than BASE
Flag BASE (705fe89) HEAD (8e46a9e)
32 8
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1015      +/-   ##
==========================================
- Coverage   97.90%   91.90%   -6.01%     
==========================================
  Files         230      109     -121     
  Lines       39531    16755   -22776     
==========================================
- Hits        38704    15399   -23305     
- Misses        827     1356     +529     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@josephleekl josephleekl marked this pull request as ready for review December 4, 2024 22:22
@josephleekl josephleekl added urgent Mark a pull request as high priority ci:use-gpu-runner Enable usage of GPU runner for this Pull Request labels Dec 4, 2024
Copy link
Member

@multiphaseCFD multiphaseCFD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @josephleekl ! Could you please clarify why flaky tests can be reduced with this implementation?

Copy link
Member

@maliasadi maliasadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Happy to approve 🙌

tests/test_measurements.py Show resolved Hide resolved
@josephleekl josephleekl added the ci:build_wheels Activate wheel building. label Dec 5, 2024
@josephleekl
Copy link
Contributor Author

LGTM! Thanks @josephleekl ! Could you please clarify why flaky tests can be reduced with this implementation?

Thanks Shuli, the shots increase significantly reduces the failure rate, which means that we should no longer need flaky tests for some tests. ALso some tests are not non-deterministic so no flaky is required.

@josephleekl josephleekl merged commit ded6ef9 into master Dec 5, 2024
72 of 73 checks passed
@josephleekl josephleekl deleted the fix_reduce_flaky branch December 5, 2024 16:02
josephleekl added a commit that referenced this pull request Dec 5, 2024
Please complete the following checklist when submitting a PR:

- [ ] All new features must include a unit test.
If you've fixed a bug or added code that should be tested, add a test to
the
      [`tests`](../tests) directory!

- [ ] All new functions and code must be clearly commented and
documented.
If you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [ ] Ensure that the test suite passes, by running `make test`.

- [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing
the
      change, and including a link back to the PR.

- [x] Ensure that code is properly formatted by running `make format`.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.

------------------------------------------------------------------------------------------------------------

**Context:**
The aim is to reduce the number of flaky tests, and deal with some of
the stochastic tests failures observed in CI.

**Description of the Change:**
4 tests are updated:
1. test_shots_single_measure_obs
([TSSMO](https://github.com/PennyLaneAI/pennylane-lightning/blob/f9e8f62a073ab72c8d96b3bb01de399d02e288ba/tests/test_measurements.py#L753))
Previously flaky test, flaky now removed and shot increased 10x.
Previous failure due to low shots count.
Previous observed failure rate (without flaky): LQ 4/1000
Updated failure rate: LQ 0/1000

2. test_controlled_qubit_gates
([TCQG](https://github.com/PennyLaneAI/pennylane-lightning/blob/f9e8f62a073ab72c8d96b3bb01de399d02e288ba/tests/lightning_qubit/test_measurements_class.py#L742))
Previously flaky test, flaky now removed. There is no non-determinism in
test.
No failure.

3. test_cnot_controlled_qubit_unitary
([TCCQU](https://github.com/PennyLaneAI/pennylane-lightning/blob/f9e8f62a073ab72c8d96b3bb01de399d02e288ba/tests/lightning_qubit/test_measurements_class.py#L825))
Previously flaky test, flaky now removed. Only non-determinism in
initial state preparation. Initial state now fixed with seed (already
used in above test).
No failure.

4. test_sample_variations
([TSV](https://github.com/PennyLaneAI/pennylane-lightning/blob/f9e8f62a073ab72c8d96b3bb01de399d02e288ba/tests/test_measurements.py#L661))
Shots increased 10x and now reference compares with analytical
probability calculation in default.qubit rather than with shots.
Previous observed failure rate: 13/1000
Updated failure rate: LQ 0/1000

**Benefits:**
- Reduced flaky tests from 6 to 3
- Reduce likelihood of TSV causing CI failure, which is frequently
observed
([1](https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/10887374615/job/30210122673)
[2](https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/11582353424/job/32245407148)
[3](https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/12131635773/job/33824313065)
[4](https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/12150423891/job/33887968196))

**Possible Drawbacks:**
Increased test runtime:
1. TSSMO
LQ x86: 5s -> 16s
LG x86: 3s -> 6s
LK x86 CPU: 1s -> 4s
LK GPU: 1s -> 4s

2. TSV
LQ x86: 3s -> 5s
LG x86: 6s -> 10s
LK x86 CPU: 1s -> 4s
LK GPU: 1s -> 4s

However, originally, for TSSMO the time increases if flaky is required
to be triggered, and TSV failure would require re-running the tests.

**Related GitHub Issues:**

[sc-78212]

---------

Co-authored-by: ringo-but-quantum <github-ringo-but-quantum@xanadu.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:build_wheels Activate wheel building. ci:use-gpu-runner Enable usage of GPU runner for this Pull Request urgent Mark a pull request as high priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants