reduce flaky tests #1015

josephleekl · 2024-12-02T17:27:26Z


Context:
The aim is to reduce the number of flaky tests and to address some of the stochastic test failures observed in CI.

Description of the Change:
Four tests are updated:

1. test_shots_single_measure_obs (TSSMO)
   Previously a flaky test; flaky is now removed and the shot count is increased 10x. The earlier failures were caused by the low shot count.
   Previously observed failure rate (without flaky): LQ 4/1000
   Updated failure rate: LQ 0/1000
2. test_controlled_qubit_gates (TCQG)
   Previously a flaky test; flaky is now removed. The test contains no non-determinism.
   No failures observed.
3. test_cnot_controlled_qubit_unitary (TCCQU)
   Previously a flaky test; flaky is now removed. The only non-determinism was in the initial state preparation. The initial state is now fixed with a seed (already used in the test above).
   No failures observed.
4. test_sample_variations (TSV)
   Shots increased 10x, and the reference is now the analytical probability calculation in default.qubit rather than a shot-based result.
   Previously observed failure rate: 13/1000
   Updated failure rate: LQ 0/1000
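To illustrate why a 10x shot increase can remove the need for flaky, here is a minimal NumPy sketch, not the actual test code. It assumes a single Pauli-Z style measurement with outcome probability `p`, and the shot counts, tolerance, and trial count are hypothetical values chosen for illustration. The statistical error of a sampled expectation value scales as 1/sqrt(shots), so a fixed tolerance that is occasionally exceeded at low shot counts becomes many standard deviations wide after a 10x increase.

```python
import numpy as np


def failure_rate(shots, tol, trials=2000, p=0.5, seed=42):
    """Fraction of trials whose sampled expectation value misses the
    analytic value by more than `tol` (all parameters are illustrative)."""
    rng = np.random.default_rng(seed)
    # Each shot returns +1 with probability p and -1 otherwise.
    plus_counts = rng.binomial(shots, p, size=trials)
    estimates = (2 * plus_counts - shots) / shots
    exact = 2 * p - 1
    return float(np.mean(np.abs(estimates - exact) > tol))


low = failure_rate(shots=1_000, tol=0.08)    # tolerance is about 2.5 sigma
high = failure_rate(shots=10_000, tol=0.08)  # tolerance is about 8 sigma
print(f"failure rate at 1e3 shots: {low:.4f}, at 1e4 shots: {high:.4f}")
```

The absolute numbers depend on the assumed tolerance, but the trend matches the observed drop from 4/1000 to 0/1000.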

Benefits:

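Reduces the number of flaky tests from 6 to 3.
Reduces the likelihood of TSV causing a CI failure, which has been observed frequently in these runs:
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/10887374615/job/30210122673
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/11582353424/job/32245407148
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/12131635773/job/33824313065
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/12150423891/job/33887968196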
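The TSV change also swaps a shot-based reference for an analytic one. The sketch below is a NumPy illustration under assumed values (`shots`, `trials`, and `p` are hypothetical and not taken from the test). It shows why that helps: when two independent sampled estimates are compared, their noise adds, so the spread of the difference is about sqrt(2) times larger than when a single sampled estimate is compared with the exact probability.

```python
import numpy as np

rng = np.random.default_rng(0)
shots, trials, p = 1_000, 20_000, 0.3  # illustrative values only


def sampled_prob():
    """Estimate the probability p from `shots` samples, repeated `trials` times."""
    return rng.binomial(shots, p, size=trials) / shots


device = sampled_prob()       # stands in for the device under test
sampled_ref = sampled_prob()  # stands in for a shot-based reference

err_vs_sampled = np.std(device - sampled_ref)  # both sides are noisy
err_vs_analytic = np.std(device - p)           # only one side is noisy
ratio = err_vs_sampled / err_vs_analytic
print(f"spread ratio (sampled ref / analytic ref): {ratio:.3f}")
```

With an analytic reference, the same tolerance therefore covers more standard deviations of the remaining noise.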

Possible Drawbacks:
Increased test runtime:

1. TSSMO
   LQ x86: 5s -> 16s
   LG x86: 3s -> 6s
   LK x86 CPU: 1s -> 4s
   LK GPU: 1s -> 4s
2. TSV
   LQ x86: 3s -> 5s
   LG x86: 6s -> 10s
   LK x86 CPU: 1s -> 4s
   LK GPU: 1s -> 4s

However, before this change the TSSMO runtime also increased whenever flaky had to trigger a rerun, and a TSV failure required re-running the tests.

Related GitHub Issues:

[sc-78212]

codecov · 2024-12-02T17:29:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.90%. Comparing base (705fe89) to head (8e46a9e).
Report is 1 commit behind head on master.

❗ There is a different number of reports uploaded between BASE (705fe89) and HEAD (8e46a9e).

HEAD has 24 uploads less than BASE

Flag    BASE (705fe89)    HEAD (8e46a9e)
        32                8

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1015      +/-   ##
==========================================
- Coverage   97.90%   91.90%   -6.01%     
==========================================
  Files         230      109     -121     
  Lines       39531    16755   -22776     
==========================================
- Hits        38704    15399   -23305     
- Misses        827     1356     +529


tests/lightning_qubit/test_measurements_class.py

multiphaseCFD

LGTM! Thanks @josephleekl ! Could you please clarify why flaky tests can be reduced with this implementation?

maliasadi

LGTM! Happy to approve 🙌

tests/test_measurements.py

josephleekl · 2024-12-05T16:01:19Z

LGTM! Thanks @josephleekl ! Could you please clarify why flaky tests can be reduced with this implementation?

Thanks Shuli, the shot increase significantly reduces the failure rate, which means that flaky should no longer be needed for some tests. Also, some tests are deterministic, so no flaky is required for them.
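As a rough illustration of the deterministic case, the sketch below shows how seeding the random generator makes a random initial state reproducible. It uses plain NumPy; the helper name `random_state` and the seed value are hypothetical and are not the helpers used in the test suite.

```python
import numpy as np


def random_state(n_qubits, seed):
    """Return a normalized random complex state vector (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dim = 2**n_qubits
    state = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return state / np.linalg.norm(state)


# The same seed always yields the same state, so the test input is fixed
# and the test result no longer varies between runs.
first = random_state(3, seed=1337)
second = random_state(3, seed=1337)
print(np.array_equal(first, second))
```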


josephleekl and others added 2 commits December 2, 2024 10:23

remove flaky TCCQU/TCQG

b8b1403

Auto update version from '0.40.0-dev23' to '0.40.0-dev24'

fed60e8

josephleekl and others added 11 commits December 4, 2024 11:06

add single return expval test

bc9351c

Merge branch 'master' into fix_reduce_flaky

3d665b2

update test sample variations

c76650b

Auto update version from '0.40.0-dev26' to '0.40.0-dev27'

ceacb94

update test sample variations

f6852b8

increase TSSMO shot count and remove flaky

17f1f85

Auto update version from '0.40.0-dev27' to '0.40.0-dev28'

f81935e

revert TSRV

0a0aefd

remove unused import

007f891

TSSMO remove flaky

5061b43

update changelog

cdd4c4b

josephleekl marked this pull request as ready for review December 4, 2024 22:22

josephleekl added the urgent and ci:use-gpu-runner labels Dec 4, 2024

josephleekl and others added 2 commits December 4, 2024 22:24

Merge branch 'master' into fix_reduce_flaky

0bcb5a0

Auto update version from '0.40.0-dev28' to '0.40.0-dev29'

8e46a9e

josephleekl commented Dec 4, 2024

multiphaseCFD approved these changes Dec 4, 2024

maliasadi approved these changes Dec 5, 2024

josephleekl added the ci:build_wheels label Dec 5, 2024

josephleekl merged commit ded6ef9 into master Dec 5, 2024
72 of 73 checks passed

josephleekl deleted the fix_reduce_flaky branch December 5, 2024 16:02
