
Increase heuristic effort for optimization level 2 #12149

Merged
merged 7 commits into Qiskit:main from level-2-v2 on Apr 23, 2024

Conversation

mtreinish
Member

Summary

This commit tweaks the heuristic effort in optimization level 2 to be more of a middle ground between levels 1 and 3, with a better balance between output quality and runtime. This makes it a better default for the pass manager we use when one isn't specified. The tradeoff is that the vf2layout and vf2postlayout search space is reduced to match level 1; there are diminishing returns on the vf2 layout search, especially when there are a large number of qubit permutations for the mapping found. The number of sabre trials is raised to the same level as optimization level 3, as this can have a significant impact on output quality and the extra runtime cost is minimal. The larger change is that the optimization passes from level 3 are now used, which mainly means 2q peephole optimization. With the performance improvements from #12010 and #11946 and all the follow-on PRs, this is now fast enough to rely on in optimization level 2.

Details and comments

Related to: #7112
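The trial-count part of the tradeoff can be sketched with a toy best-of-N model. This is plain Python, not Qiskit's actual SabreSwap; the "swap count" distribution is invented purely for illustration:

```python
import random

def sabre_like_trial(rng):
    """Stand-in for one randomized routing trial: returns a 'swap count'.

    Toy model only -- each trial draws a result from a noisy distribution,
    as randomized routing does.
    """
    return 100 + rng.randint(0, 40)

def route_best_of(n_trials, seed=1234):
    """Run n_trials independent trials and keep the best (fewest swaps)."""
    rng = random.Random(seed)
    return min(sabre_like_trial(rng) for _ in range(n_trials))

# More trials can only match or improve the best result found, which is why
# raising O2's trial count to the O3 level helps output quality; the extra
# runtime is just the added trials.
few = route_best_of(5)
many = route_best_of(20)
assert many <= few
```

The same seed is reused so the first five trials are shared between both runs, making the "more trials never hurts the minimum" property exact in this sketch.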

@mtreinish mtreinish added the Changelog: API Change Include in the "Changed" section of the changelog label Apr 5, 2024
@mtreinish mtreinish added this to the 1.1.0 milestone Apr 5, 2024
@mtreinish mtreinish requested a review from a team as a code owner April 5, 2024 21:20
@qiskit-bot
Collaborator

One or more of the following people are requested to review this:

  • @Qiskit/terra-core

@coveralls

coveralls commented Apr 5, 2024

Pull Request Test Coverage Report for Build 8693815185

Details

  • 6 of 6 (100.0%) changed or added relevant lines in 2 files are covered.
  • 7 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.005%) to 89.354%

Files with Coverage Reduction    New Missed Lines    %
crates/qasm2/src/expr.rs         1                   94.03%
crates/qasm2/src/lex.rs          6                   92.11%

Totals:
Change from base Build 8689749566: -0.005%
Covered Lines: 60163
Relevant Lines: 67331

💛 - Coveralls

For the initial VF2Layout call, this commit expands the vf2 call limit back to its previous level instead of reducing it to match level 1. The idea behind this change is that spending up to 10s to find a perfect layout is a worthwhile tradeoff, as that will greatly improve the results from execution. But scoring multiple layouts to find the lowest error rate subgraph has diminishing returns in most cases: there typically aren't thousands of unique subgraphs, and often when we hit the scoring limit we're just permuting the qubits inside a subgraph, which doesn't provide much additional value.
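Why the scoring limit gets hit on permutations can be seen with a toy count (plain Python, not Qiskit's VF2 code; the subgraph is a hypothetical 5-qubit example):

```python
from itertools import permutations

# Once one subgraph of physical qubits that fits the circuit is found, the
# remaining candidate mappings are largely permutations of the qubits
# *within* that same subgraph.
subgraph = (0, 1, 2, 3, 4)  # hypothetical 5-qubit subgraph of a device
candidate_mappings = list(permutations(subgraph))

# 5! = 120 mappings, but every one uses the identical set of physical
# qubits, so their error-rate scores tend to be very close -- scoring all
# of them is where the diminishing returns come from.
assert len(candidate_mappings) == 120
assert all(set(m) == set(subgraph) for m in candidate_mappings)
```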

For VF2PostLayout, the lower call limits from level 1 are still used. This is because the search for isomorphic subgraphs is typically much shorter with the vf2++ node ordering heuristic, so we don't need to spend as much time looking for alternative subgraphs.

Due to potential instability in the 2q peephole optimization, we were using the `MinimumPoint` pass to provide backtracking when we reach a local minimum. However, this pass adds a significant amount of overhead because it deep copies the circuit at every iteration of the optimization loop that improves the output quality. This commit tweaks the O2 pass manager construction to only run the 2q peephole optimization once, and then restores the optimization loop to what the previous O2 loop was.
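A minimal sketch of that kind of backtracking loop (toy Python, not the actual `MinimumPoint` pass; the function names, loop bound, and "circuit" stand-in are all invented) shows where the deep-copy overhead comes from:

```python
from copy import deepcopy

def minimum_point_loop(circuit, optimize, cost):
    """Toy version of the backtracking loop the commit removes from O2.

    `circuit` is any mutable object, `optimize` mutates it in place, and
    `cost` scores it (e.g. depth).  Every time the cost improves we deep
    copy the circuit so the best point seen can be restored -- that copy
    on every improving iteration is the overhead described above.
    """
    best = deepcopy(circuit)
    best_cost = cost(circuit)
    for _ in range(10):  # bounded loop for the sketch
        optimize(circuit)
        c = cost(circuit)
        if c < best_cost:
            best, best_cost = deepcopy(circuit), c  # expensive snapshot
        else:
            break  # local minimum reached; fall back to the snapshot
    return best

# With the new O2 structure the peephole passes run once instead, so no
# repeated deep copies are needed.
circ = list(range(8))  # stand-in "circuit"; each pop "optimizes" one gate
result = minimum_point_loop(circ, lambda c: c and c.pop(), len)
```

Running the peephole passes once trades the guarantee of backtracking for the elimination of every per-iteration snapshot.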
@mtreinish
Member Author

mtreinish commented Apr 15, 2024

I ran the "utility scale" asv benchmarks with this PR and got the following results:

Benchmarks that have improved:

       before           after         ratio
     [e0be97c1]       [53dec8bd]
     <main>       <level-2-v2>
-      1.37±0.03s       1.22±0.01s     0.89  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')
-            1896             1618     0.85  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')
-            1908             1607     0.84  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')
-            1936             1622     0.84  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')
-             972              444     0.46  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')
-             972              444     0.46  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')
-             972              444     0.46  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')

Benchmarks that have stayed the same:

       before           after         ratio
     [e0be97c1]       [53dec8bd]
     <main>       <level-2-v2>
       21.1±0.06s       25.3±0.02s    ~1.20  utility_scale.UtilityScaleBenchmarks.time_qft('ecr')
          2.18±0s       2.32±0.01s     1.06  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')
       24.3±0.06s       25.7±0.02s     1.06  utility_scale.UtilityScaleBenchmarks.time_qft('cz')
       29.8±0.2ms       30.7±0.2ms     1.03  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')
       18.6±0.06s       19.1±0.06s     1.03  utility_scale.UtilityScaleBenchmarks.time_qft('cx')
       92.3±0.8ms       94.5±0.4ms     1.02  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')
      29.7±0.08ms       30.2±0.1ms     1.02  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')
         92.9±2ms         94.2±1ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')
       92.7±0.7ms       94.0±0.3ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')
      30.0±0.04ms       30.2±0.4ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')
      8.56±0.04ms      8.60±0.04ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')
      8.64±0.08ms      8.65±0.02ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')
       8.59±0.1ms      8.60±0.04ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')
             2598             2582     0.99  utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')
             2598             2582     0.99  utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')
             2598             2496     0.96  utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')
       3.11±0.01s       2.69±0.03s    ~0.87  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')

Benchmarks that have got worse:

       before           after         ratio
     [e0be97c1]       [53dec8bd]
     <main>       <level-2-v2>
+      2.78±0.04s          5.32±0s     1.92  utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')
+         852±8ms       1.08±0.01s     1.27  utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')
+      5.12±0.01s          6.45±0s     1.26  utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

It's about what I expected: improvements in quality, since we're ramping up heuristic effort in a bunch of places, but at the cost of runtime. The ecr qaoa runtime benchmark regression is a bit more severe than I expected, but ~25% slower is about what I was expecting.

The only other change I really want to make, building off of #12171, is to change the default sabre heuristic we use in level 2 to lookahead instead of decay. Not to save any runtime; I just expect it to produce better output.

I'm also curious to try dropping the sabre trial counts back down to 10 and see where the tradeoffs are there; the only place the runtime change can realistically come from is in sabre or in running 2q peephole, so I'll rerun the benchmarks with that change. Update: it didn't make any difference; if anything it was slower (because of more swaps being inserted).


@levbishop levbishop left a comment


I think there'll be more tweaks to O2 as we speed up and streamline other passes, add sabre heuristics, etc, but this is a solid starting point

@levbishop levbishop added this pull request to the merge queue Apr 23, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 23, 2024
@levbishop levbishop added this pull request to the merge queue Apr 23, 2024
Merged via the queue into Qiskit:main with commit 40ac274 Apr 23, 2024
12 checks passed
@mtreinish mtreinish deleted the level-2-v2 branch April 23, 2024 10:00