Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove recursion from ConstrainedReschedule pass #10051

Merged
merged 4 commits into from
May 2, 2023

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Apr 29, 2023

Summary

The ConstrainedReschedule pass previosuly was using a recursive depth first traversal to push back overlapping gates after aligning operations. This however would cause a failure for a sufficiently large circuit when the recursion depth could potentially exceed the maximum stack depth allowed in python. To address this, this commit rewrites the depth first traversal to be iterative instead of recursive. This removes the stack depth limitation and should let the pass run with any size circuit.

However, the performance of this pass is poor for large circuits. One thing we can look at using to try and speed it up is rustworkx's dfs_search() function which will let us shift the traversal to rust and call back to python to do the timing offsets. I tried this in bd3cbb2 and it was significantly slower. We'll have to investigate a different algorithmic approach for adjusting the time that doesn't require multiple iterations like the current approach.

Details and comments

Fixes #10049

The ConstrainedReschedule pass previosuly was using a recursive depth
first traversal to push back overlapping gates after aligning
operations. This however would cause a failure for a sufficiently large
circuit when the recursion depth could potentially exceed the maximum
stack depth allowed in python. To address this, this commit rewrites the
depth first traversal to be iterative instead of recursive. This removes
the stack depth limitation and should let the pass run with any size
circuit.

However, the performance of this pass is poor for large circuits. One
thing we can look at using to try and speed it up is rustworkx's
dfs_search() function which will let us shift the traversal to rust and
call back to python to do the timing offsets. If this is insufficient
we'll have to investigate a different algorithm for adjusting the time
that doesn't require multiple iterations like the current approach.

Fixes Qiskit#10049
@mtreinish mtreinish added the Changelog: Bugfix Include in the "Fixed" section of the changelog label Apr 29, 2023
@mtreinish mtreinish added this to the 0.24.0 milestone Apr 29, 2023
@mtreinish mtreinish requested a review from a team as a code owner April 29, 2023 18:11
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

@mtreinish mtreinish changed the title Remove recurssion from ConstrainedReschedule pass Remove recursion from ConstrainedReschedule pass Apr 29, 2023
@coveralls
Copy link

coveralls commented Apr 29, 2023

Pull Request Test Coverage Report for Build 4860848848

  • 41 of 41 (100.0%) changed or added relevant lines in 1 file are covered.
  • 29 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.02%) to 85.913%

Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 1 95.14%
crates/qasm2/src/lex.rs 2 91.65%
qiskit/extensions/quantum_initializer/squ.py 2 80.0%
crates/qasm2/src/parse.rs 24 96.18%
Totals Coverage Status
Change from base Build 4829590605: -0.02%
Covered Lines: 71176
Relevant Lines: 82847

💛 - Coveralls

@alexanderivrii
Copy link
Contributor

So far I have only taken a quick look at the old and the new code and was about to comment that I did not see "visited node tracking" in the old code, and it's already removed from the new code. Modulo that, it seems quite a straightforward way to convert a recursive approach into a stack-based approach (but I would like to look at the code more carefully). Hmm, because there is no visited node tracking, it seems that some nodes might be examined multiple times, leading to a potentially quadratic complexity, does this make sense? That is, we don't have cycles but we still have reconvergent paths. Would a more bfs-like approach make the complexity linear?

@mtreinish
Copy link
Member Author

Yeah, I'm not sure about the visited node handling. I've been oscillating on whether to include it or not. I removed it earlier today because the unit tests didn't cover the lines so I briefly thought it wasn't necessary. But then I wasn't sure and added it back. As for using a BFS approach instead I wasn't sure, I assumed the original pass used a DFS for a reason and tried to replicate it as closely as I could and just remove the recursion. But, maybe @nkanazawa1989 has more insights here as to the rationale behind the implementation of the pass.

This commit rewrites the pass to leverage rustworkx's dfs_search
function which provides a way to have rustworkx traverse the graph in a
depth first manner and then provides hook points to execute code at
different named portions of the DFS. By leveraging this function we're
able to speed up the search by leveraging rust to perform the actual
graph traversal.
This made performance of the pass worse so reverting this for now. We
can investigate this at a later date.

This reverts commit bd3cbb2.
@nkanazawa1989
Copy link
Contributor

Seems like this is okey for practical cases, e.g. T1 experiment with variable delay, but maybe we can consider some edge cases. Some node may be shifted once due to qreg overlap with certain node, then it might be shifted again by creg overlap with different node. I cannot write good test case immediately, but I'm curious if visited node logic works as expected in this situation.

@alexanderivrii
Copy link
Contributor

I have experimented a bit with both implementations, and so far could not find a single example where if node in visited evaluates to True (but I also don't see a reason why this should always be the case). But I have not tried @nkanazawa1989's suggestion above.

Note that the main loop in run iterates over the nodes in topological order, so it's easy to find examples which demonstrate quadratic behavior (with some nodes updating multiple times), e.g.:


def experiment1():
    durations = InstructionDurations(
        [
            ("cx", (0, 1), 10),
        ])

    pm = PassManager(
        [
            ASAPScheduleAnalysis(durations),
            ConstrainedReschedule(pulse_alignment=100),
        ]
    )

    qc = QuantumCircuit(2)
    qc.cx(0, 1)
    qc.cx(0, 1)
    qc.cx(0, 1)
    qc.cx(0, 1)
    qc.cx(0, 1)
    qc.cx(0, 1)
    qc.cx(0, 1)
    qc.cx(0, 1)

    pm.run(qc)

I think that simply removing the "visited" logic should be equivalent to the old code (including the possibility that the same node might appear in shift_stack multiple times, and hence be updated multiple times), so for an immediate patch that version should be correct. For a longer-term solution someone should probably take another look at the code, trying to see if it can be made linear rather than quadratic.

@mtreinish
Copy link
Member Author

Ok, since my goal for this PR was to keep the behavior the same and just remove the recursion I'll remove the visited logic again to keep it exactly the same. I fully agree we need to revisit the logic in this pass for performance/scaling because it's quite slow, the reproduce example I wrote in #10049 takes ~20min to run this pass with this PR (which is better than a recursion error but still not great) but given the release crunch we can do that for 0.25.0.

This commit removes the visited node check and skip logic from the DFS
traversal. To ensure this code behaves identically to the recursive
version before this PR this logic is removed because there wasn't a
similar check in that version.
@mtreinish
Copy link
Member Author

Ok, I removed the visited node check in: d45555d

Copy link
Contributor

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM!

@mtreinish mtreinish added stable backport potential The bug might be minimal and/or import enough to be port to stable priority: high labels May 2, 2023
@alexanderivrii alexanderivrii added this pull request to the merge queue May 2, 2023
Merged via the queue into Qiskit:main with commit 112bd6e May 2, 2023
mergify bot pushed a commit that referenced this pull request May 2, 2023
* Remove recurssion from ConstrainedReschedule pass

The ConstrainedReschedule pass previosuly was using a recursive depth
first traversal to push back overlapping gates after aligning
operations. This however would cause a failure for a sufficiently large
circuit when the recursion depth could potentially exceed the maximum
stack depth allowed in python. To address this, this commit rewrites the
depth first traversal to be iterative instead of recursive. This removes
the stack depth limitation and should let the pass run with any size
circuit.

However, the performance of this pass is poor for large circuits. One
thing we can look at using to try and speed it up is rustworkx's
dfs_search() function which will let us shift the traversal to rust and
call back to python to do the timing offsets. If this is insufficient
we'll have to investigate a different algorithm for adjusting the time
that doesn't require multiple iterations like the current approach.

Fixes #10049

* Use rustworkx's dfs_search instead of manual dfs implementation

This commit rewrites the pass to leverage rustworkx's dfs_search
function which provides a way to have rustworkx traverse the graph in a
depth first manner and then provides hook points to execute code at
different named portions of the DFS. By leveraging this function we're
able to speed up the search by leveraging rust to perform the actual
graph traversal.

* Revert "Use rustworkx's dfs_search instead of manual dfs implementation"

This made performance of the pass worse so reverting this for now. We
can investigate this at a later date.

This reverts commit bd3cbb2.

* Remove visited node check from DFS

This commit removes the visited node check and skip logic from the DFS
traversal. To ensure this code behaves identically to the recursive
version before this PR this logic is removed because there wasn't a
similar check in that version.

(cherry picked from commit 112bd6e)
mtreinish added a commit that referenced this pull request May 2, 2023
* Remove recurssion from ConstrainedReschedule pass

The ConstrainedReschedule pass previosuly was using a recursive depth
first traversal to push back overlapping gates after aligning
operations. This however would cause a failure for a sufficiently large
circuit when the recursion depth could potentially exceed the maximum
stack depth allowed in python. To address this, this commit rewrites the
depth first traversal to be iterative instead of recursive. This removes
the stack depth limitation and should let the pass run with any size
circuit.

However, the performance of this pass is poor for large circuits. One
thing we can look at using to try and speed it up is rustworkx's
dfs_search() function which will let us shift the traversal to rust and
call back to python to do the timing offsets. If this is insufficient
we'll have to investigate a different algorithm for adjusting the time
that doesn't require multiple iterations like the current approach.

Fixes #10049

* Use rustworkx's dfs_search instead of manual dfs implementation

This commit rewrites the pass to leverage rustworkx's dfs_search
function which provides a way to have rustworkx traverse the graph in a
depth first manner and then provides hook points to execute code at
different named portions of the DFS. By leveraging this function we're
able to speed up the search by leveraging rust to perform the actual
graph traversal.

* Revert "Use rustworkx's dfs_search instead of manual dfs implementation"

This made performance of the pass worse so reverting this for now. We
can investigate this at a later date.

This reverts commit bd3cbb2.

* Remove visited node check from DFS

This commit removes the visited node check and skip logic from the DFS
traversal. To ensure this code behaves identically to the recursive
version before this PR this logic is removed because there wasn't a
similar check in that version.

(cherry picked from commit 112bd6e)

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
@mtreinish mtreinish deleted the no-recursive-reschedul branch May 3, 2023 14:08
@alexanderivrii
Copy link
Contributor

@mtreinish, Ouch! I was completely convinced that this PR keeps the behavior of ConstraintReschedule exactly as is, but I have found examples where it's not so.

Here is one such example:

def experiment1():
    durations = InstructionDurations(
        [
            ("cx", (0, 1), 100),
        ])

    pm = PassManager(
        [
            ASAPScheduleAnalysis(durations),
            ConstrainedReschedule(pulse_alignment=100),
            PadDelay(),
        ]
    )

    qc = QuantumCircuit(2)
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)
    qc.delay(10, 0, "dt")
    qc.cx(0, 1)

    qct = pm.run(qc)
    print(qct)

The difference comes from when the node_start_time gets updated. Suppose that we have a sequence of nodes A -> ... -> B -> C, with multiple paths from A to B. In the original recursive algorithm, we reach B from A along a path, recursively process the stuff downstream of B, update the node_start_time of B, return to A, reach B from a different path, do calculations with B's already updated start times, possibly update the node_start_time of B again. In the new non-recursive version of the algorithm, we still examine the same paths, but both computations at B are with respect to the original start times, so the finally computed cumulative shift amounts at B and the new start time differ.

As far as I can judge, there is a simple fix, since both for the original code and the new code we can update the node_start_time of a node before (and not after) the recursion from this node (e.g. nothing in the recursion downstream of B can modify B's starting time).

That is, we can move the lines

            # Update start time of this node after all overlaps are resolved
            node_start_time[node] += shift

to just before of

            # Check successors for overlap
            for next_node in self._get_next_gate(dag, node):
                # Compute next node start time separately for qreg and creg

and completely remove the shift_stack. This seems to fix the examples I am experimenting with. @mtreinish, could you please take a look if this makes sense?

@mtreinish
Copy link
Member Author

I think this makes sense,when I wrote this I think I was too fixated on maintaining the exact run time behavior and since the update happened after recursion in the recursive version I added the shift stack loop. But, I think you're correct there isn't anything in the children nodes that can impact the start time of a parent so we should shift before looping over the successors. Can you push a quick PR to fix this and we can try to get it in before 0.24 goes out today.

king-p3nguin pushed a commit to king-p3nguin/qiskit-terra that referenced this pull request May 22, 2023
* Remove recurssion from ConstrainedReschedule pass

The ConstrainedReschedule pass previosuly was using a recursive depth
first traversal to push back overlapping gates after aligning
operations. This however would cause a failure for a sufficiently large
circuit when the recursion depth could potentially exceed the maximum
stack depth allowed in python. To address this, this commit rewrites the
depth first traversal to be iterative instead of recursive. This removes
the stack depth limitation and should let the pass run with any size
circuit.

However, the performance of this pass is poor for large circuits. One
thing we can look at using to try and speed it up is rustworkx's
dfs_search() function which will let us shift the traversal to rust and
call back to python to do the timing offsets. If this is insufficient
we'll have to investigate a different algorithm for adjusting the time
that doesn't require multiple iterations like the current approach.

Fixes Qiskit#10049

* Use rustworkx's dfs_search instead of manual dfs implementation

This commit rewrites the pass to leverage rustworkx's dfs_search
function which provides a way to have rustworkx traverse the graph in a
depth first manner and then provides hook points to execute code at
different named portions of the DFS. By leveraging this function we're
able to speed up the search by leveraging rust to perform the actual
graph traversal.

* Revert "Use rustworkx's dfs_search instead of manual dfs implementation"

This made performance of the pass worse so reverting this for now. We
can investigate this at a later date.

This reverts commit bd3cbb2.

* Remove visited node check from DFS

This commit removes the visited node check and skip logic from the DFS
traversal. To ensure this code behaves identically to the recursive
version before this PR this logic is removed because there wasn't a
similar check in that version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: Bugfix Include in the "Fixed" section of the changelog priority: high stable backport potential The bug might be minimal and/or import enough to be port to stable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConstrainedReschedule Pass causes RecurssionError for large enough input circuit
6 participants