Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable avoiding Python operation creation in transpiler #12692

Merged
merged 4 commits into from
Jul 2, 2024

Conversation

mtreinish
Copy link
Member

Summary

Since #12459 accessing node.op in the transpiler eagerly creates a Python object on access. This is because we now are no longer storing a Python object internally and we need to rebuild the object to return the python object as expected by the api. This is causing a significant performance regression because of the extra overhead. The longer term goal is to move as much of the performance critical passes to operate in rust which will eliminate this overhead. But in the meantime we can mitigate the performance overhead by changing the Python access patterns to avoid the operation object creation. This commit adds some new getter methods to DAGOpNode to give access to the inner rust data so that we can avoid the extra overhead. As a proof of concept this updates the unitary synthesis pass in isolation. Doing this fixes the regression caused by #12459 for that pass. We can continue this migration for everything else in follow up PRs. This commit is mostly to establish the pattern and add the python space access methods.

Details and comments

Since Qiskit#12459 accessing `node.op` in the transpiler eagerly creates a
Python object on access. This is because we now are no longer storing a
Python object internally and we need to rebuild the object to return the
python object as expected by the api. This is causing a significant
performance regression because of the extra overhead. The longer term
goal is to move as much of the performance critical passes to operate in
rust which will eliminate this overhead. But in the meantime we can
mitigate the performance overhead by changing the Python access patterns
to avoid the operation object creation. This commit adds some new getter
methods to DAGOpNode to give access to the inner rust data so that we
can avoid the extra overhead. As a proof of concept this updates the
unitary synthesis pass in isolation. Doing this fixes the regression
caused by Qiskit#12459 for that pass. We can continue this migration for
everything else in follow up PRs. This commit is mostly to establish the
pattern and add the python space access methods.
@mtreinish mtreinish added performance Changelog: None Do not include in changelog labels Jun 28, 2024
@mtreinish mtreinish added this to the 1.2.0 milestone Jun 28, 2024
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@mtreinish mtreinish changed the title Avoid Python operation creation in transpiler Enable avoiding Python operation creation in transpiler Jun 28, 2024
@coveralls
Copy link

coveralls commented Jun 28, 2024

Pull Request Test Coverage Report for Build 9718970161

Details

  • 52 of 134 (38.81%) changed or added relevant lines in 6 files are covered.
  • 8 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.08%) to 89.694%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/operations.rs 5 7 71.43%
crates/circuit/src/circuit_instruction.rs 0 33 0.0%
crates/circuit/src/dag_node.rs 22 69 31.88%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 2 88.35%
crates/qasm2/src/lex.rs 6 92.37%
Totals Coverage Status
Change from base Build 9718390567: -0.08%
Covered Lines: 64110
Relevant Lines: 71476

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 29, 2024

Pull Request Test Coverage Report for Build 9719830598

Details

  • 53 of 135 (39.26%) changed or added relevant lines in 6 files are covered.
  • 16 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.09%) to 89.687%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/operations.rs 5 7 71.43%
crates/circuit/src/circuit_instruction.rs 0 33 0.0%
crates/circuit/src/dag_node.rs 22 69 31.88%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 2 88.35%
crates/qasm2/src/parse.rs 6 97.61%
crates/qasm2/src/lex.rs 8 92.37%
Totals Coverage Status
Change from base Build 9718390567: -0.09%
Covered Lines: 64105
Relevant Lines: 71476

💛 - Coveralls

@mtreinish mtreinish added Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library labels Jun 29, 2024
@coveralls
Copy link

coveralls commented Jun 29, 2024

Pull Request Test Coverage Report for Build 9725168342

Details

  • 127 of 181 (70.17%) changed or added relevant lines in 9 files are covered.
  • 28 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.07%) to 89.707%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/circuit_instruction.rs 26 33 78.79%
crates/circuit/src/dag_node.rs 46 69 66.67%
crates/circuit/src/operations.rs 7 31 22.58%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 2 88.35%
crates/qasm2/src/lex.rs 8 91.35%
crates/qasm2/src/parse.rs 18 96.69%
Totals Coverage Status
Change from base Build 9718390567: -0.07%
Covered Lines: 64147
Relevant Lines: 71507

💛 - Coveralls

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jun 29, 2024
This commit updates the commutative cancellation and commutation
analysis transpiler pass. It builds off of Qiskit#12692 to adjust access
patterns in the python transpiler path to avoid eagerly creating a
Python space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of Qiskit#12459.

As part of this the commutation checker is rewritten in rust since all
that requires is gates in rust which we've had a representation of
since Qiskit#12459 merged.
Copy link
Contributor

@ElePT ElePT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's nice to have an intermediate solution until the passes are migrated to rust, but I wonder how do we want to approach the follow-ups. Do we want to try migrating as many python passes to not access node.op first, independently of when we plan to have a rust implementation for them? Or does it make sense to wait and see which passes we don't manage to migrate to rust in the near term and only update those?

Comment on lines +124 to +127
py, op, None, None, params, label, duration, unit, condition,
)?;
instruction.qubits = qargs.into();
instruction.clbits = cargs.into();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something here, but why aren't the qubits and clbits directly given to the CircuitInstruction constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because of a type mismatch. The constructor for a circuit instruction takes in an impl IntoIterator while we have a PyTuple here which isn't doesn't implement the IntoItertator trait. I thought about collecting the tuples into a Vec so we could pass them to new(), but at the end of the day the CircuitInstruction has a PyTuple in it, so I just elected to do this to avoid needing any conversions or tweaks to the interfaces.

@mtreinish
Copy link
Member Author

I think that it's nice to have an intermediate solution until the passes are migrated to rust, but I wonder how do we want to approach the follow-ups. Do we want to try migrating as many python passes to not access node.op first, independently of when we plan to have a rust implementation for them? Or does it make sense to wait and see which passes we don't manage to migrate to rust in the near term and only update those?

Yeah I was thinking of trying to do this first. At this point I'm a bit skeptical we'll get any passes ported before the 1.2 PR proposal freeze deadline in ~2 weeks given the current state of the dagcircuit PR. So I pushed this up to give us a way to mitigate the regression caused by #12459 in the short term for 1.2. But there are probably some passes that we can work with just DAGOpNode/CircuitInstruction from rust like what I did in #12650 and we should try to do that instead of updating the Python code in those cases when we encounter them.

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 1, 2024
This commit updates the commutative cancellation and commutation
analysis transpiler pass. It builds off of Qiskit#12692 to adjust access
patterns in the python transpiler path to avoid eagerly creating a
Python space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of Qiskit#12459.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 1, 2024
This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 1, 2024
This commit updates the BasisTranslator transpiler pass. It builds off
of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the python transpiler
path to avoid eagerly creating a Python space operation object. The goal
of this PR is to mitigate the performance regression introduced by the
extra conversion cost of Qiskit#12459 on the BasisTranslator.
Copy link
Contributor

@ElePT ElePT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Approving!

@ElePT ElePT added this pull request to the merge queue Jul 2, 2024
Merged via the queue into Qiskit:main with commit d2ab4df Jul 2, 2024
15 checks passed
@mtreinish mtreinish deleted the python-access-methods-to-rust-data branch July 2, 2024 13:09
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 2, 2024
This commit updates the commutative cancellation and commutation
analysis transpiler pass. It builds off of Qiskit#12692 to adjust access
patterns in the python transpiler path to avoid eagerly creating a
Python space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of Qiskit#12459.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 2, 2024
This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.
github-merge-queue bot pushed a commit that referenced this pull request Jul 3, 2024
* Avoid Python op creation in commutative cancellation

This commit updates the commutative cancellation and commutation
analysis transpiler pass. It builds off of #12692 to adjust access
patterns in the python transpiler path to avoid eagerly creating a
Python space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of #12459.

* Remove stray print

* Don't add __array__ to DAGOpNode or CircuitInstruction
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 3, 2024
This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 3, 2024
This commit updates the BasisTranslator transpiler pass. It builds off
of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the python transpiler
path to avoid eagerly creating a Python space operation object. The goal
of this PR is to mitigate the performance regression introduced by the
extra conversion cost of Qiskit#12459 on the BasisTranslator.
jlapeyre added a commit to jlapeyre/qiskit-core that referenced this pull request Jul 8, 2024
This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.
jlapeyre added a commit to jlapeyre/qiskit-core that referenced this pull request Jul 8, 2024
This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.
github-merge-queue bot pushed a commit that referenced this pull request Jul 8, 2024
* Use rust gates for ConsolidateBlocks

This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in #12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after #12459. To that end this builds on the thread of work in
the two PRs #12692 and #12701 which changed the access patterns for
other passes to minimize eager gate object construction.

* Add rust filter function for DAGCircuit.collect_2q_runs()

* Update crates/accelerate/src/convert_2q_block_matrix.rs

---------

Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Jul 10, 2024
This commit updates the BasisTranslator transpiler pass. It builds off
of #12692 and #12701 to adjust access patterns in the python transpiler
path to avoid eagerly creating a Python space operation object. The goal
of this PR is to mitigate the performance regression introduced by the
extra conversion cost of #12459 on the BasisTranslator.
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Avoid Python operation creation in transpiler

Since Qiskit#12459 accessing `node.op` in the transpiler eagerly creates a
Python object on access. This is because we now are no longer storing a
Python object internally and we need to rebuild the object to return the
python object as expected by the api. This is causing a significant
performance regression because of the extra overhead. The longer term
goal is to move as much of the performance critical passes to operate in
rust which will eliminate this overhead. But in the meantime we can
mitigate the performance overhead by changing the Python access patterns
to avoid the operation object creation. This commit adds some new getter
methods to DAGOpNode to give access to the inner rust data so that we
can avoid the extra overhead. As a proof of concept this updates the
unitary synthesis pass in isolation. Doing this fixes the regression
caused by Qiskit#12459 for that pass. We can continue this migration for
everything else in follow up PRs. This commit is mostly to establish the
pattern and add the python space access methods.

* Remove unused import

* Add path to avoid StandardGate conversion in circuit_to_dag

* Add fast path through dag_to_circuit
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Avoid Python op creation in commutative cancellation

This commit updates the commutative cancellation and commutation
analysis transpiler pass. It builds off of Qiskit#12692 to adjust access
patterns in the python transpiler path to avoid eagerly creating a
Python space operation object. The goal of this PR is to mitigate the
performance regression on these passes introduced by the extra
conversion cost of Qiskit#12459.

* Remove stray print

* Don't add __array__ to DAGOpNode or CircuitInstruction
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Use rust gates for ConsolidateBlocks

This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.

* Add rust filter function for DAGCircuit.collect_2q_runs()

* Update crates/accelerate/src/convert_2q_block_matrix.rs

---------

Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com>
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
This commit updates the BasisTranslator transpiler pass. It builds off
of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the python transpiler
path to avoid eagerly creating a Python space operation object. The goal
of this PR is to mitigate the performance regression introduced by the
extra conversion cost of Qiskit#12459 on the BasisTranslator.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library mod: transpiler Issues and PRs related to Transpiler performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants