Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rust-based OpenQASM 2 converter #9784

Merged
merged 28 commits into from
Apr 12, 2023

Conversation

jakelishman
Copy link
Member

@jakelishman jakelishman commented Mar 13, 2023

Summary

This is a vendored version of qiskit-qasm2, with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package.

This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construct QuantumCircuits that are not correct) and more extensible.

When using its full set of compatibility options, this can be a drop-in replacement for QuantumCircuit.from_qasm_str and QuantumCircuit.from_qasm_file, which will allow us to deprecate and remove all the code in qiskit/qasm (except for the Pygments stuff which we can replace with openqasm-pygments, which I'd even forgotten I wrote ages ago). I haven't done that yet, though - in an approximate attempt to keep the diff down (+8,200, -11 lol), I can begin those deprecations in a follow-up. We can also choose to replace the internals of QuantumCircuit.from_qasm_str and _file immediately, if we prefer, to realise the speed-ups for everyone (again, follow-up).

Details and comments

The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions.

Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example:

  • the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal.

  • the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary.

  • there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor.

I'll probably keep playing with some of those in the qiskit-qasm2 package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one:

In [1]: import qiskit.qasm2
   ...: prog = """
   ...:     OPENQASM 2.0;
   ...:     include "qelib1.inc";
   ...:     qreg q[2];
   ...: """
   ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000
   ...: %timeit qiskit.qasm2.loads(prog)
   ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog)
2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

cx-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time of CXGate is higher than most gates, and this dominates the execution time for the Rust-based parser.

This is a vendored version of qiskit-qasm2
(https://pypi.org/project/qiskit-qasm2), with this initial commit being
equivalent (barring some naming / documentation / testing conversions to
match Qiskit's style) to version 0.5.3 of that package.

This adds a new translation layer from OpenQASM 2 to Qiskit, which is
around an order of magnitude faster than the existing version in Python,
while being more type safe (in terms of disallowing invalid OpenQASM 2
programs rather than attempting to construction `QuantumCircuit`s that
are not correct) and more extensible.

The core logic is a hand-written lexer and parser combination written in
Rust, which emits a bytecode stream across the PyO3 boundary to a small
Python interpreter loop.  The main bulk of the parsing logic is a simple
LL(1) recursive-descent algorithm, which delegates to more specific
recursive Pratt-based algorithm for handling classical expressions.

Many of the design decisions made (including why the lexer is written by
hand) are because the project originally started life as a way for me to
learn about implementations of the different parts of a parser stack;
this is the principal reason there are very few external crates used.
There are a few inefficiencies in this implementation, for example:

- the string interner in the lexer allocates twice for each stored
  string (but zero times for a lookup).  It may be possible to
  completely eliminate allocations when parsing a string (or a file if
  it's read into memory as a whole), but realistically there's only a
  fairly small number of different tokens seen in most OpenQASM 2
  programs, so it shouldn't be too big a deal.

- the hand-off from Rust to Python transfers small objects frequently.
  It might be more efficient to have a secondary buffered iterator in
  Python space, transferring more bytecode instructions at a time and
  letting Python resolve them.  This form could also be made
  asynchronous, since for the most part, the Rust components only need
  to acquire the CPython GIL at the API boundary.

- there are too many points within the lexer that can return a failure
  result that needs unwrapping at every site.  Since there are no tokens
  that can span multiple lines, it should be possible to refactor so
  that almost all of the byte-getter and -peeker routines cannot return
  error statuses, at the cost of the main lexer loop becoming
  responsible for advancing the line buffer, and moving the non-ASCII
  error handling into each token constructor.

I'll probably keep playing with some of those in the `qiskit-qasm2`
package itself when I have free time, but at some point I needed to draw
the line and vendor the package.  It's still ~10x faster than the
existing one:

    In [1]: import qiskit.qasm2
       ...: prog = """
       ...:     OPENQASM 2.0;
       ...:     include "qelib1.inc";
       ...:     qreg q[2];
       ...: """
       ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000
       ...: %timeit qiskit.qasm2.loads(prog)
       ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog)
    2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

`cx`-heavy programs like this one are actually the ones that the new
parser is (comparatively) slowest on, because the construction time of
`CXGate` is higher than most gates, and this dominates the execution
time for the Rust-based parser.
@jakelishman jakelishman added Changelog: New Feature Include in the "Added" section of the changelog mod: qasm2 Relating to OpenQASM 2 import or export Rust This PR or issue is related to Rust code in the repository labels Mar 13, 2023
@jakelishman jakelishman added this to the 0.24.0 milestone Mar 13, 2023
@jakelishman jakelishman requested a review from a team as a code owner March 13, 2023 11:38
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

@coveralls
Copy link

coveralls commented Mar 13, 2023

Pull Request Test Coverage Report for Build 4679866338

  • 2432 of 2555 (95.19%) changed or added relevant lines in 9 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 85.727%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/qasm2/init.py 31 34 91.18%
qiskit/qasm2/parse.py 173 178 97.19%
crates/qasm2/src/bytecode.rs 125 131 95.42%
crates/qasm2/src/expr.rs 391 417 93.76%
crates/qasm2/src/lex.rs 355 395 89.87%
crates/qasm2/src/parse.rs 1239 1282 96.65%
Totals Coverage Status
Change from base Build 4678110396: 0.3%
Covered Lines: 70271
Relevant Lines: 81971

💛 - Coveralls

Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting speedup!

Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used.

What are your thoughts on migrating to using one of the excellent Rust parsing crates? https://lib.rs/parsing. That will result in much less code for us to maintain, along with benefitting from the often-years of improvements those teams have made to their implementation. (Ack it will take some work to migrate to)

--

Also, I think this may benefit from some unit tests written in Rust. Those will be executed in CI thanks to #9593.

You do already have solid tests in Python, so this question isn't blocking. But, for example, unit tests of the parser itself (if you don't switch to a library) will help make sure we're handling all edge cases correctly. And unit tests also give more clear feedback of what specifically broke than having to work backwards from a Python test failure.

What do you think?

@jakelishman
Copy link
Member Author

jakelishman commented Mar 13, 2023

Honestly, I'm not particularly interested in putting in any further work to migrate this to a parsing library, because a) I spent long enough in work time playing with this anyway and b) parsing isn't the bottleneck - that's Python-space construction. OpenQASM 2 is an old language that doesn't need any further features supporting (the current Python-space implementation of the parser has effectively not changed in 5 years).

I'm not sure I see the benefit of adding any unit tests actually written in Rust. The Python tests are unit tests of the public API; there's no part of the Rust code that's public, and they already hit (as best as I could think of them) all the edge cases of parsing because the inputs are just OpenQASM 2 programs. I think adding tests that are specifically of private functions in order to get more accurate diagnostics is probably a false economy: I personally had no trouble fixing bugs in the implementation when I was writing the package using these tests, whereas having tests of non-public-API functionality means that the tests have to change when the implementation changes. For example, it should be possible to migrate to a parsing library without changing a single one of my tests of the public API, but private tests of the Rust code would have to be entirely rewritten.

@jakelishman
Copy link
Member Author

Oh, and more about lexing/parsing libraries: now that I've written one, I've no need to write a lexer by hand again - lexer generators are much better - but I'd still write the parser by hand. You can have much better error messages and error recovery with a handwritten parser - it's why the majority of actual language compilers (as opposed to simple DSL parsers, where a generator library is more appropriate) go that route.

You realistically need to use a parser generator if you're going to do LALR(1) parsing because it'd be mad tedious to build the stack and tables by hand and you'd be very liable to get it wrong, but LL(1) rescursive-descent-based parsing like this is very straightforwards to write by hand, is easier to emit good error messages from, and really, I think lexing is more likely to be a performance problem, especially for such a simple language as OpenQASM 2.

The version of Sphinx that we're constrained to use in the docs build
can't handle the `Unpack` operator, so as a temporary measure we can
just relax the type hint a little.
@jakelishman
Copy link
Member Author

jakelishman commented Mar 14, 2023

The docs failure appears to be something specific to Unpack in Python 3.9 and Sphinx 5 - it's been fixed at some point between then and Sphinx 6.1.3, which we artificially can't use because of an unupdated constraint applied by sphinx-design. I worked around it in 5723864 by just relaxing the type hint a little.

Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent quality code!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers: this file looks good.

crates/qasm2/src/bytecode.rs Outdated Show resolved Hide resolved
crates/qasm2/src/bytecode.rs Outdated Show resolved Hide resolved
crates/qasm2/src/bytecode.rs Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers: this file looks good, outside of some small suggestions. I closely reviewed it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers: this file looks good. I closely reviewed it.

crates/qasm2/src/parse.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers: I did not closely review this, only the PyO3 things which look good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at this file and it looks good. I only had a few clarification questions.

There is of course a very extensive error handling and there is an extensive set of tests in test/python/qasm2 , I have not made completely sure that every allowed behavior or an error is covered by some test.


// We define the exception in Python space so it can inherit from QiskitError; it's easier to do
// that from Python and wrap rather than also needing to import QiskitError to Rust to wrap.
import_exception!(qiskit.qasm2.exceptions, QASM2ParseError);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You define 3 exceptions in Python land but only import 1 here

Copy link
Member Author

@jakelishman jakelishman Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's expected - the Rust code only needs to emit the concrete QASM2ParseError, never QASM2ExportError. QASM2Error is more structural than a concrete class, but it's in place as the base class for a follow-up PR where I'll alias the existing QasmError to QASM2Error as part of the complete removal of the old qiskit.qasm namespace. I actually only meant to define two exceptions in Python space in this PR - the QASM2ExportError one doesn't need to be in this PR, I just missed it when rebasing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it again, the comment is kind of an artifact of how I vendored the package - over on https://github.com/jakelishman/qiskit-qasm2, I just have a single exception defined in Rust space, so when I wrote the comment after fixing things up into Qiskit structure, I had that historical background in mind.

qiskit/qasm2/parse.py Outdated Show resolved Hide resolved
This should have no impact on runtime or on memory usage, since each of
the new types has the same bit width and alignment as the `usize` values
they replace.
Eric-Arellano
Eric-Arellano previously approved these changes Mar 18, 2023
Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PyO3/Rusty parts of this PR. Didn't closely look at Python unit tests or Rust parser/lexer

/// Define a simple newtype that just has a single non-public `usize` field, has a `new`
/// constructor, and implements `Copy` and `IntoPy`. The first argument is the name of the type,
/// the second is whether to also define addition to make offsetting the newtype easier.
macro_rules! newtype_id {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers: this looks good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial review, done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial review, done.

jakelishman and others added 4 commits April 3, 2023 17:52
The previous system was quite confusing, and required all accesses to
the global symbol table to know that the `Gate` symbol could be present
but overridable.  This led to confusing logic, various bugs and
unnecessary constraints, such as it previously being (erroneously)
possible to provide re-definitions for any "built-in" gate.

Instead, we keep a separate store of instructions that may be redefined.
This allows the logic to be centralised to only to the place responsible
for performing those overrides, and remains accessible for error-message
builders to query in order to provide better diagnostics.
Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>
mtreinish
mtreinish previously approved these changes Apr 11, 2023
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick scan through the code to see if anything stood out to me, I'm relying mostly on the detailed review from all the other code-reviewers. I left a couple of inline comments, nothing major the only thing I think would be good to fix is the pyo3 version.

}
}

/// Apply a "scientific calculator" built-in function to an [expression][Expr]. If the operand
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was always amused by this phrasing in the openqasm2 paper, especially since the scientific calculator I got when I went to college was programmable and had numerical integration support.

crates/qasm2/Cargo.toml Outdated Show resolved Hide resolved
crates/qasm2/src/parse.rs Outdated Show resolved Hide resolved
jakelishman and others added 4 commits April 12, 2023 15:50
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
For a hashset of only 6 elements that is only checked once, there's not
really any point to pull in an extra dependency or use a hash set at
all.
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the update

@mtreinish mtreinish added this pull request to the merge queue Apr 12, 2023
Merged via the queue into Qiskit:main with commit 73602b0 Apr 12, 2023
@jakelishman jakelishman deleted the qasm2/rust-parser branch April 12, 2023 16:57
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Apr 12, 2023
Since Qiskitgh-9784 added a new `qiskit.qasm2` package and suitable exceptions
for both import and export of OpenQASM 2 programs, this commit now
unifies the exception handling through the rest of the library to use
these exceptions.

To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now
a re-export of `qiskit.qasm2.QASM2Error`.  This is the base error of
both the exporter and importers, despite the old documentation of
`QasmError` claiming to be specifically for parser errors.  In practice,
the exporter also emitted `QasmError`, so having that be `QASM2Error`
allows people catching that error to get the same behaviour in these new
forms.  The actual exporter and importer are updated to emit the precise
subclasses.
giacomoRanieri pushed a commit to giacomoRanieri/qiskit-terra that referenced this pull request Apr 16, 2023
* Add Rust-based OpenQASM 2 converter

This is a vendored version of qiskit-qasm2
(https://pypi.org/project/qiskit-qasm2), with this initial commit being
equivalent (barring some naming / documentation / testing conversions to
match Qiskit's style) to version 0.5.3 of that package.

This adds a new translation layer from OpenQASM 2 to Qiskit, which is
around an order of magnitude faster than the existing version in Python,
while being more type safe (in terms of disallowing invalid OpenQASM 2
programs rather than attempting to construction `QuantumCircuit`s that
are not correct) and more extensible.

The core logic is a hand-written lexer and parser combination written in
Rust, which emits a bytecode stream across the PyO3 boundary to a small
Python interpreter loop.  The main bulk of the parsing logic is a simple
LL(1) recursive-descent algorithm, which delegates to more specific
recursive Pratt-based algorithm for handling classical expressions.

Many of the design decisions made (including why the lexer is written by
hand) are because the project originally started life as a way for me to
learn about implementations of the different parts of a parser stack;
this is the principal reason there are very few external crates used.
There are a few inefficiencies in this implementation, for example:

- the string interner in the lexer allocates twice for each stored
  string (but zero times for a lookup).  It may be possible to
  completely eliminate allocations when parsing a string (or a file if
  it's read into memory as a whole), but realistically there's only a
  fairly small number of different tokens seen in most OpenQASM 2
  programs, so it shouldn't be too big a deal.

- the hand-off from Rust to Python transfers small objects frequently.
  It might be more efficient to have a secondary buffered iterator in
  Python space, transferring more bytecode instructions at a time and
  letting Python resolve them.  This form could also be made
  asynchronous, since for the most part, the Rust components only need
  to acquire the CPython GIL at the API boundary.

- there are too many points within the lexer that can return a failure
  result that needs unwrapping at every site.  Since there are no tokens
  that can span multiple lines, it should be possible to refactor so
  that almost all of the byte-getter and -peeker routines cannot return
  error statuses, at the cost of the main lexer loop becoming
  responsible for advancing the line buffer, and moving the non-ASCII
  error handling into each token constructor.

I'll probably keep playing with some of those in the `qiskit-qasm2`
package itself when I have free time, but at some point I needed to draw
the line and vendor the package.  It's still ~10x faster than the
existing one:

    In [1]: import qiskit.qasm2
       ...: prog = """
       ...:     OPENQASM 2.0;
       ...:     include "qelib1.inc";
       ...:     qreg q[2];
       ...: """
       ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000
       ...: %timeit qiskit.qasm2.loads(prog)
       ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog)
    2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

`cx`-heavy programs like this one are actually the ones that the new
parser is (comparatively) slowest on, because the construction time of
`CXGate` is higher than most gates, and this dominates the execution
time for the Rust-based parser.

* Work around docs failure on Sphinx 5.3, Python 3.9

The version of Sphinx that we're constrained to use in the docs build
can't handle the `Unpack` operator, so as a temporary measure we can
just relax the type hint a little.

* Remove unused import

* Tweak documentation

* More specific PyO3 usage

* Use PathBuf directly for paths

* Format

* Freeze dataclass

* Use type-safe id types

This should have no impact on runtime or on memory usage, since each of
the new types has the same bit width and alignment as the `usize` values
they replace.

* Documentation tweaks

* Fix comments in lexer

* Fix lexing version number with separating comments

* Add test of pathological formatting

* Fixup release note

* Fix handling of u0 gate

* Credit reviewers

Co-authored-by: Luciano Bello <bel@zurich.ibm.com>
Co-authored-by: Kevin Hartman <kevin@hart.mn>
Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>

* Add test of invalid gate-body statements

* Refactor custom built-in gate definitions

The previous system was quite confusing, and required all accesses to
the global symbol table to know that the `Gate` symbol could be present
but overridable.  This led to confusing logic, various bugs and
unnecessary constraints, such as it previously being (erroneously)
possible to provide re-definitions for any "built-in" gate.

Instead, we keep a separate store of instructions that may be redefined.
This allows the logic to be centralised to only to the place responsible
for performing those overrides, and remains accessible for error-message
builders to query in order to provide better diagnostics.

* Credit Sasha

Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>

* Credit Matthew

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

* Remove dependency on `lazy_static`

For a hashset of only 6 elements that is only checked once, there's not
really any point to pull in an extra dependency or use a hash set at
all.

* Update PyO3 version

---------

Co-authored-by: Luciano Bello <bel@zurich.ibm.com>
Co-authored-by: Kevin Hartman <kevin@hart.mn>
Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
king-p3nguin pushed a commit to king-p3nguin/qiskit-terra that referenced this pull request May 22, 2023
* Add Rust-based OpenQASM 2 converter

This is a vendored version of qiskit-qasm2
(https://pypi.org/project/qiskit-qasm2), with this initial commit being
equivalent (barring some naming / documentation / testing conversions to
match Qiskit's style) to version 0.5.3 of that package.

This adds a new translation layer from OpenQASM 2 to Qiskit, which is
around an order of magnitude faster than the existing version in Python,
while being more type safe (in terms of disallowing invalid OpenQASM 2
programs rather than attempting to construction `QuantumCircuit`s that
are not correct) and more extensible.

The core logic is a hand-written lexer and parser combination written in
Rust, which emits a bytecode stream across the PyO3 boundary to a small
Python interpreter loop.  The main bulk of the parsing logic is a simple
LL(1) recursive-descent algorithm, which delegates to more specific
recursive Pratt-based algorithm for handling classical expressions.

Many of the design decisions made (including why the lexer is written by
hand) are because the project originally started life as a way for me to
learn about implementations of the different parts of a parser stack;
this is the principal reason there are very few external crates used.
There are a few inefficiencies in this implementation, for example:

- the string interner in the lexer allocates twice for each stored
  string (but zero times for a lookup).  It may be possible to
  completely eliminate allocations when parsing a string (or a file if
  it's read into memory as a whole), but realistically there's only a
  fairly small number of different tokens seen in most OpenQASM 2
  programs, so it shouldn't be too big a deal.

- the hand-off from Rust to Python transfers small objects frequently.
  It might be more efficient to have a secondary buffered iterator in
  Python space, transferring more bytecode instructions at a time and
  letting Python resolve them.  This form could also be made
  asynchronous, since for the most part, the Rust components only need
  to acquire the CPython GIL at the API boundary.

- there are too many points within the lexer that can return a failure
  result that needs unwrapping at every site.  Since there are no tokens
  that can span multiple lines, it should be possible to refactor so
  that almost all of the byte-getter and -peeker routines cannot return
  error statuses, at the cost of the main lexer loop becoming
  responsible for advancing the line buffer, and moving the non-ASCII
  error handling into each token constructor.

I'll probably keep playing with some of those in the `qiskit-qasm2`
package itself when I have free time, but at some point I needed to draw
the line and vendor the package.  It's still ~10x faster than the
existing one:

    In [1]: import qiskit.qasm2
       ...: prog = """
       ...:     OPENQASM 2.0;
       ...:     include "qelib1.inc";
       ...:     qreg q[2];
       ...: """
       ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000
       ...: %timeit qiskit.qasm2.loads(prog)
       ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog)
    2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

`cx`-heavy programs like this one are actually the ones that the new
parser is (comparatively) slowest on, because the construction time of
`CXGate` is higher than most gates, and this dominates the execution
time for the Rust-based parser.

* Work around docs failure on Sphinx 5.3, Python 3.9

The version of Sphinx that we're constrained to use in the docs build
can't handle the `Unpack` operator, so as a temporary measure we can
just relax the type hint a little.

* Remove unused import

* Tweak documentation

* More specific PyO3 usage

* Use PathBuf directly for paths

* Format

* Freeze dataclass

* Use type-safe id types

This should have no impact on runtime or on memory usage, since each of
the new types has the same bit width and alignment as the `usize` values
they replace.

* Documentation tweaks

* Fix comments in lexer

* Fix lexing version number with separating comments

* Add test of pathological formatting

* Fixup release note

* Fix handling of u0 gate

* Credit reviewers

Co-authored-by: Luciano Bello <bel@zurich.ibm.com>
Co-authored-by: Kevin Hartman <kevin@hart.mn>
Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>

* Add test of invalid gate-body statements

* Refactor custom built-in gate definitions

The previous system was quite confusing, and required all accesses to
the global symbol table to know that the `Gate` symbol could be present
but overridable.  This led to confusing logic, various bugs and
unnecessary constraints, such as it previously being (erroneously)
possible to provide re-definitions for any "built-in" gate.

Instead, we keep a separate store of instructions that may be redefined.
This allows the logic to be centralised to only to the place responsible
for performing those overrides, and remains accessible for error-message
builders to query in order to provide better diagnostics.

* Credit Sasha

Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>

* Credit Matthew

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

* Remove dependency on `lazy_static`

For a hashset of only 6 elements that is only checked once, there's not
really any point to pull in an extra dependency or use a hash set at
all.

* Update PyO3 version

---------

Co-authored-by: Luciano Bello <bel@zurich.ibm.com>
Co-authored-by: Kevin Hartman <kevin@hart.mn>
Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Jul 13, 2023
Since Qiskitgh-9784 added a new `qiskit.qasm2` package and suitable exceptions
for both import and export of OpenQASM 2 programs, this commit now
unifies the exception handling through the rest of the library to use
these exceptions.

To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now
a re-export of `qiskit.qasm2.QASM2Error`.  This is the base error of
both the exporter and importers, despite the old documentation of
`QasmError` claiming to be specifically for parser errors.  In practice,
the exporter also emitted `QasmError`, so having that be `QASM2Error`
allows people catching that error to get the same behaviour in these new
forms.  The actual exporter and importer are updated to emit the precise
subclasses.
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Jul 13, 2023
Since Qiskitgh-9784 added a new `qiskit.qasm2` package and suitable exceptions
for both import and export of OpenQASM 2 programs, this commit now
unifies the exception handling through the rest of the library to use
these exceptions.

To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now
a re-export of `qiskit.qasm2.QASM2Error`.  This is the base error of
both the exporter and importers, despite the old documentation of
`QasmError` claiming to be specifically for parser errors.  In practice,
the exporter also emitted `QasmError`, so having that be `QASM2Error`
allows people catching that error to get the same behaviour in these new
forms.  The actual exporter and importer are updated to emit the precise
subclasses.
github-merge-queue bot pushed a commit that referenced this pull request Jul 14, 2023
* Unify OpenQASM 2 exceptions

Since gh-9784 added a new `qiskit.qasm2` package and suitable exceptions
for both import and export of OpenQASM 2 programs, this commit now
unifies the exception handling through the rest of the library to use
these exceptions.

To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now
a re-export of `qiskit.qasm2.QASM2Error`.  This is the base error of
both the exporter and importers, despite the old documentation of
`QasmError` claiming to be specifically for parser errors.  In practice,
the exporter also emitted `QasmError`, so having that be `QASM2Error`
allows people catching that error to get the same behaviour in these new
forms.  The actual exporter and importer are updated to emit the precise
subclasses.

* Add newly-imported gates to exclude

The slight change in import order means that two new `Gate` class
bodies got executed during the imports for this particular part of the
test suite.  These gates were internal to the `qasm2.load`
functionality and should be exempt from the automated
definition-equivalence testing.
@1ucian0 1ucian0 mentioned this pull request Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: qasm2 Relating to OpenQASM 2 import or export Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants