From 5a72c02e6c9a6aacc95a2cb6bfa641874e34a18c Mon Sep 17 00:00:00 2001
From: Rafey Iqbal Rahman <59226057+RafeyIqbalRahman@users.noreply.github.com>
Date: Fri, 5 Mar 2021 22:48:16 +0500
Subject: [PATCH 1/7] Fix grammar, capitalization, text inconsistencies (#900)
Co-authored-by: Christopher J. Wood
Co-authored-by: Matthew Treinish
---
CONTRIBUTING.md | 169 ++++++++++++++++++++++++------------------------
1 file changed, 84 insertions(+), 85 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 910cf58085..a3db828f52 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,14 +1,14 @@
# Contributing
First read the overall project contributing guidelines. These are all
-included in the qiskit documentation:
+included in the Qiskit documentation:
https://qiskit.org/documentation/contributing_to_qiskit.html
## Contributing to Qiskit Aer
-In addition to the general guidelines there are specific details for
-contributing to aer, these are documented below.
+In addition to the general guidelines, there are specific details for
+contributing to Aer. These are documented below.
### Pull request checklist
@@ -23,21 +23,21 @@ please ensure that:
*docstring* accordingly.
3. If it makes sense for your change that you have added new tests that
cover the changes.
-4. Ensure that if your change has an end user facing impact (new feature,
- deprecation, removal etc) that you have added a reno release note for that
+4. Ensure that if your change has an end user-facing impact (new feature,
+ deprecation, removal, etc.), you have added a reno release note for that
change and that the PR is tagged for the changelog.
### Changelog generation
The changelog is automatically generated as part of the release process
automation. This works through a combination of the git log and the pull
-request. When a release is tagged and pushed to github the release automation 
When a release is tagged and pushed to GitHub, the release automation
bot looks at all commit messages from the git log for the release. It takes
the PR numbers from the git log (assuming a squash merge) and checks if that
PR had a `Changelog:` label on it. If there is a label it will add the git
commit message summary line from the git log for the release to the changelog.
-If there are multiple `Changelog:` tags on a PR the git commit message summary
+If there are multiple `Changelog:` tags on a PR, the git commit message summary
line from the git log will be used for each changelog category tagged.
The current categories for each label are as follows:
@@ -52,22 +52,22 @@ The current categories for each label are as follows:
### Release Notes
-When making any end user facing changes in a contribution we have to make sure
+When making any end user-facing changes in a contribution, we have to make sure
we document that when we release a new version of qiskit-aer. The expectation
-is that if your code contribution has user facing changes that you will write
+is that if your code contribution has user-facing changes, you will write
the release documentation for these changes. This documentation must explain
what was changed, why it was changed, and how users can either use or adapt
-to the change. The idea behind release documentation is that when a naive
+to the change. The idea behind the release documentation is that when a naive
user with limited internal knowledge of the project is upgrading from the
previous release to the new one, they should be able to read the release notes,
-understand if they need to update their program which uses qiskit, and how they
+understand if they need to update their program which uses Qiskit, and how they
would go about doing that. It ideally should explain why they need to make this
change too, to provide the necessary context. 
-To make sure we don't forget a release note or if the details of user facing
-changes over a release cycle we require that all user facing changes include
-documentation at the same time as the code. To accomplish this we use the
-[reno](https://docs.openstack.org/reno/latest/) tool which enables a git based
+To make sure we don't forget a release note, and that the details of user-facing
+changes are not lost over a release cycle, we require that all user-facing changes include
+documentation at the same time as the code. To accomplish this, we use the
+[reno](https://docs.openstack.org/reno/latest/) tool which enables a git-based
workflow for writing and compiling release notes.
#### Adding a new release note
@@ -77,21 +77,21 @@ installed with::
pip install -U reno
-Once you have reno installed you can make a new release note by running in
+Once you have reno installed, you can make a new release note by running in
your local repository checkout's root::
reno new short-description-string
where short-description-string is a brief string (with no spaces) that describes
what's in the release note. This will become the prefix for the release note
-file. Once that is run it will create a new yaml file in releasenotes/notes.
+file. Once that is run, it will create a new yaml file in releasenotes/notes.
Then open that yaml file in a text editor and write the release note. The basic
structure of a release note is restructured text in yaml lists under category
keys. You add individual items under each category and they will be grouped
automatically by release when the release notes are compiled. A single file
can have as many entries in it as needed, but to avoid potential conflicts
-you'll want to create a new file for each pull request that has user facing
-changes. When you open the newly created file it will be a full template of
+you'll want to create a new file for each pull request that has user-facing
+changes. 
When you open the newly created file, it will be a full template of the different categories with a description of a category as a single entry in each category. You'll want to delete all the sections you aren't using and update the contents for those you are. For example, the end result should @@ -132,19 +132,19 @@ deprecations: You can also look at other release notes for other examples. You can use any restructured text feature in them (code sections, tables, -enumerated lists, bulleted list, etc) to express what is being changed as -needed. In general you want the release notes to include as much detail as +enumerated lists, bulleted list, etc.) to express what is being changed as +needed. In general, you want the release notes to include as much detail as needed so that users will understand what has changed, why it changed, and how they'll have to update their code. -After you've finished writing your release notes you'll want to add the note +After you've finished writing your release notes, you'll want to add the note file to your commit with `git add` and commit them to your PR branch to make sure they're included with the code in your PR. ##### Linking to issues -If you need to link to an issue or other github artifact as part of the release -note this should be done using an inline link with the text being the issue +If you need to link to an issue or other GitHub artifact as part of the release +note, this should be done using an inline link with the text being the issue number. For example you would write a release note with a link to issue 12345 as: @@ -158,12 +158,12 @@ fixes: #### Generating the release notes -After release notes have been added if you want to see what the full output of -the release notes. 
In general the output from reno that we'll get is a rst +After release notes have been added, if you want to see the full output of +the release notes, you'll get the output as an rst (ReStructuredText) file that can be compiled by -[sphinx](https://www.sphinx-doc.org/en/master/). To generate the rst file you -use the ``reno report`` command. If you want to generate the full aer release -notes for all releases (since we started using reno during 0.9) you just run:: +[sphinx](https://www.sphinx-doc.org/en/master/). To generate the rst file, you +use the ``reno report`` command. If you want to generate the full Aer release +notes for all releases (since we started using reno during 0.9), you just run:: reno report @@ -172,7 +172,7 @@ it has been tagged:: reno report --version 0.5.0 -At release time ``reno report`` is used to generate the release notes for the +At release time, ``reno report`` is used to generate the release notes for the release and the output will be submitted as a pull request to the documentation repository's [release notes file]( https://github.com/Qiskit/qiskit/blob/master/docs/release_notes.rst) @@ -180,18 +180,18 @@ https://github.com/Qiskit/qiskit/blob/master/docs/release_notes.rst) #### Building release notes locally Building The release notes are part of the standard qiskit-aer documentation -builds. To check what the rendered html output of the release notes will look -like for the current state of the repo you can run: `tox -edocs` which will +builds. To check what the rendered HTML output of the release notes will look +like for the current state of the repo, you can run: `tox -edocs` which will build all the documentation into `docs/_build/html` and the release notes in particular will be located at `docs/_build/html/release_notes.html` ### Development Cycle The development cycle for qiskit-aer is all handled in the open using -the project boards in Github for project management. 
We use milestones -in Github to track work for specific releases. The features or other changes -that we want to include in a release will be tagged and discussed in Github. -As we're preparing a new release we'll document what has changed since the +the project boards in GitHub for project management. We use milestones +in GitHub to track work for specific releases. The features or other changes +that we want to include in a release will be tagged and discussed in GitHub. +As we're preparing a new release, we'll document what has changed since the previous version in the release notes. ### Branches @@ -211,7 +211,7 @@ merged to it are bugfixes. ### Release cycle -When it is time to release a new minor version of qiskit-aer we will: +When it is time to release a new minor version of qiskit-aer, we will: 1. Create a new tag with the version number and push it to github 2. Change the `master` version to the next release version. @@ -222,7 +222,7 @@ the following steps: 1. Create a stable branch for the new minor version from the release tag on the `master` branch 2. Build and upload binary wheels to pypi -3. Create a github release page with a generated changelog +3. Create a GitHub release page with a generated changelog 4. Generate a PR on the meta-repository to bump the Aer version and meta-package version. @@ -275,7 +275,7 @@ You're now ready to build from source! Follow the instructions for your platform ### Linux -Qiskit is officially supported on Red Hat, CentOS, Fedora and Ubuntu distributions, as long as you can install a GCC version that is C++14 compatible and the few dependencies we need. +Qiskit is officially supported on Red Hat, CentOS, Fedora, and Ubuntu distributions, as long as you can install a GCC version that is C++14 compatible and a few dependencies we need. 
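Since the only hard toolchain requirement on Linux is a C++14-capable GCC, it can be worth confirming your default compiler qualifies before installing anything else. The snippet below is an illustrative pre-flight check, not part of the official instructions: it compiles a one-liner that uses a C++14-only feature (a generic lambda) with whatever `g++` is on your `PATH`:

```python
# Illustrative check (an assumption of this guide's editor, not the Aer docs):
# does the default g++ accept -std=c++14?
import os
import shutil
import subprocess
import tempfile

# A generic lambda ([](auto x){...}) is valid C++14 but not C++11.
SNIPPET = "int main() { auto ident = [](auto x) { return x; }; return ident(0); }\n"

def gcc_supports_cxx14(compiler: str = "g++") -> bool:
    """Return True if `compiler` exists and compiles a C++14-only snippet."""
    if shutil.which(compiler) is None:
        return False  # compiler not installed at all
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "check.cpp")
        with open(src, "w") as f:
            f.write(SNIPPET)
        proc = subprocess.run(
            [compiler, "-std=c++14", src, "-o", os.path.join(tmp, "check")],
            capture_output=True,
        )
        return proc.returncode == 0

if __name__ == "__main__":
    print("C++14 OK" if gcc_supports_cxx14() else "install a newer GCC first")
```

If the check fails, install or select a newer GCC before continuing with the distribution-specific dependencies below.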
#### Dependencies
@@ -310,7 +310,7 @@ Ubuntu
$ sudo apt install libopenblas-dev
-And of course, `git` is required in order to build from repositories
+And of course, `git` is required to build from repositories.
CentOS/Red Hat
@@ -328,17 +328,17 @@ Ubuntu
There are two ways of building `Aer` simulators, depending on your goal:
-1. Build a python extension that works with Terra.
+1. Build a Python extension that works with Terra.
2. Build a standalone executable.
**Python extension**
-As any other python package, we can install from source code by just running:
+As with any other Python package, we can install from source code by just running:
qiskit-aer$ pip install .
This will build and install `Aer` with the default options which is probably suitable for most of the users.
-There's another pythonic approach to build and install software: build the wheels distributable file.
+There's another Pythonic approach to building and installing software: building a distributable wheel file.
qiskit-aer$ python ./setup.py bdist_wheel
@@ -374,9 +374,9 @@ the `dist/` directory, so next step is installing it:
**Standalone Executable**
-If we want to build a standalone executable, we have to use *CMake* directly.
+If you want to build a standalone executable, you have to use *CMake* directly.
The preferred way *CMake* is meant to be used, is by setting up an "out of
-source" build. So in order to build our standalone executable, we have to follow
+source" build. So in order to build your standalone executable, you have to follow
these steps:
qiskit-aer$ mkdir out
@@ -396,8 +396,8 @@ option):
**Advanced options**
Because the standalone version of `Aer` doesn't need Python at all, the build system is
-based on CMake, just like most of other C++ projects. So in order to pass all the different
-options we have on `Aer` to CMake we use it's native mechanism:
+based on CMake, just like most other C++ projects. 
So to pass all the different +options we have on `Aer` to CMake, we use its native mechanism: qiskit-aer/out$ cmake -DCMAKE_CXX_COMPILER=g++-9 -DAER_BLAS_LIB_PATH=/path/to/my/blas .. @@ -421,17 +421,17 @@ You further need to have *Xcode Command Line Tools* installed on macOS: There are two ways of building `Aer` simulators, depending on your goal: -1. Build a python extension that works with Terra; +1. Build a Python extension that works with Terra; 2. Build a standalone executable. **Python extension** -As any other python package, we can install from source code by just running: +As any other Python package, we can install from source code by just running: qiskit-aer$ pip install . This will build and install `Aer` with the default options which is probably suitable for most of the users. -There's another pythonic approach to build and install software: build the wheels distributable file. +There's another Pythonic approach to build and install software: build the wheels distributable file. qiskit-aer$ python ./setup.py bdist_wheel @@ -467,9 +467,9 @@ the `dist/` directory, so next step is installing it: **Standalone Executable** -If we want to build a standalone executable, we have to use **CMake** directly. +If you want to build a standalone executable, you have to use **CMake** directly. The preferred way **CMake** is meant to be used, is by setting up an "out of -source" build. So in order to build our standalone executable, we have to follow +source" build. So in order to build your standalone executable, you have to follow these steps: qiskit-aer$ mkdir out @@ -488,8 +488,8 @@ option): ***Advanced options*** Because the standalone version of `Aer` doesn't need Python at all, the build system is -based on CMake, just like most of other C++ projects. So in order to pass all the different -options we have on `Aer` to CMake we use it's native mechanism: +based on CMake, just like most of other C++ projects. 
So to pass all the different +options we have on `Aer` to CMake, we use its native mechanism: qiskit-aer/out$ cmake -DCMAKE_CXX_COMPILER=g++-9 -DAER_BLAS_LIB_PATH=/path/to/my/blas .. @@ -499,7 +499,7 @@ options we have on `Aer` to CMake we use it's native mechanism: #### Dependencies -On Windows, you must have *Anaconda3* installed. We recommend also installing +On Windows, you must have *Anaconda3* installed. We also recommend installing *Visual Studio 2017 Community Edition* or *Visual Studio 2019 Community Edition*. >*Anaconda 3* can be installed from their web: @@ -518,19 +518,19 @@ create an Anaconda virtual environment or activate it if you already have create We only support *Visual Studio* compilers on Windows, so if you have others installed in your machine (MinGW, TurboC) you have to make sure that the path to the *Visual Studio* tools has precedence over others so that the build system can get the correct one. -There's a (recommended) way to force the build system to use the one you want by using CMake `-G` parameter. Will talk +There's a (recommended) way to force the build system to use the one you want by using CMake `-G` parameter. We will talk about this and other parameters later. #### Build **Python extension** -As any other python package, we can install from source code by just running: +As any other Python package, we can install from source code by just running: (QiskitDevEnv) qiskit-aer > pip install . This will build and install `Aer` with the default options which is probably suitable for most of the users. -There's another pythonic approach to build and install software: build the wheels distributable file. +There's another Pythonic approach to build and install software: build the wheels distributable file. 
(QiskitDevEnv) qiskit-aer > python ./setup.py bdist_wheel @@ -566,9 +566,9 @@ the `dist/` directory, so next step is installing it: **Standalone Executable** -If we want to build a standalone executable, we have to use **CMake** directly. +If you want to build a standalone executable, you have to use **CMake** directly. The preferred way **CMake** is meant to be used, is by setting up an "out of -source" build. So in order to build our standalone executable, we have to follow +source" build. So in order to build our standalone executable, you have to follow these steps: (QiskitDevEnv) qiskit-aer> mkdir out @@ -587,8 +587,8 @@ option): ***Advanced options*** Because the standalone version of `Aer` doesn't need Python at all, the build system is -based on CMake, just like most of other C++ projects. So in order to pass all the different -options we have on `Aer` to CMake we use it's native mechanism: +based on CMake, just like most of other C++ projects. So to pass all the different +options we have on `Aer` to CMake, we use its native mechanism: (QiskitDevEnv) qiskit-aer\out> cmake -G "Visual Studio 15 2017" -DAER_BLAS_LIB_PATH=c:\path\to\my\blas .. @@ -596,11 +596,11 @@ options we have on `Aer` to CMake we use it's native mechanism: ### Building with GPU support Qiskit Aer can exploit GPU's horsepower to accelerate some simulations, specially the larger ones. -GPU access is supported via CUDA® (NVIDIA® chipset), so in order to build with GPU support we need +GPU access is supported via CUDA® (NVIDIA® chipset), so to build with GPU support, you need to have CUDA® >= 10.1 preinstalled. See install instructions [here](https://developer.nvidia.com/cuda-toolkit-archive) Please note that we only support GPU acceleration on Linux platforms at the moment. 
-Once CUDA® is properly installed, we only need to set a flag so the build system knows what to do: +Once CUDA® is properly installed, you only need to set a flag so the build system knows what to do: ``` AER_THRUST_BACKEND=CUDA @@ -610,8 +610,8 @@ For example, qiskit-aer$ python ./setup.py bdist_wheel -- -DAER_THRUST_BACKEND=CUDA -If we want to specify the CUDA® architecture instead of letting the build system -auto detect it, we can use the AER_CUDA_ARCH flag (can also be set as an ENV variable +If you want to specify the CUDA® architecture instead of letting the build system +auto detect it, you can use the AER_CUDA_ARCH flag (can also be set as an ENV variable with the same name, although the flag takes precedence). For example: qiskit-aer$ python ./setup.py bdist_wheel -- -DAER_THRUST_BACKEND=CUDA -DAER_CUDA_ARCH="5.2" @@ -800,7 +800,7 @@ pass them right after ``-D`` CMake argument. Example: qiskit-aer/out$ cmake -DUSEFUL_FLAG=Value .. ``` -In the case of building the Qiskit python extension, you have to pass these flags after writing +In the case of building the Qiskit Python extension, you have to pass these flags after writing ``--`` at the end of the python command line, eg: ``` @@ -820,8 +820,7 @@ These are the flags: * AER_BLAS_LIB_PATH Tells CMake the directory to look for the BLAS library instead of the usual paths. - If no BLAS library is found under that directory, CMake will raise an error and stop. - + If no BLAS library is found under that directory, CMake will raise an error and terminate. It can also be set as an ENV variable with the same name, although the flag takes precedence. Values: An absolute path. @@ -847,8 +846,8 @@ These are the flags: * AER_THRUST_BACKEND - We use Thrust library for GPU support through CUDA. If we want to build a version of `Aer` with GPU acceleration, we need to install CUDA and set this variable to the value: "CUDA". 
- There are other values that will use different CPU methods depending on the kind of backend we want to use: + We use Thrust library for GPU support through CUDA. If you want to build a version of `Aer` with GPU acceleration, you need to install CUDA and set this variable to the value: "CUDA". + There are other values that will use different CPU methods depending on the kind of backend you want to use: - "OMP": For OpenMP support - "TBB": For Intel Threading Building Blocks @@ -858,7 +857,7 @@ These are the flags: * AER_CUDA_ARCH - This flag allows us we to specify the CUDA architecture instead of letting the build system auto detect it. + This flag allows you to specify the CUDA architecture instead of letting the build system auto detect it. It can also be set as an ENV variable with the same name, although the flag takes precedence. Values: Auto | Common | All | List of valid CUDA architecture(s). @@ -908,13 +907,13 @@ These are the flags: ## Tests -Code contribution are expected to include tests that provide coverage for the +Code contributions are expected to include tests that provide coverage for the changes being made. We have two types of tests in the codebase: Qiskit Terra integration tests and Standalone integration tests. -For Qiskit Terra integration tests, you first need to build and install the Qiskit python extension, and then run `unittest` Python framework. +For Qiskit Terra integration tests, you first need to build and install the Qiskit Python extension, and then run `unittest` Python framework. ``` qiskit-aer$ pip install . @@ -923,7 +922,7 @@ qiskit-aer$ stestr run Manual for `stestr` can be found [here](https://stestr.readthedocs.io/en/latest/MANUAL.html#). -The integration tests for Qiskit python extension are included in: `test/terra`. +The integration tests for Qiskit Python extension are included in: `test/terra`. ## C++ Tests @@ -952,17 +951,17 @@ corresponding tests to verify this compatibility. 
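To illustrate the shape of the tests mentioned above, here is a deliberately toy example of a module that `unittest` (and therefore `stestr`) can discover. The class and test names are invented and it does not depend on qiskit-aer at all, so treat it as a sketch of the structure used under `test/terra`, not as a real Aer test:

```python
# Hypothetical miniature of a Terra-style integration test: the counts
# dictionary below is hard-coded where a real test would call
# result.get_counts() after a simulator run, but the assertion style matches.
import unittest

class TestCountsExample(unittest.TestCase):
    def test_counts_sum_to_shots(self):
        shots = 1000
        counts = {"00": 503, "11": 497}  # stand-in for a Bell-circuit result
        self.assertEqual(sum(counts.values()), shots)

    def test_only_correlated_outcomes(self):
        counts = {"00": 503, "11": 497}
        # A noiseless Bell state should only ever produce "00" or "11".
        self.assertTrue(set(counts) <= {"00", "11"})
```

Running `stestr run` (or `python -m unittest discover`) from the repository root picks such modules up automatically by name.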
## Debug
-We have to build in debug mode if we want to start a debugging session with tools like `gdb` or `lldb`.
-In order to create a Debug build for all platforms, we just need to pass a parameter while invoking the build to
+You have to build in debug mode if you want to start a debugging session with tools like `gdb` or `lldb`.
+To create a Debug build for all platforms, you just need to pass a parameter while invoking the build to
create the wheel file:
qiskit-aer$> python ./setup.py bdist_wheel --build-type=Debug
-If you want to debug the standalone executable, then the parameter changes to:
+If you want to debug the standalone executable, the parameter changes to:
qiskit-aer/out$> cmake -DCMAKE_BUILD_TYPE=Debug
-There are three different build configurations: `Release`, `Debug`, and `Release with Debug Symbols`, which parameters are:
+There are three different build configurations: `Release`, `Debug`, and `Release with Debug Symbols`, whose parameters are:
`Release`, `Debug`, `RelWithDebInfo` respectively.
We recommend building in verbose mode and dumping all the output to a file so it's easier to inspect possible build issues:
@@ -976,7 +975,7 @@ On Windows:
qiskit-aer> set VERBOSE=1
qiskit-aer> python ./setup.py bdist_wheel --build-type=Debug 1> build.log 2>&1
-We encourage to always send the whole `build.log` file when reporting a build issue, otherwise we will ask for it :)
+We encourage you to always send the whole `build.log` file when reporting a build issue; otherwise we will ask for it :)
**Stepping through the code**
Standalone version doesn't require anything special, just use your debugger like always:
qiskit-aer/out/Debug$ gdb qasm_simulator
Stepping through the code of a Python extension is another story, trickier, but possible. 
This is because Python interpreters -usually load Python extensions dynamically, so we need to start debugging the python interpreter and set our breakpoints ahead of time, before any of our python extension symbols are loaded into the process. +usually load Python extensions dynamically, so we need to start debugging the Python interpreter and set our breakpoints ahead of time, before any of our Python extension symbols are loaded into the process. -Once built and installed we have to run the debugger with the python interpreter: +Once built and installed, we have to run the debugger with the Python interpreter: $ lldb python @@ -1004,9 +1003,9 @@ Then we have to set our breakpoints: Breakpoint 1: no locations (pending). WARNING: Unable to resolve breakpoint to any actual locations. -Here the message is clear, it can't find the function: `AER::controller_execute` because our python extension hasn't been loaded yet - by the python interpreter, so it's "on-hold" hoping to find the function later in the execution. -Now we can run the python interpreter and pass the arguments (the python file to execute): +Here the message is clear, it can't find the function: `AER::controller_execute` because our Python extension hasn't been loaded yet + by the Python interpreter, so it's "on-hold" hoping to find the function later in the execution. +Now we can run the Python interpreter and pass the arguments (the python file to execute): (lldb) r test_qiskit_program.py Process 24896 launched: '/opt/anaconda3/envs/aer37/bin/python' (x86_64) From feb9fb26fb8eb37097de20163e53eea06ecb24f2 Mon Sep 17 00:00:00 2001 From: Amir Ebrahimi Date: Fri, 5 Mar 2021 11:31:37 -0800 Subject: [PATCH 2/7] Update README.md to mention Linux-only GPU support (#1095) Co-authored-by: Christopher J. 
Wood Co-authored-by: Matthew Treinish --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 822563289e..dce7fad753 100755 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ To install from source, follow the instructions in the [contribution guidelines] ## Installing GPU support -In order to install and run the GPU supported simulators, you need CUDA® 10.1 or newer previously installed. +In order to install and run the GPU supported simulators on Linux, you need CUDA® 10.1 or newer previously installed. CUDA® itself would require a set of specific GPU drivers. Please follow CUDA® installation procedure in the NVIDIA® [web](https://www.nvidia.com/drivers). If you want to install our GPU supported simulators, you have to install this other package: @@ -33,6 +33,11 @@ This will overwrite your current `qiskit-aer` package installation giving you the same functionality found in the canonical `qiskit-aer` package, plus the ability to run the GPU supported simulators: statevector, density matrix, and unitary. +**Note**: This package is only available on x86_64 Linux. For other platforms +that have CUDA support you will have to build from source. You can refer to +the [contributing guide](https://github.com/Qiskit/qiskit-aer/blob/master/CONTRIBUTING.md#building-with-gpu-support) +for instructions on doing this. + ## Simulating your first quantum program with Qiskit Aer Now that you have Qiskit Aer installed, you can start simulating quantum circuits with noise. Here is a basic example: From 7fe4f9eadc835705c25c13c753b8550b199efa44 Mon Sep 17 00:00:00 2001 From: "Christopher J. 
Wood" Date: Mon, 8 Mar 2021 15:50:25 -0500 Subject: [PATCH 3/7] Fix expval tests (#1173) --- .../qasm_simulator/qasm_save_expval.py | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/test/terra/backends/qasm_simulator/qasm_save_expval.py b/test/terra/backends/qasm_simulator/qasm_save_expval.py index ed615f143a..e96c6b720f 100644 --- a/test/terra/backends/qasm_simulator/qasm_save_expval.py +++ b/test/terra/backends/qasm_simulator/qasm_save_expval.py @@ -46,9 +46,9 @@ def test_save_expval_stabilizer_pauli(self, pauli): # Stabilizer test circuit state_circ = qi.random_clifford(2, seed=SEED).to_circuit() - oper = qi.Pauli(pauli) + oper = qi.Operator(qi.Pauli(pauli)) state = qi.Statevector(state_circ) - target = state.expectation_value(oper).real.round(10) + target = state.expectation_value(oper).real # Snapshot circuit opts = self.BACKEND_OPTS.copy() @@ -78,7 +78,7 @@ def test_save_expval_var_stabilizer_pauli(self, pauli): # Stabilizer test circuit state_circ = qi.random_clifford(2, seed=SEED).to_circuit() - oper = qi.Pauli(pauli) + oper = qi.Operator(qi.Pauli(pauli)) state = qi.Statevector(state_circ) expval = state.expectation_value(oper).real variance = state.expectation_value(oper ** 2).real - expval ** 2 @@ -178,9 +178,9 @@ def test_save_expval_nonstabilizer_pauli(self, pauli): # Stabilizer test circuit state_circ = QuantumVolume(2, 1, seed=SEED) - oper = qi.Pauli(pauli) + oper = qi.Operator(qi.Pauli(pauli)) state = qi.Statevector(state_circ) - target = state.expectation_value(oper).real.round(10) + target = state.expectation_value(oper).real # Snapshot circuit opts = self.BACKEND_OPTS.copy() @@ -209,7 +209,7 @@ def test_save_expval_var_nonstabilizer_pauli(self, pauli): # Stabilizer test circuit state_circ = QuantumVolume(2, 1, seed=SEED) - oper = qi.Pauli(pauli) + oper = qi.Operator(qi.Pauli(pauli)) state = qi.Statevector(state_circ) expval = state.expectation_value(oper).real variance = state.expectation_value(oper ** 2).real 
- expval ** 2
@@ -244,7 +244,7 @@ def test_save_expval_nonstabilizer_hermitian(self, qubits):
state_circ = QuantumVolume(3, 1, seed=SEED)
oper = qi.random_hermitian(4, traceless=True, seed=SEED)
state = qi.Statevector(state_circ)
- target = state.expectation_value(oper, qubits).real.round(10)
+ target = state.expectation_value(oper, qubits).real
# Snapshot circuit
opts = self.BACKEND_OPTS.copy()
@@ -305,7 +305,7 @@ def test_save_expval_cptp_pauli(self, pauli):
opts = self.BACKEND_OPTS.copy()
if opts.get('method') in SUPPORTED_METHODS:
- oper = qi.Pauli(pauli)
+ oper = qi.Operator(qi.Pauli(pauli))
# CPTP channel test circuit
channel = qi.random_quantum_channel(4, seed=SEED)
@@ -313,7 +313,7 @@ def test_save_expval_cptp_pauli(self, pauli):
state_circ.append(channel, range(2))
state = qi.DensityMatrix(state_circ)
- target = state.expectation_value(oper).real.round(10)
+ target = state.expectation_value(oper).real
# Snapshot circuit
circ = transpile(state_circ, self.SIMULATOR)
@@ -337,7 +337,7 @@ def test_save_expval_var_cptp_pauli(self, pauli):
opts = self.BACKEND_OPTS.copy()
if opts.get('method') in SUPPORTED_METHODS:
- oper = qi.Pauli(pauli)
+ oper = qi.Operator(qi.Pauli(pauli))
# CPTP channel test circuit
channel = qi.random_quantum_channel(4, seed=SEED)
From d69f7e921f120a7db62705f4cf0498eaee9b02dc Mon Sep 17 00:00:00 2001
From: "Christopher J. 
Wood" Date: Tue, 9 Mar 2021 01:43:24 -0500 Subject: [PATCH 4/7] Fix extended stabilizer method basis gates (#1175) --- qiskit/providers/aer/backends/qasm_simulator.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/qiskit/providers/aer/backends/qasm_simulator.py b/qiskit/providers/aer/backends/qasm_simulator.py index 56a8ba75fd..e9f0eefdab 100644 --- a/qiskit/providers/aer/backends/qasm_simulator.py +++ b/qiskit/providers/aer/backends/qasm_simulator.py @@ -524,8 +524,8 @@ def _method_configuration(method=None): config.custom_instructions = sorted(['roerror', 'snapshot', 'save_statevector', 'save_expval', 'save_expval_var']) config.basis_gates = sorted([ - 'cx', 'cz', 'id', 'x', 'y', 'z', 'h', 's', 'sdg', 'sx', 'swap', - 'u0', 'u1', 'p', 'ccx', 'ccz', 'delay' + 'cx', 'cz', 'id', 'x', 'y', 'z', 'h', 's', 'sdg', 'sx', + 'swap', 'u0', 't', 'tdg', 'u1', 'p', 'ccx', 'ccz', 'delay' ] + config.custom_instructions) return config From 3d2575ae5fa2cd584210c3a4938b8afe4f3adb75 Mon Sep 17 00:00:00 2001 From: "Christopher J. Wood" Date: Tue, 9 Mar 2021 01:46:41 -0500 Subject: [PATCH 5/7] Update CODEOWNERS (#1174) --- .github/CODEOWNERS | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index ec0b63bb2b..8734413ddc 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -8,15 +8,13 @@ # Generic rule for the repository. 
This pattern is actually the one that will # apply unless specialized by a later rule -* @chriseclectic @vvilpas @atilag +* @chriseclectic @vvilpas # Individual folders on root directory -/qiskit @chriseclectic @atilag @vvilpas -/cmake @atilag @vvilpas -/doc @chriseclectic @atilag @vvilpas -/examples @chriseclectic @atilag @vvilpas -/contrib @chriseclectic @atilag @vvilpas -/test @chriseclectic @atilag @vvilpas -/src @chriseclectic @atilag @vvilpas - -# AER specific folders +/qiskit @chriseclectic @vvilpas @mtreinish +/test @chriseclectic @vvilpas @mtreinish +/doc @chriseclectic @vvilpas @mtreinish +/releasenotes @chriseclectic @vvilpas @mtreinish +/cmake @vvilpas +/contrib @chriseclectic @vvilpas @hhorii +/src @chriseclectic @vvilpas @hhorii From acd216d040c0d9ec1161c82331820841cb13386f Mon Sep 17 00:00:00 2001 From: Jun Doi Date: Wed, 10 Mar 2021 19:17:46 +0900 Subject: [PATCH 6/7] Fixes of multi-chunk State implementation (#1149) Co-authored-by: Victor Villar Co-authored-by: Christopher J. 
Wood --- CONTRIBUTING.md | 3 + src/controllers/controller.hpp | 55 +++ src/controllers/qasm_controller.hpp | 94 ++-- src/controllers/statevector_controller.hpp | 38 +- src/controllers/unitary_controller.hpp | 36 +- .../density_matrix/densitymatrix.hpp | 27 ++ .../density_matrix/densitymatrix_state.hpp | 4 +- .../densitymatrix_state_chunk.hpp | 425 ++++++++++++------ .../density_matrix/densitymatrix_thrust.hpp | 63 +++ src/simulators/state.hpp | 2 +- src/simulators/state_chunk.hpp | 102 +++-- src/simulators/statevector/chunk/chunk.hpp | 2 + .../statevector/chunk/chunk_container.hpp | 3 - .../chunk/device_chunk_container.hpp | 5 +- .../chunk/host_chunk_container.hpp | 3 + .../statevector/qubitvector_thrust.hpp | 10 +- .../statevector/statevector_state.hpp | 4 +- .../statevector/statevector_state_chunk.hpp | 236 +++++++++- src/simulators/unitary/unitary_state.hpp | 4 +- .../unitary/unitary_state_chunk.hpp | 129 ++++-- src/transpile/cacheblocking.hpp | 2 +- .../backends/qasm_simulator/qasm_chunk.py | 136 ++++++ ...est_qasm_simulator_density_matrix_chunk.py | 74 +++ .../test_qasm_simulator_density_matrix_mpi.py | 84 ---- ... 
test_qasm_simulator_statevector_chunk.py} | 49 +- 25 files changed, 1113 insertions(+), 477 deletions(-) create mode 100644 test/terra/backends/qasm_simulator/qasm_chunk.py create mode 100644 test/terra/backends/test_qasm_simulator_density_matrix_chunk.py delete mode 100644 test/terra/backends/test_qasm_simulator_density_matrix_mpi.py rename test/terra/backends/{test_qasm_simulator_statevector_mpi.py => test_qasm_simulator_statevector_chunk.py} (56%) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a3db828f52..44a59da025 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -681,7 +681,10 @@ This technique allows applying quantum gates to each chunk independently without Before the actual simulation, we apply transpilation to remap the input circuits to the equivalent circuits that has all the quantum gates on the lower qubits than the chunk's number of qubits. And the (noiseless) swap gates are inserted to exchange data. +Please refer to this paper (https://arxiv.org/abs/2102.02957) for a more detailed description of the algorithm and the implementation of parallel simulation. + So to simulate by using multiple GPUs or multiple nodes on the cluster, following configurations should be set to backend options.
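As a rough illustration of how the chunking options described here interact: the option names `blocking_enable` and `blocking_qubits` are the ones this section of CONTRIBUTING.md introduces, but `num_chunks` is a purely hypothetical helper sketching the arithmetic (the density-matrix method would use twice the qubit count):

```python
def num_chunks(num_qubits, blocking_qubits):
    """Number of chunks a statevector is split into under cache blocking."""
    if blocking_qubits >= num_qubits:
        return 1  # the whole state fits in a single chunk
    return 2 ** (num_qubits - blocking_qubits)

# Backend options as named in this section's configuration list
backend_options = {
    "blocking_enable": True,  # turn on multi-chunk (cache-blocked) simulation
    "blocking_qubits": 23,    # each chunk holds 2**23 amplitudes
}

# A 30-qubit statevector with 23-qubit chunks is split into 128 chunks,
# which can then be distributed over GPUs or MPI processes.
print(num_chunks(30, backend_options["blocking_qubits"]))  # -> 128
```

The inserted (noiseless) swap gates mentioned above are what move amplitudes between these chunks when a gate touches a qubit outside the chunk boundary.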
+(If there is not enough memory to simulate the input circuit, Qiskit Aer automatically sets the following options, but it is recommended to set them explicitly) - blocking_enable diff --git a/src/controllers/controller.hpp b/src/controllers/controller.hpp index 8babb798ef..f20e0e3e3f 100755 --- a/src/controllers/controller.hpp +++ b/src/controllers/controller.hpp @@ -51,6 +51,7 @@ #include "noise/noise_model.hpp" #include "transpile/basic_opts.hpp" #include "transpile/truncate_qubits.hpp" +#include "transpile/cacheblocking.hpp" namespace AER { namespace Base { @@ -216,8 +217,19 @@ class Controller { set_distributed_parallelization(const std::vector<Circuit> &circuits, const std::vector<Noise::NoiseModel> &noise); + virtual bool multiple_chunk_required(const Circuit &circuit, + const Noise::NoiseModel &noise) const; + void save_exception_to_results(Result &result,const std::exception &e); + + //setting cache blocking transpiler + Transpile::CacheBlocking transpile_cache_blocking(const Circuit& circ, + const Noise::NoiseModel& noise, + const json_t& config, + const size_t complex_size,bool is_matrix) const; + + // Get system memory size size_t get_system_memory_mb(); size_t get_gpu_memory_mb(); @@ -274,6 +286,8 @@ class Controller { //process information (MPI) int myrank_ = 0; int num_processes_ = 1; + + uint_t cache_block_qubit_ = 0; }; //========================================================================= @@ -348,6 +362,11 @@ void Controller::set_config(const json_t &config) { JSON::get_value(accept_distributed_results_, "accept_distributed_results", config); } + //enable multiple qregs if cache blocking is enabled + cache_block_qubit_ = 0; + if(JSON::check_key("blocking_qubits", config)){ + JSON::get_value(cache_block_qubit_,"blocking_qubits", config); + } } void Controller::clear_config() { @@ -535,6 +554,21 @@ uint_t Controller::get_distributed_num_processes(bool par_shots) const } } +bool Controller::multiple_chunk_required(const Circuit &circ, + const Noise::NoiseModel &noise) const +{
+ if(circ.num_qubits < 3) + return false; + + if(num_process_per_experiment_ > 1 || Controller::get_min_memory_mb() < required_memory_mb(circ, noise)) + return true; + + if(cache_block_qubit_ >= 2 && cache_block_qubit_ < circ.num_qubits) + return true; + + return false; +} + size_t Controller::get_system_memory_mb() { size_t total_physical_memory = 0; #if defined(__linux__) || defined(__APPLE__) @@ -654,6 +688,27 @@ void Controller::save_exception_to_results(Result &result,const std::exception & } } +Transpile::CacheBlocking Controller::transpile_cache_blocking(const Circuit& circ, + const Noise::NoiseModel& noise, + const json_t& config, + const size_t complex_size,bool is_matrix) const +{ + Transpile::CacheBlocking cache_block_pass; + + cache_block_pass.set_config(config); + if(!cache_block_pass.enabled()){ + //if blocking is not set by config, automatically set if required + if(multiple_chunk_required(circ,noise)){ + int nplace = num_process_per_experiment_; + if(num_gpus_ > 0) + nplace *= num_gpus_; + cache_block_pass.set_blocking(circ.num_qubits, get_min_memory_mb() << 20, nplace, complex_size,is_matrix); + } + } + + return cache_block_pass; +} + //------------------------------------------------------------------------- // Qobj execution //------------------------------------------------------------------------- diff --git a/src/controllers/qasm_controller.hpp b/src/controllers/qasm_controller.hpp index ba903aa45e..a408b2e83d 100755 --- a/src/controllers/qasm_controller.hpp +++ b/src/controllers/qasm_controller.hpp @@ -215,11 +215,6 @@ class QasmController : public Base::Controller { const Operations::OpSet &opset, const json_t& config) const; - - Transpile::CacheBlocking transpile_cache_blocking(const Circuit& circ, - const Noise::NoiseModel& noise, - const json_t& config) const; - //---------------------------------------------------------------- // Run circuit helpers //---------------------------------------------------------------- @@ -306,9 +301,6 @@ 
class QasmController : public Base::Controller { // Controller-level parameter for CH method bool extended_stabilizer_measure_sampling_ = false; - - //using multiple chunks - bool multiple_qregs_ = false; }; //========================================================================= @@ -381,11 +373,6 @@ void QasmController::set_config(const json_t& config) { "QasmController: initial_statevector is not a unit vector"); } } - - //enable multiple qregs if cache blocking is enabled - if(JSON::check_key("blocking_enable", config)){ - JSON::get_value(multiple_qregs_,"blocking_enable", config); - } } void QasmController::clear_config() { @@ -407,7 +394,7 @@ void QasmController::run_circuit(const Circuit& circ, // Validate circuit for simulation method switch (simulation_method(circ, noise, true)) { case Method::statevector: { - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (simulation_precision_ == Precision::double_precision) { // Double-precision Statevector simulation return run_circuit_helper>>( @@ -440,7 +427,7 @@ void QasmController::run_circuit(const Circuit& circ, "QasmController: method statevector_gpu is not supported on this " "system"); #else - if(multiple_qregs_ || (parallel_shots_ > 1 || parallel_experiments_ > 1)){ + if(Base::Controller::multiple_chunk_required(circ,noise) || (parallel_shots_ > 1 || parallel_experiments_ > 1)){ if (simulation_precision_ == Precision::double_precision) { // Double-precision Statevector simulation return run_circuit_helper< @@ -478,7 +465,7 @@ void QasmController::run_circuit(const Circuit& circ, "QasmController: method statevector_thrust is not supported on this " "system"); #else - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (simulation_precision_ == Precision::double_precision) { // Double-precision Statevector simulation return run_circuit_helper< @@ -511,7 +498,7 @@ void QasmController::run_circuit(const Circuit& circ, #endif } case 
Method::density_matrix: { - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (simulation_precision_ == Precision::double_precision) { // Double-precision density matrix simulation return run_circuit_helper< @@ -548,7 +535,7 @@ void QasmController::run_circuit(const Circuit& circ, "QasmController: method density_matrix_gpu is not supported on this " "system"); #else - if(multiple_qregs_ || (parallel_shots_ > 1 || parallel_experiments_ > 1)){ + if(Base::Controller::multiple_chunk_required(circ,noise) || (parallel_shots_ > 1 || parallel_experiments_ > 1)){ if (simulation_precision_ == Precision::double_precision) { // Double-precision density matrix simulation return run_circuit_helper< @@ -586,7 +573,7 @@ void QasmController::run_circuit(const Circuit& circ, "this " "system"); #else - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (simulation_precision_ == Precision::double_precision) { // Double-precision density matrix simulation return run_circuit_helper< @@ -938,42 +925,6 @@ Transpile::Fusion QasmController::transpile_fusion(Method method, return fusion_pass; } -Transpile::CacheBlocking QasmController::transpile_cache_blocking(const Circuit& circ, - const Noise::NoiseModel& noise, - const json_t& config) const -{ - Transpile::CacheBlocking cache_block_pass; - - cache_block_pass.set_config(config); - if(!cache_block_pass.enabled()){ - //if blocking is not set by config, automatically set if required - if(Base::Controller::num_process_per_experiment_ > 1 || Base::Controller::get_min_memory_mb() < required_memory_mb(circ, noise)){ - int nplace = Base::Controller::num_process_per_experiment_; - if(Base::Controller::num_gpus_ > 0) - nplace *= Base::Controller::num_gpus_; - - size_t complex_size = (simulation_precision_ == Precision::single_precision) ? 
sizeof(std::complex) : sizeof(std::complex); - - switch (simulation_method(circ, noise, false)) { - case Method::statevector: - case Method::statevector_thrust_cpu: - case Method::statevector_thrust_gpu: - cache_block_pass.set_blocking(circ.num_qubits, Base::Controller::get_min_memory_mb() << 20, nplace, complex_size,false); - break; - case Method::density_matrix: - case Method::density_matrix_thrust_cpu: - case Method::density_matrix_thrust_gpu: - cache_block_pass.set_blocking(circ.num_qubits, Base::Controller::get_min_memory_mb() << 20, nplace, complex_size,true); - break; - default: - throw std::runtime_error("QasmController: No enough memory to simulate this method on the sysytem"); - } - } - } - - return cache_block_pass; -} - void QasmController::set_parallelization_circuit( const Circuit& circ, const Noise::NoiseModel& noise_model) { @@ -1148,9 +1099,19 @@ void QasmController::run_circuit_helper(const Circuit& circ, auto fusion_pass = transpile_fusion(method, opt_circ.opset(), config); fusion_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); - auto cache_block_pass = transpile_cache_blocking(opt_circ,noise,config); + bool is_matrix = false; + if(method == Method::density_matrix || method == Method::density_matrix_thrust_gpu || method == Method::density_matrix_thrust_cpu) + is_matrix = true; + auto cache_block_pass = transpile_cache_blocking(opt_circ,noise,config,(simulation_precision_ == Precision::single_precision) ? 
sizeof(std::complex) : sizeof(std::complex),is_matrix); cache_block_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); + uint_t block_bits = 0; + if(cache_block_pass.enabled()) + block_bits = cache_block_pass.block_bits(); + + //allocate qubit register + state.allocate(Base::Controller::max_qubits_,block_bits); + // Run simulation run_multi_shot(opt_circ, shots, state, initial_state, method, result, rng); } @@ -1179,9 +1140,6 @@ void QasmController::run_multi_shot(const Circuit& circ, // Implement measure sampler auto pos = circ.first_measure_pos; // Position of first measurement op - //allocate qubit register - state.allocate(Base::Controller::max_qubits_); - // Run circuit instructions before first measure std::vector ops(circ.ops.begin(), circ.ops.begin() + pos); @@ -1197,9 +1155,6 @@ void QasmController::run_multi_shot(const Circuit& circ, // Add measure sampling metadata result.metadata.add(true, "measure_sampling"); } else { - //allocate qubit register - state.allocate(Base::Controller::max_qubits_); - // Perform standard execution if we cannot apply the // measurement sampling optimization while (shots-- > 0) { @@ -1225,10 +1180,10 @@ void QasmController::run_circuit_with_sampled_noise(const Circuit& circ, measure_pass.set_config(config); Noise::NoiseModel dummy_noise; - auto cache_block_pass = transpile_cache_blocking(circ,noise,config); - - //allocate qubit register - state.allocate(Base::Controller::max_qubits_); + bool is_matrix = false; + if(method == Method::density_matrix || method == Method::density_matrix_thrust_gpu || method == Method::density_matrix_thrust_cpu) + is_matrix = true; + auto cache_block_pass = transpile_cache_blocking(circ,noise,config,(simulation_precision_ == Precision::single_precision) ? 
sizeof(std::complex) : sizeof(std::complex),is_matrix); // Sample noise using circuit method while (shots-- > 0) { @@ -1238,6 +1193,13 @@ void QasmController::run_circuit_with_sampled_noise(const Circuit& circ, fusion_pass.optimize_circuit(noise_circ, dummy_noise, state.opset(), result); cache_block_pass.optimize_circuit(noise_circ, dummy_noise, state.opset(), result); + uint_t block_bits = 0; + if(cache_block_pass.enabled()) + block_bits = cache_block_pass.block_bits(); + + //allocate qubit register + state.allocate(Base::Controller::max_qubits_,block_bits); + run_single_shot(noise_circ, state, initial_state, result, rng); } } diff --git a/src/controllers/statevector_controller.hpp b/src/controllers/statevector_controller.hpp index b851632c31..db5c9a9cfe 100755 --- a/src/controllers/statevector_controller.hpp +++ b/src/controllers/statevector_controller.hpp @@ -124,9 +124,6 @@ class StatevectorController : public Base::Controller { // Precision of statevector Precision precision_ = Precision::double_precision; - //using multiple chunks - bool multiple_qregs_ = false; - }; //========================================================================= @@ -182,11 +179,6 @@ void StatevectorController::set_config(const json_t& config) { precision_ = Precision::single_precision; } } - - //enable multiple qregs if cache blocking is enabled - if(JSON::check_key("blocking_enable", config)){ - JSON::get_value(multiple_qregs_,"blocking_enable", config); - } } void StatevectorController::clear_config() { @@ -215,7 +207,7 @@ void StatevectorController::run_circuit( switch (method_) { case Method::automatic: case Method::statevector_cpu: { - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (precision_ == Precision::double_precision) { // Double-precision Statevector simulation return run_circuit_helper>>( @@ -240,7 +232,7 @@ void StatevectorController::run_circuit( } case Method::statevector_thrust_gpu: { #ifdef AER_THRUST_CUDA - 
if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (precision_ == Precision::double_precision) { // Double-precision Statevector simulation return run_circuit_helper< @@ -275,7 +267,7 @@ void StatevectorController::run_circuit( } case Method::statevector_thrust_cpu: { #ifdef AER_THRUST_CPU - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (precision_ == Precision::double_precision) { // Double-precision Statevector simulation return run_circuit_helper< @@ -353,34 +345,32 @@ void StatevectorController::run_circuit_helper( result.set_config(config); // Optimize circuit - const std::vector* op_ptr = &circ.ops; Transpile::Fusion fusion_pass; - Transpile::CacheBlocking cache_block_pass; - fusion_pass.set_config(config); - cache_block_pass.set_config(config); - fusion_pass.set_parallelization(parallel_state_update_); - Circuit opt_circ; + Circuit opt_circ = circ; // copy circuit + Noise::NoiseModel dummy_noise; // dummy object for transpile pass if (fusion_pass.active && circ.num_qubits >= fusion_pass.threshold) { - opt_circ = circ; // copy circuit - Noise::NoiseModel dummy_noise; // dummy object for transpile pass fusion_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); - cache_block_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); - op_ptr = &opt_circ.ops; } - // Run single shot collecting measure data or snapshots - state.allocate(Base::Controller::max_qubits_); + Transpile::CacheBlocking cache_block_pass = transpile_cache_blocking(opt_circ,dummy_noise,config,(precision_ == Precision::single_precision) ? 
sizeof(std::complex) : sizeof(std::complex),false); + cache_block_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); + uint_t block_bits = 0; + if(cache_block_pass.enabled()) + block_bits = cache_block_pass.block_bits(); + state.allocate(Base::Controller::max_qubits_,block_bits); + + // Run single shot collecting measure data or snapshots if (initial_state_.empty()) { state.initialize_qreg(circ.num_qubits); } else { state.initialize_qreg(circ.num_qubits, initial_state_); } state.initialize_creg(circ.num_memory, circ.num_registers); - state.apply_ops(*op_ptr, result, rng); + state.apply_ops(opt_circ.ops, result, rng); Base::Controller::save_count_data(result, state.creg()); // Add final state to the data diff --git a/src/controllers/unitary_controller.hpp b/src/controllers/unitary_controller.hpp index 935ca69dc6..f54f52d5b2 100755 --- a/src/controllers/unitary_controller.hpp +++ b/src/controllers/unitary_controller.hpp @@ -113,10 +113,6 @@ class UnitaryController : public Base::Controller { // Precision of a unitary matrix Precision precision_ = Precision::double_precision; - - //using multiple chunks - bool multiple_qregs_ = false; - }; //========================================================================= @@ -172,11 +168,6 @@ void UnitaryController::set_config(const json_t &config) { precision_ = Precision::single_precision; } } - - //enable multiple qregs if cache blocking is enabled - if(JSON::check_key("blocking_enable", config)){ - JSON::get_value(multiple_qregs_,"blocking_enable", config); - } } void UnitaryController::clear_config() { @@ -207,7 +198,7 @@ void UnitaryController::run_circuit(const Circuit &circ, switch (method_) { case Method::automatic: case Method::unitary_cpu: { - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (precision_ == Precision::double_precision) { // Double-precision unitary simulation return run_circuit_helper< @@ -236,7 +227,7 @@ void UnitaryController::run_circuit(const 
Circuit &circ, } case Method::unitary_thrust_gpu: { #ifdef AER_THRUST_CUDA - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (precision_ == Precision::double_precision) { // Double-precision unitary simulation return run_circuit_helper< @@ -270,7 +261,7 @@ void UnitaryController::run_circuit(const Circuit &circ, } case Method::unitary_thrust_cpu: { #ifdef AER_THRUST_CPU - if(multiple_qregs_){ + if(Base::Controller::multiple_chunk_required(circ,noise)){ if (precision_ == Precision::double_precision) { // Double-precision unitary simulation return run_circuit_helper< @@ -354,25 +345,26 @@ void UnitaryController::run_circuit_helper( result.metadata.add(state.name(), "method"); // Optimize circuit - const std::vector* op_ptr = &circ.ops; Transpile::Fusion fusion_pass; - Transpile::CacheBlocking cache_block_pass; fusion_pass.threshold /= 2; // Halve default threshold for unitary simulator fusion_pass.set_config(config); - cache_block_pass.set_config(config); fusion_pass.set_parallelization(parallel_state_update_); - Circuit opt_circ; + Circuit opt_circ = circ; // copy circuit + Noise::NoiseModel dummy_noise; // dummy object for transpile pass if (fusion_pass.active && circ.num_qubits >= fusion_pass.threshold) { - opt_circ = circ; // copy circuit - Noise::NoiseModel dummy_noise; // dummy object for transpile pass fusion_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); - cache_block_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); - op_ptr = &opt_circ.ops; } + Transpile::CacheBlocking cache_block_pass = transpile_cache_blocking(opt_circ,dummy_noise,config,(precision_ == Precision::single_precision) ? 
sizeof(std::complex) : sizeof(std::complex),true); + cache_block_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result); + + uint_t block_bits = 0; + if(cache_block_pass.enabled()) + block_bits = cache_block_pass.block_bits(); + state.allocate(Base::Controller::max_qubits_,block_bits); + // Run single shot collecting measure data or snapshots - state.allocate(Base::Controller::max_qubits_); if (initial_unitary_.empty()) { state.initialize_qreg(circ.num_qubits); @@ -380,7 +372,7 @@ void UnitaryController::run_circuit_helper( state.initialize_qreg(circ.num_qubits, initial_unitary_); } state.initialize_creg(circ.num_memory, circ.num_registers); - state.apply_ops(*op_ptr, result, rng); + state.apply_ops(opt_circ.ops, result, rng); Base::Controller::save_count_data(result, state.creg()); // Add final state unitary to the data diff --git a/src/simulators/density_matrix/densitymatrix.hpp b/src/simulators/density_matrix/densitymatrix.hpp index a013296702..2e2b9ad833 100755 --- a/src/simulators/density_matrix/densitymatrix.hpp +++ b/src/simulators/density_matrix/densitymatrix.hpp @@ -131,6 +131,7 @@ class DensityMatrix : public UnitaryMatrix { // Return Pauli expectation value double expval_pauli(const reg_t &qubits, const std::string &pauli,const complex_t initial_phase=1.0) const; + double expval_pauli_non_diagonal_chunk(const reg_t &qubits, const std::string &pauli,const complex_t initial_phase=1.0) const; protected: @@ -400,6 +401,32 @@ double DensityMatrix::expval_pauli(const reg_t &qubits, std::move(lambda), size_t(0), nrows >> 1)); } +template +double DensityMatrix::expval_pauli_non_diagonal_chunk(const reg_t &qubits, + const std::string &pauli,const complex_t initial_phase) const +{ + uint_t x_mask, z_mask, num_y, x_max; + std::tie(x_mask, z_mask, num_y, x_max) = QV::pauli_masks_and_phase(qubits, pauli); + + // Size of density matrix + const size_t nrows = BaseMatrix::rows_; + + auto phase = std::complex(initial_phase); + QV::add_y_phase(num_y, phase); 
+ + auto lambda = [&](const int_t i, double &val_re, double &val_im)->void { + (void)val_im; // unused + auto idx_mat = i ^ x_mask + nrows * i; + auto val = std::real(phase * BaseVector::data_[idx_mat]); + if (z_mask && (AER::Utils::popcount(i & z_mask) & 1)) { + val = - val; + } + val_re += val; + }; + return std::real(BaseVector::apply_reduction_lambda( + std::move(lambda), size_t(0), nrows)); +} + //----------------------------------------------------------------------- // Z-measurement outcome probabilities //----------------------------------------------------------------------- diff --git a/src/simulators/density_matrix/densitymatrix_state.hpp b/src/simulators/density_matrix/densitymatrix_state.hpp index 19bf2b43f8..25c6b80322 100644 --- a/src/simulators/density_matrix/densitymatrix_state.hpp +++ b/src/simulators/density_matrix/densitymatrix_state.hpp @@ -129,7 +129,7 @@ class State : public Base::State { virtual std::vector sample_measure(const reg_t &qubits, uint_t shots, RngEngine &rng) override; - virtual void allocate(uint_t num_qubits) override; + virtual void allocate(uint_t num_qubits,uint_t block_bits) override; //----------------------------------------------------------------------- // Additional methods @@ -359,7 +359,7 @@ const stringmap_t State::snapshotset_( // Initialization //------------------------------------------------------------------------- template -void State::allocate(uint_t num_qubits) +void State::allocate(uint_t num_qubits,uint_t block_bits) { BaseState::qreg_.chunk_setup(num_qubits*2,num_qubits*2,0,1); } diff --git a/src/simulators/density_matrix/densitymatrix_state_chunk.hpp b/src/simulators/density_matrix/densitymatrix_state_chunk.hpp index 2a625d7d13..31128fc989 100644 --- a/src/simulators/density_matrix/densitymatrix_state_chunk.hpp +++ b/src/simulators/density_matrix/densitymatrix_state_chunk.hpp @@ -27,36 +27,34 @@ #include "densitymatrix_thrust.hpp" #endif -//#include "densitymatrix_state.h" - namespace AER { namespace 
DensityMatrixChunk { +using OpType = Operations::OpType; + // OpSet of supported instructions const Operations::OpSet StateOpSet( // Op types - {Operations::OpType::gate, Operations::OpType::measure, - Operations::OpType::reset, Operations::OpType::snapshot, - Operations::OpType::barrier, Operations::OpType::bfunc, - Operations::OpType::roerror, Operations::OpType::matrix, - Operations::OpType::diagonal_matrix, Operations::OpType::kraus, - Operations::OpType::superop, Operations::OpType::save_expval, - Operations::OpType::save_expval_var}, + {OpType::gate, OpType::measure, + OpType::reset, OpType::snapshot, + OpType::barrier, OpType::bfunc, + OpType::roerror, OpType::matrix, + OpType::diagonal_matrix, OpType::kraus, + OpType::superop, OpType::save_expval, + OpType::save_expval_var, OpType::save_densmat, + OpType::save_probs, OpType::save_probs_ket, + OpType::save_amps_sq + }, // Gates {"U", "CX", "u1", "u2", "u3", "u", "cx", "cy", "cz", "swap", "id", "x", "y", "z", "h", "s", "sdg", "t", "tdg", "ccx", "r", "rx", "ry", "rz", "rxx", "ryy", "rzz", "rzx", "p", "cp", "cu1", "sx", "x90", "delay", "pauli"}, // Snapshots - {"memory", "register", "probabilities", + {"density_matrix", "memory", "register", "probabilities", "probabilities_with_variance", "expectation_value_pauli", "expectation_value_pauli_with_variance"}); -// Allowed gates enum class -enum class Gates { - u1, u2, u3, r, rx,ry, rz, id, x, y, z, h, s, sdg, sx, t, tdg, - cx, cy, cz, swap, rxx, ryy, rzz, rzx, ccx, cp, pauli -}; //========================================================================= // DensityMatrix State subclass @@ -115,8 +113,9 @@ class State : public Base::StateChunk { void initialize_omp(); auto move_to_matrix(); - + auto copy_to_matrix(); protected: + auto apply_to_matrix(bool copy = false); //----------------------------------------------------------------------- // Apply instructions @@ -170,10 +169,28 @@ class State : public Base::StateChunk { // Save data instructions 
//----------------------------------------------------------------------- + // Save the current density matrix or reduced density matrix + void apply_save_density_matrix(const Operations::Op &op, + ExperimentResult &result, + bool last_op = false); + + // Helper function for computing expectation value + void apply_save_probs(const Operations::Op &op, + ExperimentResult &result); + + // Helper function for saving amplitudes squared + void apply_save_amplitudes_sq(const Operations::Op &op, + ExperimentResult &result); + // Helper function for computing expectation value virtual double expval_pauli(const reg_t &qubits, const std::string& pauli) override; + // Return the reduced density matrix for the simulator + cmatrix_t reduced_density_matrix(const reg_t &qubits, bool last_op = false); + cmatrix_t reduced_density_matrix_helper(const reg_t &qubits, + const reg_t &qubits_sorted); + //----------------------------------------------------------------------- // Measurement Helpers //----------------------------------------------------------------------- @@ -230,8 +247,6 @@ class State : public Base::StateChunk { ExperimentResult &result, bool variance); - // Return the reduced density matrix for the simulator - cmatrix_t reduced_density_matrix(const reg_t &qubits, const reg_t& qubits_sorted); //----------------------------------------------------------------------- // Single-qubit gate helpers @@ -276,7 +291,7 @@ void State::initialize_qreg(uint_t num_qubits) if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(i=0;i::initialize_qreg(uint_t num_qubits) #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(i) for(i=0;inum_qubits_ == this->chunk_bits_){ BaseState::qregs_[i].initialize(); } @@ -309,7 +324,7 @@ void State::initialize_qreg(uint_t num_qubits, int_t iChunk; if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(iChunk=0;iChunk::initialize_qreg(uint_t num_qubits, #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) 
for(iChunk=0;iChunk> (BaseState::num_qubits_/2 - BaseState::chunk_bits_/2); - local_row_offset <<= (BaseState::chunk_bits_/2); - local_col_offset <<= (BaseState::chunk_bits_/2); + uint_t irow_chunk = ((iChunk + BaseState::global_chunk_index_) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_))) << (BaseState::chunk_bits_); + uint_t icol_chunk = ((iChunk + BaseState::global_chunk_index_) & ((1ull << ((BaseState::num_qubits_ - BaseState::chunk_bits_)))-1)) << (BaseState::chunk_bits_); //copy part of state for this chunk uint_t i,row,col; cvector_t tmp(1ull << BaseState::chunk_bits_); for(i=0;i<(1ull << BaseState::chunk_bits_);i++){ - uint_t row = i & ((1ull << (BaseState::chunk_bits_/2))-1); - uint_t col = i >> (BaseState::chunk_bits_/2); - tmp[i] = input[local_row_offset + row + ((local_col_offset + col) << (BaseState::num_qubits_/2))]; + uint_t icol = i & ((1ull << (BaseState::chunk_bits_))-1); + uint_t irow = i >> (BaseState::chunk_bits_); + tmp[i] = input[icol_chunk + icol + ((irow_chunk + irow) << (BaseState::num_qubits_))]; } - BaseState::qregs_[iChunk].set_num_qubits(BaseState::chunk_bits_/2); + BaseState::qregs_[iChunk].set_num_qubits(BaseState::chunk_bits_); BaseState::qregs_[iChunk].initialize_from_vector(tmp); } } @@ -350,7 +363,7 @@ void State::initialize_qreg(uint_t num_qubits, int_t iChunk; if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(iChunk=0;iChunk::initialize_qreg(uint_t num_qubits, #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) for(iChunk=0;iChunk> (BaseState::num_qubits_/2 - BaseState::chunk_bits_/2); - local_row_offset <<= (BaseState::chunk_bits_/2); - local_col_offset <<= (BaseState::chunk_bits_/2); + uint_t irow_chunk = ((iChunk + BaseState::global_chunk_index_) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_))) << (BaseState::chunk_bits_); + uint_t icol_chunk = ((iChunk + BaseState::global_chunk_index_) & ((1ull << ((BaseState::num_qubits_ - BaseState::chunk_bits_)))-1)) << 
(BaseState::chunk_bits_); //copy part of state for this chunk uint_t i,row,col; cvector_t tmp(1ull << BaseState::chunk_bits_); for(i=0;i<(1ull << BaseState::chunk_bits_);i++){ - uint_t row = i & ((1ull << (BaseState::chunk_bits_/2))-1); - uint_t col = i >> (BaseState::chunk_bits_/2); - tmp[i] = state[local_row_offset + row + ((local_col_offset + col) << (BaseState::num_qubits_/2))]; + uint_t icol = i & ((1ull << (BaseState::chunk_bits_))-1); + uint_t irow = i >> (BaseState::chunk_bits_); + tmp[i] = state[icol_chunk + icol + ((irow_chunk + irow) << (BaseState::num_qubits_))]; } - BaseState::qregs_[iChunk].set_num_qubits(BaseState::chunk_bits_/2); + BaseState::qregs_[iChunk].set_num_qubits(BaseState::chunk_bits_); BaseState::qregs_[iChunk].initialize_from_vector(tmp); } } @@ -391,7 +402,7 @@ void State::initialize_qreg(uint_t num_qubits, int_t iChunk; if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(iChunk=0;iChunk::initialize_qreg(uint_t num_qubits, #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) for(iChunk=0;iChunk> (BaseState::num_qubits_/2 - BaseState::chunk_bits_/2); - local_row_offset <<= (BaseState::chunk_bits_/2); - local_col_offset <<= (BaseState::chunk_bits_/2); + uint_t irow_chunk = ((iChunk + BaseState::global_chunk_index_) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_))) << (BaseState::chunk_bits_); + uint_t icol_chunk = ((iChunk + BaseState::global_chunk_index_) & ((1ull << ((BaseState::num_qubits_ - BaseState::chunk_bits_)))-1)) << (BaseState::chunk_bits_); //copy part of state for this chunk uint_t i,row,col; cvector_t tmp(1ull << BaseState::chunk_bits_); for(i=0;i<(1ull << BaseState::chunk_bits_);i++){ - uint_t row = i & ((1ull << (BaseState::chunk_bits_/2))-1); - uint_t col = i >> (BaseState::chunk_bits_/2); - tmp[i] = state[local_row_offset + row + ((local_col_offset + col) << (BaseState::num_qubits_/2))]; + uint_t icol = i & ((1ull << (BaseState::chunk_bits_))-1); + uint_t irow = i >> 
(BaseState::chunk_bits_); + tmp[i] = state[icol_chunk + icol + ((irow_chunk + irow) << (BaseState::num_qubits_))]; } - BaseState::qregs_[iChunk].set_num_qubits(BaseState::chunk_bits_/2); + BaseState::qregs_[iChunk].set_num_qubits(BaseState::chunk_bits_); BaseState::qregs_[iChunk].initialize_from_vector(tmp); } } @@ -437,32 +446,94 @@ auto State::move_to_matrix() if(BaseState::num_global_chunks_ == 1){ return BaseState::qregs_[0].move_to_matrix(); } - else{ - int_t iChunk; - auto state = BaseState::qregs_[0].vector(); + return apply_to_matrix(false); +} + +template +auto State::copy_to_matrix() +{ + if(BaseState::num_global_chunks_ == 1){ + return BaseState::qregs_[0].copy_to_matrix(); + } + return apply_to_matrix(true); +} + +template +auto State::apply_to_matrix(bool copy) +{ + int_t iChunk; + uint_t size = 1ull << (BaseState::chunk_bits_*2); + uint_t mask = (1ull << (BaseState::chunk_bits_)) - 1; + uint_t num_threads = BaseState::qregs_[0].get_omp_threads(); + auto matrix = BaseState::qregs_[0].copy_to_matrix(); + + if(BaseState::distributed_rank_ == 0){ //TO DO check memory availability - state.resize(BaseState::num_local_chunks_ << BaseState::chunk_bits_); + matrix.resize(1ull << (BaseState::num_qubits_),1ull << (BaseState::num_qubits_)); -#pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) - for(iChunk=1;iChunk> ((BaseState::num_qubits_ - BaseState::chunk_bits_))) << (BaseState::chunk_bits_); + uint_t icol_chunk = ((iChunk) & ((1ull << ((BaseState::num_qubits_ - BaseState::chunk_bits_)))-1)) << (BaseState::chunk_bits_); +#pragma omp parallel for if(num_threads > 1) num_threads(num_threads) + for(i=0;i> (BaseState::chunk_bits_); + uint_t icol = i & mask; + matrix(icol_chunk+icol,irow_chunk+irow) = recv(icol,irow); } } +#endif + for(iChunk=0;iChunk> ((BaseState::num_qubits_ - BaseState::chunk_bits_))) << (BaseState::chunk_bits_); + uint_t icol_chunk = ((iChunk + BaseState::global_chunk_index_) & ((1ull << ((BaseState::num_qubits_ - 
BaseState::chunk_bits_)))-1)) << (BaseState::chunk_bits_); + if(copy){ + auto tmp = BaseState::qregs_[iChunk].copy_to_matrix(); +#pragma omp parallel for if(num_threads > 1) num_threads(num_threads) + for(i=0;i> (BaseState::chunk_bits_); + uint_t icol = i & mask; + matrix(icol_chunk+icol,irow_chunk+irow) = tmp(icol,irow); + } + } + else{ + auto tmp = BaseState::qregs_[iChunk].move_to_matrix(); +#pragma omp parallel for if(num_threads > 1) num_threads(num_threads) + for(i=0;i> (BaseState::chunk_bits_); + uint_t icol = i & mask; + matrix(icol_chunk+icol,irow_chunk+irow) = tmp(icol,irow); + } + } + } + } + else{ #ifdef AER_MPI - BaseState::gather_state(state); + //send matrices to process 0 + for(iChunk=0;iChunk::apply_op(const int_t iChunk,const Operations::Op &op, case Operations::OpType::superop: BaseState::qregs_[iChunk].apply_superop_matrix(op.qubits, Utils::vectorize_matrix(op.mats[0])); break; - case Operations::OpType::kraus: - apply_kraus(op.qubits, op.mats); - break; case Operations::OpType::save_expval: case Operations::OpType::save_expval_var: BaseState::apply_save_expval(op, result); break; + case Operations::OpType::save_densmat: + apply_save_density_matrix(op, result, final_ops); + break; + case Operations::OpType::save_probs: + case Operations::OpType::save_probs_ket: + apply_save_probs(op, result); + break; + case Operations::OpType::save_amps_sq: + apply_save_amplitudes_sq(op, result); + break; default: throw std::invalid_argument("DensityMatrix::State::invalid instruction \'" + op.name + "\'."); @@ -561,26 +639,26 @@ void State::apply_chunk_swap(const reg_t &qubits) uint_t q0,q1; q0 = qubits[0]; q1 = qubits[1]; - if(qubits[0] >= BaseState::chunk_bits_/2){ - q0 += BaseState::chunk_bits_/2; + if(qubits[0] >= BaseState::chunk_bits_){ + q0 += BaseState::chunk_bits_; } - if(qubits[1] >= BaseState::chunk_bits_/2){ - q1 += BaseState::chunk_bits_/2; + if(qubits[1] >= BaseState::chunk_bits_){ + q1 += BaseState::chunk_bits_; } reg_t qs0 = {{q0, q1}}; 
BaseState::apply_chunk_swap(qs0); - if(qubits[0] >= BaseState::chunk_bits_/2){ - q0 += (BaseState::num_qubits_ - BaseState::chunk_bits_)/2; + if(qubits[0] >= BaseState::chunk_bits_){ + q0 += (BaseState::num_qubits_ - BaseState::chunk_bits_); } else{ - q0 += BaseState::chunk_bits_/2; + q0 += BaseState::chunk_bits_; } - if(qubits[1] >= BaseState::chunk_bits_/2){ - q1 += (BaseState::num_qubits_ - BaseState::chunk_bits_)/2; + if(qubits[1] >= BaseState::chunk_bits_){ + q1 += (BaseState::num_qubits_ - BaseState::chunk_bits_); } else{ - q1 += BaseState::chunk_bits_/2; + q1 += BaseState::chunk_bits_; } reg_t qs1 = {{q0, q1}}; BaseState::apply_chunk_swap(qs1); @@ -590,9 +668,56 @@ void State::apply_chunk_swap(const reg_t &qubits) // Implementation: Save data //========================================================================= -template -double State::expval_pauli(const reg_t &qubits, - const std::string& pauli) +template +void State::apply_save_probs(const Operations::Op &op, + ExperimentResult &result) { + auto probs = measure_probs(op.qubits); + if (op.type == Operations::OpType::save_probs_ket) { + BaseState::save_data_average(result, op.string_params[0], + Utils::vec2ket(probs, json_chop_threshold_, 16), + op.save_type); + } else { + BaseState::save_data_average(result, op.string_params[0], + std::move(probs), op.save_type); + } +} + +template +void State::apply_save_amplitudes_sq(const Operations::Op &op, + ExperimentResult &result) +{ + if (op.int_params.empty()) { + throw std::invalid_argument("Invalid save_amplitudes_sq instructions (empty params)."); + } + const int_t size = op.int_params.size(); + int_t iChunk; + rvector_t amps_sq(size,0); +#pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) + for(iChunk=0;iChunk> ((BaseState::num_qubits_ - BaseState::chunk_bits_)); + icol = (BaseState::global_chunk_index_ + iChunk) - (irow << ((BaseState::num_qubits_ - BaseState::chunk_bits_))); + if(irow != icol) + continue; + +#pragma omp parallel 
for if (size > pow(2, omp_qubit_threshold_) && \ + BaseState::threads_ > 1) \ + num_threads(BaseState::threads_) + for (int_t i = 0; i < size; ++i) { + if(op.int_params[i] >= (irow << BaseState::chunk_bits_) && op.int_params[i] < ((irow+1) << BaseState::chunk_bits_)) + amps_sq[i] = BaseState::qregs_[iChunk].probability(op.int_params[i] - (irow << BaseState::chunk_bits_)); + } + } +#ifdef AER_MPI + BaseState::reduce_sum(amps_sq); +#endif + BaseState::save_data_average(result, op.string_params[0], + std::move(amps_sq), op.save_type); +} + +template +double State::expval_pauli(const reg_t &qubits, + const std::string& pauli) { reg_t qubits_in_chunk; reg_t qubits_out_chunk; @@ -604,7 +729,7 @@ double State::expval_pauli(const reg_t &qubits, //get inner/outer chunk pauli string n = pauli.size(); for(i=0;i::expval_pauli(const reg_t &qubits, } } - int_t nrows = 1ull << ((BaseState::num_qubits_ - BaseState::chunk_bits_)/2); + int_t nrows = 1ull << ((BaseState::num_qubits_ - BaseState::chunk_bits_)); if(qubits_out_chunk.size() > 0){ //there are bits out of chunk std::complex phase = 1.0; @@ -625,10 +750,10 @@ double State::expval_pauli(const reg_t &qubits, uint_t x_mask, z_mask, num_y, x_max; std::tie(x_mask, z_mask, num_y, x_max) = AER::QV::pauli_masks_and_phase(qubits_out_chunk, pauli_out_chunk); - z_mask >>= (BaseState::chunk_bits_/2); + z_mask >>= (BaseState::chunk_bits_); if(x_mask != 0){ - x_mask >>= (BaseState::chunk_bits_/2); - x_max -= (BaseState::chunk_bits_/2); + x_mask >>= (BaseState::chunk_bits_); + x_max -= (BaseState::chunk_bits_); AER::QV::add_y_phase(num_y,phase); @@ -641,10 +766,10 @@ double State::expval_pauli(const reg_t &qubits, uint_t iChunk = (irow ^ x_mask) + irow * nrows; if(BaseState::chunk_index_begin_[BaseState::distributed_rank_] <= iChunk && BaseState::chunk_index_end_[BaseState::distributed_rank_] > iChunk){ //on this process - double sign = 1.0; - if (z_mask && (AER::Utils::popcount(iChunk & z_mask) & 1)) - sign = -1.0; - expval += sign * 
BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk,phase); + double sign = 2.0; + if (z_mask && (AER::Utils::popcount(irow & z_mask) & 1)) + sign = -2.0; + expval += sign * BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli_non_diagonal_chunk(qubits_in_chunk, pauli_in_chunk,phase); } } } @@ -654,9 +779,9 @@ double State::expval_pauli(const reg_t &qubits, uint_t iChunk = i * (nrows+1); if(BaseState::chunk_index_begin_[BaseState::distributed_rank_] <= iChunk && BaseState::chunk_index_end_[BaseState::distributed_rank_] > iChunk){ //on this process double sign = 1.0; - if (z_mask && (AER::Utils::popcount((i + BaseState::global_chunk_index_) & z_mask) & 1)) + if (z_mask && (AER::Utils::popcount(i & z_mask) & 1)) sign = -1.0; - expval += sign * BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk); + expval += sign * BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk,1.0); } } } @@ -666,7 +791,7 @@ double State::expval_pauli(const reg_t &qubits, for(i=0;i iChunk){ //on this process - expval += BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits, pauli); + expval += BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits, pauli,1.0); } } } @@ -677,6 +802,16 @@ double State::expval_pauli(const reg_t &qubits, return expval; } +template +void State::apply_save_density_matrix(const Operations::Op &op, + ExperimentResult &result, + bool last_op) +{ + BaseState::save_data_average(result, op.string_params[0], + reduced_density_matrix(op.qubits, last_op), + op.save_type); +} + //========================================================================= // Implementation: Snapshots //========================================================================= @@ -717,10 +852,10 @@ void State::apply_snapshot(const Operations::Op &op, snapshot_pauli_expval(op, result, true); 
} break; /* TODO - case DensityMatrix::Snapshots::expval_matrix: { + case Snapshots::expval_matrix: { snapshot_matrix_expval(op, data, false); } break; - case DensityMatrix::Snapshots::expval_matrix_var: { + case Snapshots::expval_matrix_var: { snapshot_matrix_expval(op, data, true); } break; */ @@ -775,10 +910,19 @@ template void State::snapshot_density_matrix(const Operations::Op &op, ExperimentResult &result, bool last_op) +{ + result.legacy_data.add_average_snapshot("density_matrix", op.string_params[0], + BaseState::creg_.memory_hex(), + reduced_density_matrix(op.qubits, last_op), false); +} + + +template +cmatrix_t State::reduced_density_matrix(const reg_t& qubits, bool last_op) { cmatrix_t reduced_state; // Check if tracing over all qubits - if (op.qubits.empty()) { + if (qubits.empty()) { reduced_state = cmatrix_t(1, 1); std::complex sum = 0.0; @@ -790,30 +934,26 @@ void State::snapshot_density_matrix(const Operations::Op &op, #endif reduced_state[0] = sum; } else { - - auto qubits_sorted = op.qubits; + auto qubits_sorted = qubits; std::sort(qubits_sorted.begin(), qubits_sorted.end()); - if ((op.qubits.size() == BaseState::qregs_[0].num_qubits()) && (op.qubits == qubits_sorted)) { + if ((qubits.size() == BaseState::num_qubits_) && (qubits == qubits_sorted)) { if (last_op) { reduced_state = move_to_matrix(); } else { - reduced_state = move_to_matrix(); + reduced_state = copy_to_matrix(); } } else { - reduced_state = reduced_density_matrix(op.qubits, qubits_sorted); + reduced_state = reduced_density_matrix_helper(qubits, qubits_sorted); } } - - result.legacy_data.add_average_snapshot("density_matrix", op.string_params[0], - BaseState::creg_.memory_hex(), - std::move(reduced_state), false); + return reduced_state; } - - + template -cmatrix_t State::reduced_density_matrix(const reg_t& qubits, const reg_t& qubits_sorted) { - +cmatrix_t State::reduced_density_matrix_helper(const reg_t &qubits, + const reg_t &qubits_sorted) +{ // Get superoperator qubits const 
reg_t squbits = BaseState::qregs_[0].superop_qubits(qubits); const reg_t squbits_sorted = BaseState::qregs_[0].superop_qubits(qubits_sorted); @@ -832,12 +972,12 @@ cmatrix_t State::reduced_density_matrix(const reg_t& qubits, const re auto vmat = BaseState::qregs_[0].vector(); //TO DO check memory availability - vmat.resize(BaseState::num_local_chunks_ << BaseState::chunk_bits_); + vmat.resize(BaseState::num_local_chunks_ << (BaseState::chunk_bits_*2)); #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) for(iChunk=1;iChunk::apply_gate(const uint_t iChunk, const Operations::Op &op) case DensityMatrix::Gates::rzx: BaseState::qregs_[iChunk].apply_unitary_matrix(op.qubits, Linalg::VMatrix::rzx(op.params[0])); break; + case DensityMatrix::Gates::pauli: + apply_pauli(op.qubits, op.string_params[0]); + break; default: // We shouldn't reach here unless there is a bug in gateset throw std::invalid_argument("DensityMatrix::State::invalid gate instruction \'" + @@ -1049,7 +1192,7 @@ rvector_t State::measure_probs(const reg_t &qubits) const reg_t qubits_out_chunk; for(i=0;i::measure_probs(const reg_t &qubits) const #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(i,j,k) for(i=0;i> ((BaseState::num_qubits_ - BaseState::chunk_bits_)/2); - icol = (BaseState::global_chunk_index_ + i) - (irow << ((BaseState::num_qubits_ - BaseState::chunk_bits_)/2)); + irow = (BaseState::global_chunk_index_ + i) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_)); + icol = (BaseState::global_chunk_index_ + i) - (irow << ((BaseState::num_qubits_ - BaseState::chunk_bits_))); if(irow == icol){ //diagonal chunk auto chunkSum = BaseState::qregs_[i].probabilities(qubits); @@ -1076,12 +1219,12 @@ rvector_t State::measure_probs(const reg_t &qubits) const int idx = 0; int i_in = 0; for(k=0;k> i_in) & 1) << k); i_in++; } else{ - if((((i + BaseState::global_chunk_index_) << (BaseState::chunk_bits_/2)) >> qubits[k]) & 1){ + if((((i + BaseState::global_chunk_index_) 
<< (BaseState::chunk_bits_)) >> qubits[k]) & 1){ idx += 1ull << k; } } @@ -1116,8 +1259,8 @@ std::vector State::sample_measure(const reg_t &qubits, #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(i) for(i=0;i> ((BaseState::num_qubits_ - BaseState::chunk_bits_)/2); - icol = (BaseState::global_chunk_index_ + i) - (irow << ((BaseState::num_qubits_ - BaseState::chunk_bits_)/2)); + irow = (BaseState::global_chunk_index_ + i) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_)); + icol = (BaseState::global_chunk_index_ + i) - (irow << ((BaseState::num_qubits_ - BaseState::chunk_bits_))); if(irow == icol) //only diagonal chunk has probabilities chunkSum[i] = std::real( BaseState::qregs_[i].trace() ); else @@ -1150,29 +1293,33 @@ std::vector State::sample_measure(const reg_t &qubits, //get rnds positions for each chunk #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(i,j) for(i=0;i vIdx; - std::vector vRnd; - - //find rnds in this chunk - nIn = 0; - for(j=0;j= chunkSum[i] + globalSum && rnds[j] < chunkSum[i+1] + globalSum){ - vRnd.push_back(rnds[j] - (globalSum + chunkSum[i])); - vIdx.push_back(j); - nIn++; - } + uint_t irow,icol; + irow = (BaseState::global_chunk_index_ + i) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_)); + icol = (BaseState::global_chunk_index_ + i) - (irow << ((BaseState::num_qubits_ - BaseState::chunk_bits_))); + if(irow != icol) + continue; + + uint_t nIn; + std::vector vIdx; + std::vector vRnd; + + //find rnds in this chunk + nIn = 0; + for(j=0;j= chunkSum[i] + globalSum && rnds[j] < chunkSum[i+1] + globalSum){ + vRnd.push_back(rnds[j] - (globalSum + chunkSum[i])); + vIdx.push_back(j); + nIn++; } + } - if(nIn > 0){ - auto chunkSamples = BaseState::qregs_[i].sample_measure(vRnd); - uint_t irow; - irow = (BaseState::global_chunk_index_ + i) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_)/2); + if(nIn > 0){ + auto chunkSamples = BaseState::qregs_[i].sample_measure(vRnd); + uint_t irow; + irow = 
(BaseState::global_chunk_index_ + i) >> ((BaseState::num_qubits_ - BaseState::chunk_bits_)); - for(j=0;j State::sample_measure(const reg_t &qubits, std::vector all_samples; all_samples.reserve(shots); for (int_t val : allbit_samples) { - reg_t allbit_sample = Utils::int2reg(val, 2, BaseState::num_qubits_/2); + reg_t allbit_sample = Utils::int2reg(val, 2, BaseState::num_qubits_); reg_t sample; sample.reserve(qubits.size()); for (uint_t qubit : qubits) { @@ -1291,7 +1438,7 @@ void State::measure_reset_update(const reg_t &qubits, template void State::apply_kraus(const reg_t &qubits, - const std::vector &kmats) + const std::vector &kmats) { int_t i; // Convert to Superoperator diff --git a/src/simulators/density_matrix/densitymatrix_thrust.hpp b/src/simulators/density_matrix/densitymatrix_thrust.hpp index 850df29dc8..3767649b39 100755 --- a/src/simulators/density_matrix/densitymatrix_thrust.hpp +++ b/src/simulators/density_matrix/densitymatrix_thrust.hpp @@ -143,6 +143,7 @@ class DensityMatrixThrust : public UnitaryMatrixThrust { // Return the expectation value of an N-qubit Pauli matrix. // The Pauli is input as a length N string of I,X,Y,Z characters. 
 double expval_pauli(const reg_t &qubits, const std::string &pauli,const complex_t initial_phase=1.0) const;
+  double expval_pauli_non_diagonal_chunk(const reg_t &qubits, const std::string &pauli,const complex_t initial_phase=1.0) const;
 
 protected:
   // Construct a vectorized superoperator from a vectorized matrix
@@ -888,6 +889,68 @@ double DensityMatrixThrust<data_t>::expval_pauli(const reg_t &qubits,
         expval_pauli_XYZ_func_dm<data_t>(x_mask, z_mask, x_max, phase, BaseMatrix::rows_) );
 }
+template <typename data_t>
+class expval_pauli_XYZ_func_dm_non_diagonal : public GateFuncBase<data_t>
+{
+protected:
+  uint_t x_mask_;
+  uint_t z_mask_;
+  thrust::complex<double> phase_;
+  uint_t rows_;
+public:
+  expval_pauli_XYZ_func_dm_non_diagonal(uint_t x,uint_t z,uint_t x_max,std::complex<double> p,uint_t stride)
+  {
+    rows_ = stride;
+    x_mask_ = x;
+    z_mask_ = z;
+    phase_ = p;
+  }
+
+  uint_t size(int num_qubits)
+  {
+    return rows_;
+  }
+
+  __host__ __device__ double operator()(const uint_t &i) const
+  {
+    thrust::complex<data_t>* vec;
+    thrust::complex<data_t> q0;
+    double ret = 0.0;
+    uint_t idx_mat;
+
+    vec = this->data_;
+
+    idx_mat = i ^ x_mask_ + rows_ * i;
+
+    q0 = vec[idx_mat];
+    q0 = phase_ * q0;
+    ret = q0.real();
+    if(z_mask_ != 0){
+      if(pop_count_kernel(i & z_mask_) & 1)
+        ret = -ret;
+    }
+    return ret;
+  }
+  const char* name(void)
+  {
+    return "expval_pauli_XYZ";
+  }
+};
+
+template <typename data_t>
+double DensityMatrixThrust<data_t>::expval_pauli_non_diagonal_chunk(const reg_t &qubits,
+                        const std::string &pauli,const complex_t initial_phase) const
+{
+  uint_t x_mask, z_mask, num_y, x_max;
+  std::tie(x_mask, z_mask, num_y, x_max) = pauli_masks_and_phase(qubits, pauli);
+
+  // Compute the overall phase of the operator.
+ // This is (-1j) ** number of Y terms modulo 4 + auto phase = std::complex(initial_phase); + add_y_phase(num_y, phase); + return BaseVector::apply_function_sum( + expval_pauli_XYZ_func_dm_non_diagonal(x_mask, z_mask, x_max, phase, BaseMatrix::rows_) ); +} //----------------------------------------------------------------------- // Z-measurement outcome probabilities //----------------------------------------------------------------------- diff --git a/src/simulators/state.hpp b/src/simulators/state.hpp index 3b9c5fe10e..f7485e56df 100644 --- a/src/simulators/state.hpp +++ b/src/simulators/state.hpp @@ -128,7 +128,7 @@ class State { const = 0; //memory allocation (previously called before inisitalize_qreg) - virtual void allocate(uint_t num_qubits) {} + virtual void allocate(uint_t num_qubits,uint_t block_bits) {} // Return the expectation value of a N-qubit Pauli operator // If the simulator does not support Pauli expectation value this should diff --git a/src/simulators/state_chunk.hpp b/src/simulators/state_chunk.hpp index b8ace7a198..61674a7610 100644 --- a/src/simulators/state_chunk.hpp +++ b/src/simulators/state_chunk.hpp @@ -118,7 +118,7 @@ class StateChunk { bool final_ops = false); //memory allocation (previously called before inisitalize_qreg) - virtual void allocate(uint_t num_qubits); + virtual void allocate(uint_t num_qubits,uint_t block_bits); // Initializes the State to the default state. 
// Typically this is the n-qubit all |0> state @@ -319,6 +319,11 @@ class StateChunk { void send_chunk(uint_t local_chunk_index, uint_t global_chunk_index); void recv_chunk(uint_t local_chunk_index, uint_t global_chunk_index); + template + void send_data(data_t* pSend, uint_t size, uint_t myid,uint_t pairid); + template + void recv_data(data_t* pRecv, uint_t size, uint_t myid,uint_t pairid); + //reduce values over processes void reduce_sum(rvector_t& sum) const; void reduce_sum(complex_t& sum) const; @@ -433,13 +438,14 @@ void StateChunk::set_distribution(uint_t nprocs) } template -void StateChunk::allocate(uint_t num_qubits) +void StateChunk::allocate(uint_t num_qubits,uint_t block_bits) { int_t i; uint_t nchunks; int max_bits = num_qubits; num_qubits_ = num_qubits; + block_bits_ = block_bits; if(block_bits_ > 0){ chunk_bits_ = block_bits_; @@ -451,11 +457,7 @@ void StateChunk::allocate(uint_t num_qubits) chunk_bits_ = num_qubits_; } - //scale for density/unitary matrix simulators - chunk_bits_ *= qubit_scale(); - num_qubits_ *= qubit_scale(); - - num_global_chunks_ = 1ull << (num_qubits_ - chunk_bits_); + num_global_chunks_ = 1ull << ((num_qubits_ - chunk_bits_)*qubit_scale()); chunk_index_begin_.resize(distributed_procs_); chunk_index_end_.resize(distributed_procs_); @@ -469,8 +471,8 @@ void StateChunk::allocate(uint_t num_qubits) qregs_.resize(num_local_chunks_); - chunk_omp_parallel_ = false; gpu_optimization_ = false; + chunk_omp_parallel_ = false; if(qregs_[0].name().find("gpu") != std::string::npos){ if(chunk_bits_ < num_qubits_){ chunk_omp_parallel_ = true; //CUDA backend requires thread parallelization of chunk loop @@ -481,7 +483,7 @@ void StateChunk::allocate(uint_t num_qubits) nchunks = num_local_chunks_; for(i=0;i::block_diagonal_matrix(const int_t iChunk, reg_t &qubit cvector_t diag_in; for(i=0;i> (qubits[i] - chunk_bits_/qubit_scale())) & 1) + if((gid >> (qubits[i] - chunk_bits_)) & 1) mask_id |= (1ull << i); } } @@ -860,7 +862,7 @@ void 
StateChunk::apply_chunk_swap(const reg_t &qubits) q1 = t; } - if(q1 < chunk_bits_){ + if(q1 < chunk_bits_*qubit_scale()){ //device #pragma omp parallel for if(chunk_omp_parallel_) private(iChunk) for(iChunk=0;iChunk::apply_chunk_swap(const reg_t &qubits) uint_t nPair,mask0,mask1; uint_t baseChunk,iChunk1,iChunk2; - if(q0 < chunk_bits_) + if(q0 < chunk_bits_*qubit_scale()) nLarge = 1; else nLarge = 2; mask0 = (1ull << q0); mask1 = (1ull << q1); - mask0 >>= chunk_bits_; - mask1 >>= chunk_bits_; + mask0 >>= (chunk_bits_*qubit_scale()); + mask1 >>= (chunk_bits_*qubit_scale()); int proc_bits = 0; uint_t procs = distributed_procs_; @@ -893,8 +895,8 @@ void StateChunk::apply_chunk_swap(const reg_t &qubits) procs >>= 1; } - if(distributed_procs_ == 1 || (proc_bits >= 0 && q1 < (num_qubits_ - proc_bits))){ //no data transfer between processes is needed - if(q0 < chunk_bits_){ + if(distributed_procs_ == 1 || (proc_bits >= 0 && q1 < (num_qubits_*qubit_scale() - proc_bits))){ //no data transfer between processes is needed + if(q0 < chunk_bits_*qubit_scale()){ nPair = num_local_chunks_ >> 1; } else{ @@ -903,7 +905,7 @@ void StateChunk::apply_chunk_swap(const reg_t &qubits) #pragma omp parallel for if(chunk_omp_parallel_) private(iPair,baseChunk,iChunk1,iChunk2) for(iPair=0;iPair::apply_chunk_swap(const reg_t &qubits) uint_t iLocalChunk,iRemoteChunk,iProc; int i; - if(q0 < chunk_bits_){ + if(q0 < chunk_bits_*qubit_scale()){ nLarge = 1; - nu[0] = 1ull << (q1 - chunk_bits_); + nu[0] = 1ull << (q1 - chunk_bits_*qubit_scale()); ub[0] = 0; iu[0] = 0; - nu[1] = 1ull << (num_qubits_ - q1 - 1); - ub[1] = (q1 - chunk_bits_) + 1; + nu[1] = 1ull << (num_qubits_*qubit_scale() - q1 - 1); + ub[1] = (q1 - chunk_bits_*qubit_scale()) + 1; iu[1] = 0; } else{ nLarge = 2; - nu[0] = 1ull << (q0 - chunk_bits_); + nu[0] = 1ull << (q0 - chunk_bits_*qubit_scale()); ub[0] = 0; iu[0] = 0; nu[1] = 1ull << (q1 - q0 - 1); - ub[1] = (q0 - chunk_bits_) + 1; + ub[1] = (q0 - chunk_bits_*qubit_scale()) + 1; iu[1] 
= 0; - nu[2] = 1ull << (num_qubits_ - q1 - 1); - ub[2] = (q1 - chunk_bits_) + 1; + nu[2] = 1ull << (num_qubits_*qubit_scale() - q1 - 1); + ub[2] = (q1 - chunk_bits_*qubit_scale()) + 1; iu[2] = 0; } - nPair = 1ull << (num_qubits_ - chunk_bits_ - nLarge); + nPair = 1ull << (num_qubits_*qubit_scale() - chunk_bits_*qubit_scale() - nLarge); for(iPair=0;iPair::apply_chunk_swap(const reg_t &qubits) template -void StateChunk::send_chunk(uint_t local_chunk_index, uint_t global_chunk_index) +void StateChunk::send_chunk(uint_t local_chunk_index, uint_t global_pair_index) { #ifdef AER_MPI MPI_Request reqSend; @@ -1029,17 +1031,17 @@ void StateChunk::send_chunk(uint_t local_chunk_index, uint_t global_chu uint_t sizeSend; uint_t iProc; - iProc = get_process_by_chunk(global_chunk_index); + iProc = get_process_by_chunk(global_pair_index); auto pSend = qregs_[local_chunk_index].send_buffer(sizeSend); - MPI_Isend(pSend,sizeSend,MPI_BYTE,iProc,0,distributed_comm_,&reqSend); + MPI_Isend(pSend,sizeSend,MPI_BYTE,iProc,local_chunk_index + global_chunk_index_,distributed_comm_,&reqSend); MPI_Wait(&reqSend,&st); #endif } template -void StateChunk::recv_chunk(uint_t local_chunk_index, uint_t global_chunk_index) +void StateChunk::recv_chunk(uint_t local_chunk_index, uint_t global_pair_index) { #ifdef AER_MPI MPI_Request reqRecv; @@ -1047,10 +1049,44 @@ void StateChunk::recv_chunk(uint_t local_chunk_index, uint_t global_chu uint_t sizeRecv; uint_t iProc; - iProc = get_process_by_chunk(global_chunk_index); + iProc = get_process_by_chunk(global_pair_index); auto pRecv = qregs_[local_chunk_index].recv_buffer(sizeRecv); - MPI_Irecv(pRecv,sizeRecv,MPI_BYTE,iProc,0,distributed_comm_,&reqRecv); + MPI_Irecv(pRecv,sizeRecv,MPI_BYTE,iProc,global_pair_index,distributed_comm_,&reqRecv); + + MPI_Wait(&reqRecv,&st); +#endif +} + +template +template +void StateChunk::send_data(data_t* pSend, uint_t size, uint_t myid,uint_t pairid) +{ +#ifdef AER_MPI + MPI_Request reqSend; + MPI_Status st; + uint_t iProc; + 
+ iProc = get_process_by_chunk(pairid); + + MPI_Isend(pSend,size*sizeof(data_t),MPI_BYTE,iProc,myid,distributed_comm_,&reqSend); + + MPI_Wait(&reqSend,&st); +#endif +} + +template +template +void StateChunk::recv_data(data_t* pRecv, uint_t size, uint_t myid,uint_t pairid) +{ +#ifdef AER_MPI + MPI_Request reqRecv; + MPI_Status st; + uint_t iProc; + + iProc = get_process_by_chunk(pairid); + + MPI_Irecv(pRecv,size*sizeof(data_t),MPI_BYTE,iProc,pairid,distributed_comm_,&reqRecv); MPI_Wait(&reqRecv,&st); #endif diff --git a/src/simulators/statevector/chunk/chunk.hpp b/src/simulators/statevector/chunk/chunk.hpp index dc9c4c0894..e56446e58e 100644 --- a/src/simulators/statevector/chunk/chunk.hpp +++ b/src/simulators/statevector/chunk/chunk.hpp @@ -51,6 +51,8 @@ class Chunk } ~Chunk() { + if(cache_) + cache_.reset(); } void set_device(void) const diff --git a/src/simulators/statevector/chunk/chunk_container.hpp b/src/simulators/statevector/chunk/chunk_container.hpp index e90f4592c8..b024313ad2 100644 --- a/src/simulators/statevector/chunk/chunk_container.hpp +++ b/src/simulators/statevector/chunk/chunk_container.hpp @@ -517,7 +517,6 @@ template void ChunkContainer::UnmapChunk(std::shared_ptr> chunk) { chunk->unmap(); -// chunk.reset(); } template @@ -546,7 +545,6 @@ void ChunkContainer::UnmapBuffer(std::shared_ptr> buf) #pragma omp critical { buf->unmap(); -// buf.reset(); } } @@ -585,7 +583,6 @@ void ChunkContainer::UnmapCheckpoint(std::shared_ptr> buf) #pragma omp critical { buf->unmap(); -// buf.reset(); } } } diff --git a/src/simulators/statevector/chunk/device_chunk_container.hpp b/src/simulators/statevector/chunk/device_chunk_container.hpp index 42b8e78892..8fe2ba9250 100644 --- a/src/simulators/statevector/chunk/device_chunk_container.hpp +++ b/src/simulators/statevector/chunk/device_chunk_container.hpp @@ -33,7 +33,7 @@ class DeviceChunkContainer : public ChunkContainer protected: AERDeviceVector> data_; //device vector to chunks and buffers AERDeviceVector> 
matrix_; //storage for large matrix - mutable AERDeviceVector params_; //storage for additional parameters + mutable AERDeviceVector params_; //storage for additional parameters AERDeviceVector reduce_buffer_; //buffer for reduction int device_id_; //device index std::vector peer_access_; //to which device accepts peer access @@ -349,6 +349,8 @@ uint_t DeviceChunkContainer::Resize(uint_t chunks,uint_t buffers,uint_t template void DeviceChunkContainer::Deallocate(void) { + set_device(); + data_.clear(); data_.shrink_to_fit(); matrix_.clear(); @@ -371,7 +373,6 @@ void DeviceChunkContainer::Deallocate(void) } stream_.clear(); #endif - } template diff --git a/src/simulators/statevector/chunk/host_chunk_container.hpp b/src/simulators/statevector/chunk/host_chunk_container.hpp index b4b9fb8d96..a6b32d1375 100644 --- a/src/simulators/statevector/chunk/host_chunk_container.hpp +++ b/src/simulators/statevector/chunk/host_chunk_container.hpp @@ -166,8 +166,11 @@ template void HostChunkContainer::Deallocate(void) { data_.clear(); + data_.shrink_to_fit(); matrix_.clear(); + matrix_.shrink_to_fit(); params_.clear(); + params_.shrink_to_fit(); } diff --git a/src/simulators/statevector/qubitvector_thrust.hpp b/src/simulators/statevector/qubitvector_thrust.hpp index c68e7b1ce6..a3155f2c84 100644 --- a/src/simulators/statevector/qubitvector_thrust.hpp +++ b/src/simulators/statevector/qubitvector_thrust.hpp @@ -956,15 +956,11 @@ bool QubitVectorThrust::fetch_chunk(void) const int tid,nid; int idev; - tid = omp_get_thread_num(); - nid = omp_get_num_threads(); - - idev = tid * chunk_manager_.num_devices() / nid; - if(chunk_->device() < 0){ //on host + idev = 0; do{ - buffer_chunk_ = chunk_manager_.MapBufferChunk(idev); + buffer_chunk_ = chunk_manager_.MapBufferChunk(idev++ % chunk_manager_.num_devices()); }while(!buffer_chunk_); chunk_->set_cache(buffer_chunk_); buffer_chunk_->CopyIn(chunk_); @@ -2587,7 +2583,7 @@ void QubitVectorThrust::apply_chunk_swap(const reg_t &qubits, 
QubitVecto else{ thrust::complex* pChunk0; thrust::complex* pChunk1; - std::shared_ptr> pBuffer0; + std::shared_ptr> pBuffer0 = nullptr; std::shared_ptr> pExec; if(chunk_->device() >= 0){ diff --git a/src/simulators/statevector/statevector_state.hpp b/src/simulators/statevector/statevector_state.hpp index e7364fc7d3..24da46f3ce 100755 --- a/src/simulators/statevector/statevector_state.hpp +++ b/src/simulators/statevector/statevector_state.hpp @@ -149,7 +149,7 @@ class State : public Base::State { virtual std::vector sample_measure(const reg_t &qubits, uint_t shots, RngEngine &rng) override; - virtual void allocate(uint_t num_qubits) override; + virtual void allocate(uint_t num_qubits,uint_t block_bits) override; //----------------------------------------------------------------------- // Additional methods @@ -437,7 +437,7 @@ const stringmap_t State::snapshotset_( // Initialization //------------------------------------------------------------------------- template -void State::allocate(uint_t num_qubits) +void State::allocate(uint_t num_qubits,uint_t block_bits) { BaseState::qreg_.chunk_setup(num_qubits,num_qubits,0,1); } diff --git a/src/simulators/statevector/statevector_state_chunk.hpp b/src/simulators/statevector/statevector_state_chunk.hpp index 5736a937fd..2ba9ce0855 100644 --- a/src/simulators/statevector/statevector_state_chunk.hpp +++ b/src/simulators/statevector/statevector_state_chunk.hpp @@ -33,16 +33,24 @@ namespace AER { namespace StatevectorChunk { +using OpType = Operations::OpType; + +// OpSet of supported instructions const Operations::OpSet StateOpSet( // Op types - {Operations::OpType::gate, Operations::OpType::measure, - Operations::OpType::reset, Operations::OpType::initialize, - Operations::OpType::snapshot, Operations::OpType::barrier, - Operations::OpType::bfunc, Operations::OpType::roerror, - Operations::OpType::matrix, Operations::OpType::diagonal_matrix, - Operations::OpType::multiplexer, Operations::OpType::kraus, - 
Operations::OpType::sim_op, Operations::OpType::save_expval, - Operations::OpType::save_expval_var}, + {OpType::gate, OpType::measure, + OpType::reset, OpType::initialize, + OpType::snapshot, OpType::barrier, + OpType::bfunc, OpType::roerror, + OpType::matrix, OpType::diagonal_matrix, + OpType::multiplexer, OpType::kraus, + OpType::sim_op, OpType::save_expval, + OpType::save_expval_var, OpType::save_densmat, + OpType::save_probs, OpType::save_probs_ket, + OpType::save_amps, OpType::save_amps_sq, + OpType::save_statevec + // OpType::save_statevec_ket // TODO + }, // Gates {"u1", "u2", "u3", "u", "U", "CX", "cx", "cz", "cy", "cp", "cu1", "cu2", "cu3", "swap", "id", "p", @@ -52,19 +60,13 @@ const Operations::OpSet StateOpSet( "mcswap", "mcphase", "mcr", "mcrx", "mcry", "mcry", "sx", "csx", "mcsx", "delay", "pauli", "mcx_gray"}, // Snapshots - {"memory", "register", "probabilities", + {"statevector", "memory", "register", "probabilities", "probabilities_with_variance", "expectation_value_pauli", "density_matrix", + "density_matrix_with_variance", "expectation_value_pauli_with_variance", "expectation_value_matrix_single_shot", "expectation_value_matrix", "expectation_value_matrix_with_variance", "expectation_value_pauli_single_shot"}); -// Allowed gates enum class -enum class Gates { - id, h, s, sdg, t, tdg, - rxx, ryy, rzz, rzx, - mcx, mcy, mcz, mcr, mcrx, mcry, - mcrz, mcp, mcu2, mcu3, mcswap, mcsx, pauli -}; //========================================================================= // QubitVector State subclass @@ -119,6 +121,7 @@ class State : public Base::StateChunk { void initialize_omp(); auto move_to_vector(); + auto copy_to_vector(); protected: @@ -185,6 +188,30 @@ class State : public Base::StateChunk { // Save data instructions //----------------------------------------------------------------------- + // Save the current state of the statevector simulator + // If `last_op` is True this will use move semantics to move the simulator + // state to the results, 
otherwise it will use copy semantics to leave + // the current simulator state unchanged. + void apply_save_statevector(const Operations::Op &op, + ExperimentResult &result, + bool last_op); + + // Save the current state of the statevector simulator as a ket-form map. + void apply_save_statevector_ket(const Operations::Op &op, + ExperimentResult &result); + + // Save the current density matrix or reduced density matrix + void apply_save_density_matrix(const Operations::Op &op, + ExperimentResult &result); + + // Helper function for computing expectation value + void apply_save_probs(const Operations::Op &op, + ExperimentResult &result); + + // Helper function for saving amplitudes and amplitudes squared + void apply_save_amplitudes(const Operations::Op &op, + ExperimentResult &result); + // Helper function for computing expectation value virtual double expval_pauli(const reg_t &qubits, const std::string& pauli) override; @@ -480,6 +507,35 @@ auto State::move_to_vector() } } +template +auto State::copy_to_vector() +{ + if(BaseState::num_global_chunks_ == 1){ + return BaseState::qregs_[0].copy_to_vector(); + } + else{ + int_t iChunk; + auto state = BaseState::qregs_[0].copy_to_vector(); + + //TO DO check memory availability + state.resize(BaseState::num_local_chunks_ << BaseState::chunk_bits_); + +#pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) + for(iChunk=1;iChunk::apply_op(const int_t iChunk,const Operations::Op &op, case Operations::OpType::save_expval_var: BaseState::apply_save_expval(op, result); break; + case Operations::OpType::save_densmat: + apply_save_density_matrix(op, result); + break; + case Operations::OpType::save_statevec: + apply_save_statevector(op, result, final_ops); + break; + // case Operations::OpType::save_statevec_ket: + // apply_save_statevector_ket(op, result); + // break; + case Operations::OpType::save_probs: + case Operations::OpType::save_probs_ket: + apply_save_probs(op, result); + break; + case 
Operations::OpType::save_amps: + case Operations::OpType::save_amps_sq: + apply_save_amplitudes(op, result); + break; default: throw std::invalid_argument("QubitVector::State::invalid instruction \'" + op.name + "\'."); @@ -546,6 +619,22 @@ void State::apply_op(const int_t iChunk,const Operations::Op &op, // Implementation: Save data //========================================================================= +template +void State::apply_save_probs(const Operations::Op &op, + ExperimentResult &result) { + // get probs as hexadecimal + auto probs = measure_probs(op.qubits); + if (op.type == Operations::OpType::save_probs_ket) { + // Convert to ket dict + BaseState::save_data_average(result, op.string_params[0], + Utils::vec2ket(probs, json_chop_threshold_, 16), + op.save_type); + } else { + BaseState::save_data_average(result, op.string_params[0], + std::move(probs), op.save_type); + } +} + template double State::expval_pauli(const reg_t &qubits, const std::string& pauli) @@ -585,7 +674,7 @@ double State::expval_pauli(const reg_t &qubits, bool on_same_process = true; #ifdef AER_MPI int proc_bits = 0; - uint_t procs = distributed_procs_; + uint_t procs = BaseState::distributed_procs_; while(procs > 1){ if((procs & 1) != 0){ proc_bits = -1; @@ -618,12 +707,12 @@ double State::expval_pauli(const reg_t &qubits, z_count_pair = AER::Utils::popcount(pair_chunk & z_mask); if(iProc == BaseState::distributed_rank_){ //pair is on the same process - expval += BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk,BaseState::qregs_[pair_chunk - BaseState::global_chunk_index_],z_count,z_count_pair); + expval += BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk,BaseState::qregs_[pair_chunk - BaseState::global_chunk_index_],z_count,z_count_pair,phase); } else{ BaseState::recv_chunk(iChunk-BaseState::global_chunk_index_,pair_chunk); //refer receive buffer to calculate expectation value 
- expval += BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk,BaseState::qregs_[iChunk-BaseState::global_chunk_index_],z_count,z_count_pair); + expval += BaseState::qregs_[iChunk-BaseState::global_chunk_index_].expval_pauli(qubits_in_chunk, pauli_in_chunk,BaseState::qregs_[iChunk-BaseState::global_chunk_index_],z_count,z_count_pair,phase); } } else if(iProc == BaseState::distributed_rank_){ //pair is on this process @@ -655,6 +744,111 @@ double State::expval_pauli(const reg_t &qubits, return expval; } +template +void State::apply_save_statevector(const Operations::Op &op, + ExperimentResult &result, + bool last_op) +{ + if (op.qubits.size() != BaseState::num_qubits_) { + throw std::invalid_argument( + op.name + " was not applied to all qubits." + " Only the full statevector can be saved."); + } + if (last_op) { + BaseState::save_data_pershot(result, op.string_params[0], + move_to_vector(), + op.save_type); + } else { + BaseState::save_data_pershot(result, op.string_params[0], + copy_to_vector(), + op.save_type); + } +} + +template +void State::apply_save_statevector_ket(const Operations::Op &op, + ExperimentResult &result) +{ + if (op.qubits.size() != BaseState::num_qubits_) { + throw std::invalid_argument( + op.name + " was not applied to all qubits." 
+ " Only the full statevector can be saved."); + } + // TODO: compute state ket + std::map state_ket; + + BaseState::save_data_pershot(result, op.string_params[0], + std::move(state_ket), op.save_type); +} + +template +void State::apply_save_density_matrix(const Operations::Op &op, + ExperimentResult &result) +{ + cmatrix_t reduced_state; + + // Check if tracing over all qubits + if (op.qubits.empty()) { + reduced_state = cmatrix_t(1, 1); + + double sum = 0.0; +#pragma omp parallel for if(BaseState::chunk_omp_parallel_) reduction(+:sum) + for(int_t i=0;i +void State::apply_save_amplitudes(const Operations::Op &op, + ExperimentResult &result) +{ + if (op.int_params.empty()) { + throw std::invalid_argument("Invalid save_amplitudes instructions (empty params)."); + } + const int_t size = op.int_params.size(); + if (op.type == Operations::OpType::save_amps) { + Vector amps(size, false); + for (int_t i = 0; i < size; ++i) { + uint_t iChunk = op.int_params[i] >> BaseState::chunk_bits_; + amps[i] = 0.0; + if(iChunk >= BaseState::global_chunk_index_ && iChunk < BaseState::global_chunk_index_ + BaseState::num_local_chunks_){ + amps[i] = BaseState::qregs_[iChunk - BaseState::global_chunk_index_].get_state(op.int_params[i] - (iChunk << BaseState::chunk_bits_)); + } +#ifdef AER_MPI + complex_t amp = amps[i]; + BaseState::reduce_sum(amp); + amps[i] = amp; +#endif + } + BaseState::save_data_pershot(result, op.string_params[0], + std::move(amps), op.save_type); + } + else{ + rvector_t amps_sq(size,0); + for (int_t i = 0; i < size; ++i) { + uint_t iChunk = op.int_params[i] >> BaseState::chunk_bits_; + if(iChunk >= BaseState::global_chunk_index_ && iChunk < BaseState::global_chunk_index_ + BaseState::num_local_chunks_){ + amps_sq[i] = BaseState::qregs_[iChunk - BaseState::global_chunk_index_].probability(op.int_params[i] - (iChunk << BaseState::chunk_bits_)); + } + } +#ifdef AER_MPI + BaseState::reduce_sum(amps_sq); +#endif + BaseState::save_data_average(result, 
op.string_params[0], + std::move(amps_sq), op.save_type); + } +} + //========================================================================= // Implementation: Snapshots //========================================================================= @@ -926,7 +1120,7 @@ cmatrix_t State::vec2density(const reg_t &qubits, const T &vec) { // Return full density matrix cmatrix_t densmat(DIM, DIM); - if ((N == BaseState::qregs_[0].num_qubits()) && (qubits == qubits_sorted)) { + if ((N == BaseState::num_qubits_) && (qubits == qubits_sorted)) { const int_t mask = QV::MASKS[N]; #pragma omp parallel for if (2 * N > omp_qubit_threshold_ && \ BaseState::threads_ > 1) \ @@ -937,7 +1131,7 @@ cmatrix_t State::vec2density(const reg_t &qubits, const T &vec) { densmat(row, col) = complex_t(vec[row]) * complex_t(std::conj(vec[col])); } } else { - const size_t END = 1ULL << (BaseState::qregs_[0].num_qubits() - N); + const size_t END = 1ULL << (BaseState::num_qubits_ - N); // Initialize matrix values with first block { const auto inds = QV::indexes(qubits, qubits_sorted, 0); diff --git a/src/simulators/unitary/unitary_state.hpp b/src/simulators/unitary/unitary_state.hpp index 3b63562721..17bdd91c4b 100755 --- a/src/simulators/unitary/unitary_state.hpp +++ b/src/simulators/unitary/unitary_state.hpp @@ -104,7 +104,7 @@ class State : public Base::State { // Config: {"omp_qubit_threshold": 7} virtual void set_config(const json_t &config) override; - virtual void allocate(uint_t num_qubits) override; + virtual void allocate(uint_t num_qubits,uint_t block_bits) override; //----------------------------------------------------------------------- // Additional methods @@ -256,7 +256,7 @@ const stringmap_t State::gateset_({ }); template -void State::allocate(uint_t num_qubits) +void State::allocate(uint_t num_qubits,uint_t block_bits) { BaseState::qreg_.chunk_setup(num_qubits*2,num_qubits*2,0,1); } diff --git a/src/simulators/unitary/unitary_state_chunk.hpp 
b/src/simulators/unitary/unitary_state_chunk.hpp index d98f0cac35..a0276cc7d1 100644 --- a/src/simulators/unitary/unitary_state_chunk.hpp +++ b/src/simulators/unitary/unitary_state_chunk.hpp @@ -27,8 +27,6 @@ #include "unitarymatrix_thrust.hpp" #endif -//#include "unitary_state.hpp" - namespace AER { namespace QubitUnitaryChunk { @@ -36,7 +34,8 @@ namespace QubitUnitaryChunk { const Operations::OpSet StateOpSet( // Op types {Operations::OpType::gate, Operations::OpType::barrier, - Operations::OpType::matrix, Operations::OpType::diagonal_matrix}, + Operations::OpType::matrix, Operations::OpType::diagonal_matrix, + Operations::OpType::snapshot, Operations::OpType::save_unitary}, // Gates {"u1", "u2", "u3", "u", "U", "CX", "cx", "cz", "cy", "cp", "cu1", "cu2", "cu3", "swap", "id", "p", @@ -46,13 +45,7 @@ const Operations::OpSet StateOpSet( "mcswap", "mcphase", "mcr", "mcrx", "mcry", "mcry", "sx", "csx", "mcsx", "delay", "pauli"}, // Snapshots - {}); - -// Allowed gates enum class -enum class Gates { - id, h, s, sdg, t, tdg, rxx, ryy, rzz, rzx, - mcx, mcy, mcz, mcr, mcrx, mcry, mcrz, mcp, mcu2, mcu3, mcswap, mcsx, pauli, -}; + {"unitary"}); //========================================================================= // QubitUnitary State subclass @@ -128,6 +121,9 @@ class State : public Base::StateChunk { // Apply a matrix to given qubits (identity on all other qubits) void apply_matrix(const uint_t iChunk,const reg_t &qubits, const cvector_t &vmat); + // Apply a diagonal matrix + void apply_diagonal_matrix(const uint_t iChunk,const reg_t &qubits, const cvector_t &diag); + //----------------------------------------------------------------------- // 1-Qubit Gates //----------------------------------------------------------------------- @@ -197,7 +193,7 @@ void State::apply_op(const int_t iChunk,const Operations::Op & apply_matrix(iChunk,op.qubits, op.mats[0]); break; case Operations::OpType::diagonal_matrix: - BaseState::qregs_[iChunk].apply_diagonal_matrix(op.qubits, 
op.params); + apply_diagonal_matrix(iChunk,op.qubits, op.params); break; default: throw std::invalid_argument( @@ -240,7 +236,7 @@ void State::initialize_qreg(uint_t num_qubits) if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(i=0;i::initialize_qreg(uint_t num_qubits) else{ //multi-chunk distribution #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(i) for(i=0;inum_qubits_ == this->chunk_bits_){ BaseState::qregs_[i].initialize(); } @@ -278,19 +274,19 @@ void State::initialize_qreg(uint_t num_qubits, int_t iChunk; if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(iChunk=0;iChunk::initialize_qreg(uint_t num_qubits, int_t iChunk; if(BaseState::chunk_bits_ == BaseState::num_qubits_){ for(iChunk=0;iChunk::move_to_matrix() if(BaseState::num_global_chunks_ == 1){ return BaseState::qregs_[0].move_to_matrix(); } - else{ - int_t iChunk; - auto state = BaseState::qregs_[0].vector(); //using vector to gather distributed matrix + int_t iChunk; + uint_t size = 1ull << (BaseState::chunk_bits_*2); + uint_t mask = (1ull << (BaseState::chunk_bits_)) - 1; + uint_t num_threads = BaseState::qregs_[0].get_omp_threads(); + + auto matrix = BaseState::qregs_[0].copy_to_matrix(); + if(BaseState::distributed_rank_ == 0){ //TO DO check memory availability - state.resize(BaseState::num_local_chunks_ << BaseState::chunk_bits_); - -#pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(iChunk) - for(iChunk=1;iChunk 1) num_threads(num_threads) + for(i=0;i> (BaseState::chunk_bits_); + uint_t icol = i & mask; + matrix[offset+i] = recv(icol,irow); } } +#endif + for(iChunk=0;iChunk 1) num_threads(num_threads) + for(i=0;i> (BaseState::chunk_bits_); + uint_t icol = i & mask; + matrix[offset+i] = tmp(icol,irow); + } + } + } + else{ #ifdef AER_MPI - BaseState::gather_state(state); + //send matrices to process 0 + for(iChunk=0;iChunk::apply_gate(const uint_t iChunk,const Operations::O BaseState::qregs_[iChunk].apply_matrix(op.qubits, 
Linalg::VMatrix::ryy(op.params[0])); break; case QubitUnitary::Gates::rzz: - BaseState::qregs_[iChunk].apply_diagonal_matrix(op.qubits, Linalg::VMatrix::rzz_diag(op.params[0])); + apply_diagonal_matrix(iChunk,op.qubits, Linalg::VMatrix::rzz_diag(op.params[0])); break; case QubitUnitary::Gates::rzx: BaseState::qregs_[iChunk].apply_matrix(op.qubits, Linalg::VMatrix::rzx(op.params[0])); @@ -497,12 +521,28 @@ void State::apply_matrix(const uint_t iChunk,const reg_t &qubi const cvector_t &vmat) { // Check if diagonal matrix if (vmat.size() == 1ULL << qubits.size()) { - BaseState::qregs_[iChunk].apply_diagonal_matrix(qubits, vmat); + apply_diagonal_matrix(iChunk,qubits, vmat); } else { BaseState::qregs_[iChunk].apply_matrix(qubits, vmat); } } +template +void State::apply_diagonal_matrix(const uint_t iChunk, const reg_t &qubits, const cvector_t &diag) +{ + if(BaseState::gpu_optimization_){ + //GPU computes all chunks in one kernel, so pass qubits and diagonal matrix as is + BaseState::qregs_[iChunk].apply_diagonal_matrix(qubits,diag); + } + else{ + reg_t qubits_in = qubits; + cvector_t diag_in = diag; + + BaseState::block_diagonal_matrix(iChunk,qubits_in,diag_in); + BaseState::qregs_[iChunk].apply_diagonal_matrix(qubits_in,diag_in); + } +} + template void State::apply_gate_phase(const uint_t iChunk,uint_t qubit, complex_t phase) { cmatrix_t diag(1, 2); @@ -540,8 +580,7 @@ void State::apply_global_phase() { int_t i; #pragma omp parallel for if(BaseState::chunk_omp_parallel_) private(i) for(i=0;i Date: Thu, 11 Mar 2021 00:05:51 +0900 Subject: [PATCH 7/7] Add Fusion variations (#1110) Co-authored-by: Victor Villar --- src/framework/operations.hpp | 13 + src/transpile/fusion.hpp | 1010 +++++++++++++---- .../backends/qasm_simulator/qasm_fusion.py | 80 +- 3 files changed, 863 insertions(+), 240 deletions(-) diff --git a/src/framework/operations.hpp b/src/framework/operations.hpp index 01bddc5608..0c22520ceb 100755 --- a/src/framework/operations.hpp +++ 
b/src/framework/operations.hpp @@ -283,6 +283,19 @@ inline Op make_unitary(const reg_t &qubits, cmatrix_t &&mat, std::string label = return op; } +inline Op make_diagonal(const reg_t &qubits, cvector_t &&vec, std::string label = "") { + Op op; + op.type = OpType::diagonal_matrix; + op.name = "diagonal"; + op.qubits = qubits; + op.params = std::move(vec); + + if (label != "") + op.string_params = {label}; + + return op; +} + inline Op make_superop(const reg_t &qubits, const cmatrix_t &mat) { Op op; op.type = OpType::superop; diff --git a/src/transpile/fusion.hpp b/src/transpile/fusion.hpp index ae84c0cab8..10ae3bff03 100644 --- a/src/transpile/fusion.hpp +++ b/src/transpile/fusion.hpp @@ -32,6 +32,642 @@ using oplist_t = std::vector; using opset_t = Operations::OpSet; using reg_t = std::vector; +class FusionMethod { +public: + // Return name of method + virtual std::string name() = 0; + + virtual bool support_diagonal() const = 0; + + // Aggregate a subcircuit of operations into a single operation + virtual op_t generate_operation(std::vector& fusioned_ops, bool diagonal = false) const { + std::set fusioned_qubits; + for (auto & op: fusioned_ops) + fusioned_qubits.insert(op.qubits.begin(), op.qubits.end()); + + reg_t remapped2orig(fusioned_qubits.begin(), fusioned_qubits.end()); + std::unordered_map orig2remapped; + reg_t arg_qubits; + arg_qubits.assign(fusioned_qubits.size(), 0); + for (size_t i = 0; i < remapped2orig.size(); i++) { + orig2remapped[remapped2orig[i]] = i; + arg_qubits[i] = i; + } + + // Remap qubits + for (auto & op: fusioned_ops) + for (size_t i = 0; i < op.qubits.size(); i++) + op.qubits[i] = orig2remapped[op.qubits[i]]; + + auto fusioned_op = generate_operation_internal(fusioned_ops, arg_qubits); + + // Revert qubits + for (size_t i = 0; i < fusioned_op.qubits.size(); i++) + fusioned_op.qubits[i] = remapped2orig[fusioned_op.qubits[i]]; + + if (diagonal) { + std::vector vec; + vec.assign((1UL << fusioned_op.qubits.size()), 0); + for (size_t i = 0; 
i < vec.size(); ++i) + vec[i] = fusioned_op.mats[0](i, i); + fusioned_op = Operations::make_diagonal(fusioned_op.qubits, std::move(vec), std::string("fusion")); + } + + return fusioned_op; + }; + + virtual op_t generate_operation_internal(const std::vector& fusioned_ops, + const reg_t &fusioned_qubits) const = 0; + + virtual bool can_apply(const op_t& op, uint_t max_fused_qubits) const = 0; + + virtual bool can_ignore(const op_t& op) const { + switch (op.type) { + case optype_t::barrier: + return true; + case optype_t::gate: + return op.name == "id" || op.name == "u0"; + default: + return false; + } + } + + static FusionMethod& find_method(const Circuit& circ, + const opset_t &allowed_opset, + const bool allow_superop, + const bool allow_kraus); + + static bool exist_non_unitary(const std::vector& fusioned_ops) { + for (auto & op: fusioned_ops) + if (noise_opset_.contains(op.type)) + return true; + return false; + }; + +private: + const static Operations::OpSet noise_opset_; +}; + +const Operations::OpSet FusionMethod::noise_opset_( + {Operations::OpType::kraus, + Operations::OpType::superop, + Operations::OpType::reset}, + {}, {} +); + +class UnitaryFusion : public FusionMethod { +public: + virtual std::string name() override { return "unitary"; }; + + virtual bool support_diagonal() const override { return true; } + + virtual op_t generate_operation_internal (const std::vector& fusioned_ops, + const reg_t &qubits) const override { + // Run simulation + RngEngine dummy_rng; + ExperimentResult dummy_result; + + // Unitary simulation + QubitUnitary::State<> unitary_simulator; + unitary_simulator.initialize_qreg(qubits.size()); + unitary_simulator.apply_ops(fusioned_ops, dummy_result, dummy_rng); + return Operations::make_unitary(qubits, unitary_simulator.qreg().move_to_matrix(), + std::string("fusion")); + }; + + virtual bool can_apply(const op_t& op, uint_t max_fused_qubits) const { + if (op.conditional) + return false; + switch (op.type) { + case optype_t::matrix: 
+ return op.mats.size() == 1 && op.qubits.size() <= max_fused_qubits; + case optype_t::diagonal_matrix: + return op.qubits.size() <= max_fused_qubits; + case optype_t::gate: { + if (op.qubits.size() > max_fused_qubits) + return false; + return QubitUnitary::StateOpSet.contains_gates(op.name); + } + default: + return false; + } + }; +}; + +class SuperOpFusion : public UnitaryFusion { +public: + virtual std::string name() override { return "superop"; }; + + virtual bool support_diagonal() const override { return false; } + + virtual op_t generate_operation_internal(const std::vector& fusioned_ops, + const reg_t &qubits) const override { + + if (!exist_non_unitary(fusioned_ops)) + return UnitaryFusion::generate_operation_internal(fusioned_ops, qubits); + + // Run simulation + RngEngine dummy_rng; + ExperimentResult dummy_result; + + // For both Kraus and SuperOp method we simulate using superoperator + // simulator + QubitSuperoperator::State<> superop_simulator; + superop_simulator.initialize_qreg(qubits.size()); + superop_simulator.apply_ops(fusioned_ops, dummy_result, dummy_rng); + auto superop = superop_simulator.qreg().move_to_matrix(); + + return Operations::make_superop(qubits, std::move(superop)); + }; + + virtual bool can_apply(const op_t& op, uint_t max_fused_qubits) const { + if (op.conditional) + return false; + switch (op.type) { + case optype_t::kraus: + case optype_t::reset: + case optype_t::superop: { + return op.qubits.size() <= max_fused_qubits; + } + case optype_t::gate: { + if (op.qubits.size() > max_fused_qubits) + return false; + return QubitSuperoperator::StateOpSet.contains_gates(op.name); + } + default: + return UnitaryFusion::can_apply(op, max_fused_qubits); + } + }; +}; + +class KrausFusion : public UnitaryFusion { +public: + virtual std::string name() override { return "kraus"; }; + + virtual bool support_diagonal() const override { return false; } + + virtual op_t generate_operation_internal(const std::vector& fusioned_ops, + const reg_t 
&qubits) const override { + + if (!exist_non_unitary(fusioned_ops)) + return UnitaryFusion::generate_operation_internal(fusioned_ops, qubits); + + // Run simulation + RngEngine dummy_rng; + ExperimentResult dummy_result; + + // For both Kraus and SuperOp method we simulate using superoperator + // simulator + QubitSuperoperator::State<> superop_simulator; + superop_simulator.initialize_qreg(qubits.size()); + superop_simulator.apply_ops(fusioned_ops, dummy_result, dummy_rng); + auto superop = superop_simulator.qreg().move_to_matrix(); + + // If Kraus method we convert superop to canonical Kraus representation + size_t dim = 1 << qubits.size(); + return Operations::make_kraus(qubits, Utils::superop2kraus(superop, dim)); + }; + + virtual bool can_apply(const op_t& op, uint_t max_fused_qubits) const { + if (op.conditional) + return false; + switch (op.type) { + case optype_t::kraus: + case optype_t::reset: + case optype_t::superop: { + return op.qubits.size() <= max_fused_qubits; + } + case optype_t::gate: { + if (op.qubits.size() > max_fused_qubits) + return false; + return QubitSuperoperator::StateOpSet.contains_gates(op.name); + } + default: + return UnitaryFusion::can_apply(op, max_fused_qubits); + } + }; +}; + +FusionMethod& FusionMethod::find_method(const Circuit& circ, + const opset_t &allowed_opset, + const bool allow_superop, + const bool allow_kraus) { + static UnitaryFusion unitary; + static SuperOpFusion superOp; + static KrausFusion kraus; + + if (allow_superop && allowed_opset.contains(optype_t::superop) && + (circ.opset().contains(optype_t::kraus) + || circ.opset().contains(optype_t::superop) + || circ.opset().contains(optype_t::reset))) { + return superOp; + } else if (allow_kraus && allowed_opset.contains(optype_t::kraus) && + (circ.opset().contains(optype_t::kraus) + || circ.opset().contains(optype_t::superop))) { + return kraus; + } else { + return unitary; + } +} + +class Fuser { +public: + virtual std::string name() const = 0; + + virtual void 
set_config(const json_t &config) = 0; + + virtual void set_metadata(ExperimentResult &result) const { }; //nop + + virtual bool aggregate_operations(oplist_t& ops, + const int fusion_start, + const int fusion_end, + const uint_t max_fused_qubits, + const FusionMethod& method) const = 0; + + virtual void allocate_new_operation(oplist_t& ops, + const uint_t idx, + const std::vector& fusioned_ops_idxs, + const FusionMethod& method, + const bool diagonal = false) const; +}; + +void Fuser::allocate_new_operation(oplist_t& ops, + const uint_t idx, + const std::vector& idxs, + const FusionMethod& method, + const bool diagonal) const { + + oplist_t fusing_ops; + for (uint_t i: idxs) + fusing_ops.push_back(ops[i]); + ops[idx] = method.generate_operation(fusing_ops, diagonal); + for (auto i: idxs) + if (i != idx) + ops[i].type = optype_t::nop; +} + +class CostBasedFusion : public Fuser { +public: + CostBasedFusion() { + std::fill_n(costs, 64, -1); + }; + + virtual std::string name() const override { return "cost_base"; }; + + virtual void set_config(const json_t &config) override; + + virtual void set_metadata(ExperimentResult &result) const override; + + virtual bool aggregate_operations(oplist_t& ops, + const int fusion_start, + const int fusion_end, + const uint_t max_fused_qubits, + const FusionMethod& method) const override; + +private: + bool is_diagonal(const oplist_t& ops, + const uint_t from, + const uint_t until) const; + + double estimate_cost(const oplist_t& ops, + const uint_t from, + const uint_t until) const; + + void add_fusion_qubits(reg_t& fusion_qubits, const op_t& op) const; + +private: + bool active = true; + double cost_factor = 1.8; + double costs[64]; +}; + +template +class NQubitFusion : public Fuser { +public: + NQubitFusion(): opt_name(std::to_string(N) + "_qubits"), + activate_prop_name("fusion_enable." 
+ std::to_string(N) + "_qubits") { + } + + virtual void set_config(const json_t &config) override; + + virtual std::string name() const override { + return opt_name; + }; + + virtual bool aggregate_operations(oplist_t& ops, + const int fusion_start, + const int fusion_end, + const uint_t max_fused_qubits, + const FusionMethod& method) const override; + + bool exclude_escaped_qubits(std::vector& fusing_qubits, + const op_t& tgt_op) const; +private: + bool active = true; + const std::string opt_name; + const std::string activate_prop_name; + uint_t qubit_threshold = 5; +}; + +template +void NQubitFusion::set_config(const json_t &config) { + if (JSON::check_key("fusion_enable.n_qubits", config)) + JSON::get_value(active, "fusion_enable.n_qubits", config); + + if (JSON::check_key(activate_prop_name, config)) + JSON::get_value(active, activate_prop_name, config); +} + +template +bool NQubitFusion::exclude_escaped_qubits(std::vector& fusing_qubits, + const op_t& tgt_op) const { + bool included = true; + for (const auto qubit: tgt_op.qubits) + included &= (std::find(fusing_qubits.begin(), fusing_qubits.end(), qubit) != fusing_qubits.end()); + + if (included) + return false; + + for (const int op_qubit: tgt_op.qubits) { + auto found = std::find(fusing_qubits.begin(), fusing_qubits.end(), op_qubit); + if (found != fusing_qubits.end()) + fusing_qubits.erase(found); + } + return true; +} + +template +bool NQubitFusion::aggregate_operations(oplist_t& ops, + const int fusion_start, + const int fusion_end, + const uint_t max_fused_qubits, + const FusionMethod& method) const { + if (!active) + return false; + + std::vector>> targets; + bool fused = false; + + for (uint_t op_idx = fusion_start; op_idx < fusion_end; ++op_idx) { + // skip operations to be ignored + if (!method.can_apply(ops[op_idx], max_fused_qubits) || ops[op_idx].type == optype_t::nop) + continue; + + // 1. 
find a N-qubit operation + if (ops[op_idx].qubits.size() != N) + continue; + + std::vector fusing_op_idxs = { op_idx }; + + std::vector fusing_qubits; + fusing_qubits.insert(fusing_qubits.end(), ops[op_idx].qubits.begin(), ops[op_idx].qubits.end()); + + // 2. fuse operations with backwarding + for (int fusing_op_idx = op_idx - 1; fusing_op_idx >= fusion_start; --fusing_op_idx) { + auto& tgt_op = ops[fusing_op_idx]; + if (tgt_op.type == optype_t::nop) + continue; + if (!method.can_apply(tgt_op, max_fused_qubits)) + break; + // check all the qubits are in fusing_qubits + if (!exclude_escaped_qubits(fusing_qubits, tgt_op)) + fusing_op_idxs.push_back(fusing_op_idx); // All the qubits of tgt_op are in fusing_qubits + else if (fusing_qubits.empty()) + break; + } + + std::reverse(fusing_op_idxs.begin(), fusing_op_idxs.end()); + fusing_qubits.clear(); + fusing_qubits.insert(fusing_qubits.end(), ops[op_idx].qubits.begin(), ops[op_idx].qubits.end()); + + // 3. fuse operations with forwarding + for (int fusing_op_idx = op_idx + 1; fusing_op_idx < fusion_end; ++fusing_op_idx) { + auto& tgt_op = ops[fusing_op_idx]; + if (tgt_op.type == optype_t::nop) + continue; + if (!method.can_apply(tgt_op, max_fused_qubits)) + break; + // check all the qubits are in fusing_qubits + if (!exclude_escaped_qubits(fusing_qubits, tgt_op)) + fusing_op_idxs.push_back(fusing_op_idx); // All the qubits of tgt_op are in fusing_qubits + else if (fusing_qubits.empty()) + break; + } + + if (fusing_op_idxs.size() <= 1) + continue; + + // 4. 
generate a fused operation
+    allocate_new_operation(ops, op_idx, fusing_op_idxs, method, false);
+
+    fused = true;
+  }
+
+  return fused;
+}
+
+class DiagonalFusion : public Fuser {
+public:
+  DiagonalFusion() = default;
+
+  virtual ~DiagonalFusion() = default;
+
+  virtual std::string name() const override { return "diagonal"; };
+
+  virtual void set_config(const json_t &config) override;
+
+  virtual bool aggregate_operations(oplist_t& ops,
+                                    const int fusion_start,
+                                    const int fusion_end,
+                                    const uint_t max_fused_qubits,
+                                    const FusionMethod& method) const override;
+
+private:
+  bool is_diagonal_op(const op_t& op) const;
+
+  int get_next_diagonal_end(const oplist_t& ops, const int from, std::set<uint_t>& fusing_qubits) const;
+
+  const std::shared_ptr<FusionMethod> method;
+  uint_t min_qubit = 3;
+  bool active = true;
+};
+
+void DiagonalFusion::set_config(const json_t &config) {
+  if (JSON::check_key("fusion_enable.diagonal", config))
+    JSON::get_value(active, "fusion_enable.diagonal", config);
+  if (JSON::check_key("fusion_min_qubit.diagonal", config))
+    JSON::get_value(min_qubit, "fusion_min_qubit.diagonal", config);
+}
+
+bool DiagonalFusion::is_diagonal_op(const op_t& op) const {
+
+  if (op.type == Operations::OpType::diagonal_matrix)
+    return true;
+
+  if (op.type == Operations::OpType::gate) {
+    if (op.name == "p" || op.name == "cp" || op.name == "u1" || op.name == "cu1"
+        || op.name == "mcu1" || op.name == "rz" || op.name == "rzz")
+      return true;
+    if (op.name == "u3")
+      return op.params[0] == std::complex<double>(0.)
+             && op.params[1] == std::complex<double>(0.);
+    else
+      return false;
+  }
+
+  return false;
+}
+
+int DiagonalFusion::get_next_diagonal_end(const oplist_t& ops,
+                                          const int from,
+                                          std::set<uint_t>& fusing_qubits) const {
+
+  if (is_diagonal_op(ops[from])) {
+    for (const auto qubit: ops[from].qubits)
+      fusing_qubits.insert(qubit);
+    return from;
+  }
+
+  if (ops[from].type != Operations::OpType::gate)
+    return -1;
+
+  auto pos = from;
+
+  // find a diagonal gate that has the same lists of CX before and after it
+  //      ┌───┐                                   ┌───┐
+  // q_0: ┤ X ├───────────────────────────────────┤ X ├
+  //      └─┬─┘┌───┐            ┌──────────┐ ┌───┐└─┬─┘
+  // q_1: ──■──┤ X ├────────────┤ diagonal ├─┤ X ├──■──
+  //           └─┬─┘┌──────────┐└──────────┘ └─┬─┘
+  // q_2: ───────■──┤ diagonal ├───────────────■───────
+  //                └──────────┘
+  //        ■ [from,pos]
+
+  // find first cx list
+  for (; pos < ops.size(); ++pos)
+    if (ops[pos].type != Operations::OpType::gate || ops[pos].name != "cx")
+      break;
+
+  if (pos == from || pos == ops.size())
+    return -1;
+
+  auto cx_end = pos - 1;
+
+  //      ┌───┐                                   ┌───┐
+  // q_0: ┤ X ├───────────────────────────────────┤ X ├
+  //      └─┬─┘┌───┐            ┌──────────┐ ┌───┐└─┬─┘
+  // q_1: ──■──┤ X ├────────────┤ diagonal ├─┤ X ├──■──
+  //           └─┬─┘┌──────────┐└──────────┘ └─┬─┘
+  // q_2: ───────■──┤ diagonal ├───────────────■───────
+  //                └──────────┘
+  //        ■ [from]     ■ [pos]
+  //             ■ [cx_end]
+
+  bool found = false;
+  // find diagonals
+  for (; pos < ops.size(); ++pos)
+    if (is_diagonal_op(ops[pos]))
+      found = true;
+    else
+      break;
+
+  if (!found)
+    return -1;
+
+  if (pos == ops.size())
+    return -1;
+
+  auto u1_end = pos;
+
+  //      ┌───┐                                   ┌───┐
+  // q_0: ┤ X ├───────────────────────────────────┤ X ├
+  //      └─┬─┘┌───┐            ┌──────────┐ ┌───┐└─┬─┘
+  // q_1: ──■──┤ X ├────────────┤ diagonal ├─┤ X ├──■──
+  //           └─┬─┘┌──────────┐└──────────┘ └─┬─┘
+  // q_2: ───────■──┤ diagonal ├───────────────■───────
+  //                └──────────┘
+  //        ■ [from]                           ■ [pos,u1_end]
+  //             ■ [cx_end]
+
+  // find second cx list that is the reverse of the first
+  for (; pos < ops.size(); ++pos) {
+    if (ops[pos].type == Operations::OpType::gate
+        && ops[pos].name == ops[cx_end].name
+        && ops[pos].qubits == ops[cx_end].qubits) {
+      if (cx_end == from)
+        break;
+      --cx_end;
+    } else {
+      return -1;
+    }
+  }
+
+  if (pos == ops.size())
+    return -1;
+
+  //      ┌───┐                                   ┌───┐
+  // q_0: ┤ X ├───────────────────────────────────┤ X ├
+  //      └─┬─┘┌───┐            ┌──────────┐ ┌───┐└─┬─┘
+  // q_1: ──■──┤ X ├────────────┤ diagonal ├─┤ X ├──■──
+  //           └─┬─┘┌──────────┐└──────────┘ └─┬─┘
+  // q_2: ───────■──┤ diagonal ├───────────────■───────
+  //                └──────────┘
+  //        ■ [from]                                ■ [pos]
+  //             ■ [cx_end]                    ■ [u1_end]
+
+  for (auto i = from; i < u1_end; ++i)
+    for (const auto qubit: ops[i].qubits)
+      fusing_qubits.insert(qubit);
+
+  return pos;
+}
+
+bool DiagonalFusion::aggregate_operations(oplist_t& ops,
+                                          const int fusion_start,
+                                          const int fusion_end,
+                                          const uint_t max_fused_qubits,
+                                          const FusionMethod& method) const {
+
+  if (!active || !method.support_diagonal())
+    return false;
+
+  // current impl is sensitive to ordering of gates
+  for (int op_idx = fusion_start; op_idx < fusion_end; ++op_idx) {
+
+    std::set<uint_t> checking_qubits_set;
+    auto next_diagonal_end = get_next_diagonal_end(ops, op_idx, checking_qubits_set);
+
+    if (next_diagonal_end < 0)
+      continue;
+
+    if (checking_qubits_set.size() > max_fused_qubits)
+      continue;
+
+    auto next_diagonal_start = next_diagonal_end + 1;
+
+    int cnt = 0;
+    while (true) {
+      auto next_diagonal_end = get_next_diagonal_end(ops, next_diagonal_start, checking_qubits_set);
+      if (next_diagonal_end < 0)
+        break;
+      if (checking_qubits_set.size() > max_fused_qubits)
+        break;
+      next_diagonal_start = next_diagonal_end + 1;
+    }
+
+    if (checking_qubits_set.size() < min_qubit)
+      continue;
+
+    std::vector<int> fusing_op_idxs;
+    for (; op_idx < next_diagonal_start; ++op_idx)
+      fusing_op_idxs.push_back(op_idx);
+
+    --op_idx;
+    allocate_new_operation(ops, op_idx, fusing_op_idxs, method, true);
+  }
+
+  return true;
+}
 
 class Fusion : public CircuitOptimization {
public: @@ -49,15 +685,8 @@ class Fusion : public CircuitOptimization { * - fusion_cost_factor (double): a cost function to estimate an aggregate * gate [Default: 1.8] */ - Fusion(uint_t _max_qubit = 5, uint_t _threshold = 14, double _cost_factor = 1.8) - : max_qubit(_max_qubit), threshold(_threshold), cost_factor(_cost_factor) {} + Fusion(); - // Allowed fusion methods: - // - Unitary: only fuse gates into unitary instructions - // - SuperOp: fuse gates, reset, kraus, and superops into kraus instuctions - // - Kraus: fuse gates, reset, kraus, and superops into kraus instuctions - enum class Method {unitary, kraus, superop}; - void set_config(const json_t &config) override; virtual void set_parallelization(uint_t num) { parallelization_ = num; }; @@ -70,9 +699,9 @@ class Fusion : public CircuitOptimization { ExperimentResult &result) const override; // Qubit threshold for activating fusion pass - uint_t max_qubit; - uint_t threshold; - double cost_factor; + uint_t max_qubit = 5; + uint_t threshold = 14; + bool verbose = false; bool active = true; bool allow_superop = false; @@ -84,57 +713,52 @@ class Fusion : public CircuitOptimization { uint_t parallel_threshold_ = 10000; private: - bool can_ignore(const op_t& op) const; - - bool can_apply_fusion(const op_t& op, - uint_t max_max_fused_qubits, - Method method) const; - - double get_cost(const op_t& op) const; - void optimize_circuit(Circuit& circ, - Noise::NoiseModel& noise, + const Noise::NoiseModel& noise, const opset_t &allowed_opset, - uint_t ops_start, - uint_t ops_end) const; - - bool aggregate_operations(oplist_t& ops, - const int fusion_start, - const int fusion_end, - uint_t max_fused_qubits, - Method method) const; - - // Aggregate a subcircuit of operations into a single operation - op_t generate_fusion_operation(const std::vector& fusioned_ops, - const reg_t &num_qubits, - Method method) const; - - bool is_diagonal(const oplist_t& ops, - const uint_t from, - const uint_t until) const; - - double 
estimate_cost(const oplist_t& ops, - const uint_t from, - const uint_t until) const; - - void add_fusion_qubits(reg_t& fusion_qubits, const op_t& op) const; + const uint_t ops_start, + const uint_t ops_end, + const std::shared_ptr& fuser, + const FusionMethod& method) const; #ifdef DEBUG - void dump(const Circuit& circuit) const; + void dump(const Circuit& circuit) const { + auto& ops = circuit.ops; + for (uint_t op_idx = 0; op_idx < ops.size(); ++op_idx) { + std::cout << std::setw(3) << op_idx << ": "; + if (ops[op_idx].type == optype_t::nop) { + std::cout << std::setw(15) << "nop" << ": "; + } else { + std::cout << std::setw(15) << ops[op_idx].name << "-" << ops[op_idx].qubits.size() << ": "; + if (ops[op_idx].qubits.size() > 0) { + auto qubits = ops[op_idx].qubits; + std::sort(qubits.begin(), qubits.end()); + int pos = 0; + for (int j = 0; j < qubits.size(); ++j) { + int q_pos = 1 + qubits[j] * 2; + for (int k = 0; k < (q_pos - pos); ++k) { + std::cout << " "; + } + pos = q_pos + 1; + std::cout << "X"; + } + } + } + std::cout << std::endl; + } + } #endif private: - const static Operations::OpSet noise_opset_; + std::vector> fusers; }; - -const Operations::OpSet Fusion::noise_opset_( - {Operations::OpType::kraus, - Operations::OpType::superop, - Operations::OpType::reset}, - {}, {} -); - +Fusion::Fusion() { + fusers.push_back(std::make_shared()); + fusers.push_back(std::make_shared>()); + fusers.push_back(std::make_shared>()); + fusers.push_back(std::make_shared()); +} void Fusion::set_config(const json_t &config) { @@ -152,9 +776,9 @@ void Fusion::set_config(const json_t &config) { if (JSON::check_key("fusion_threshold", config_)) JSON::get_value(threshold, "fusion_threshold", config_); - if (JSON::check_key("fusion_cost_factor", config)) - JSON::get_value(cost_factor, "fusion_cost_factor", config); - + for (std::shared_ptr& fuser: fusers) + fuser->set_config(config_); + if (JSON::check_key("fusion_allow_kraus", config)) JSON::get_value(allow_kraus, 
"fusion_allow_kraus", config); @@ -170,6 +794,11 @@ void Fusion::optimize_circuit(Circuit& circ, const opset_t &allowed_opset, ExperimentResult &result) const { +#ifdef DEBUG + std::cout << "original" << std::endl; + dump(circ); +#endif + // Start timer using clock_t = std::chrono::high_resolution_clock; auto timer_start = clock_t::now(); @@ -182,7 +811,6 @@ void Fusion::optimize_circuit(Circuit& circ, result.metadata.add(true, "fusion", "enabled"); result.metadata.add(threshold, "fusion", "threshold"); - result.metadata.add(cost_factor, "fusion", "cost_factor"); result.metadata.add(max_qubit, "fusion", "max_fused_qubits"); // Check qubit threshold @@ -190,185 +818,108 @@ void Fusion::optimize_circuit(Circuit& circ, result.metadata.add(false, "fusion", "applied"); return; } + // Determine fusion method - // TODO: Support Kraus fusion method - Method method = Method::unitary; - if (allow_superop && allowed_opset.contains(optype_t::superop) && - (circ.opset().contains(optype_t::kraus) - || circ.opset().contains(optype_t::superop) - || circ.opset().contains(optype_t::reset))) { - method = Method::superop; - } else if (allow_kraus && allowed_opset.contains(optype_t::kraus) && - (circ.opset().contains(optype_t::kraus) - || circ.opset().contains(optype_t::superop))) { - method = Method::kraus; - } - if (method == Method::unitary) { - result.metadata.add("unitary", "fusion", "method"); - } else if (method == Method::superop) { - result.metadata.add("superop", "fusion", "method"); - } else if (method == Method::kraus) { - result.metadata.add("kraus", "fusion", "method"); - } + FusionMethod& method = FusionMethod::find_method(circ, allowed_opset, allow_superop, allow_kraus); + result.metadata.add(method.name(), "fusion", "method"); - if (circ.ops.size() < parallel_threshold_ || parallelization_ <= 1) { - optimize_circuit(circ, noise, allowed_opset, 0, circ.ops.size()); - } else { - // determine unit for each OMP thread - int_t unit = circ.ops.size() / parallelization_; - if 
(circ.ops.size() % parallelization_) - ++unit; + bool applied = false; + for (const std::shared_ptr& fuser: fusers) { + fuser->set_metadata(result); + + if (circ.ops.size() < parallel_threshold_ || parallelization_ <= 1) { + optimize_circuit(circ, noise, allowed_opset, 0, circ.ops.size(), fuser, method); + result.metadata.add(1, "fusion", "parallelization"); + } else { + // determine unit for each OMP thread + int_t unit = circ.ops.size() / parallelization_; + if (circ.ops.size() % parallelization_) + ++unit; #pragma omp parallel for if (parallelization_ > 1) num_threads(parallelization_) - for (int_t i = 0; i < parallelization_; i++) { - int_t start = unit * i; - int_t end = std::min(start + unit, (int_t) circ.ops.size()); - optimize_circuit(circ, noise, allowed_opset, start, end); + for (int_t i = 0; i < parallelization_; i++) { + int_t start = unit * i; + int_t end = std::min(start + unit, (int_t) circ.ops.size()); + optimize_circuit(circ, noise, allowed_opset, start, end, fuser, method); + } + result.metadata.add(parallelization_, "fusion", "parallelization"); } - } - - result.metadata.add(parallelization_, "fusion", "parallelization"); - auto timer_stop = clock_t::now(); - result.metadata.add(std::chrono::duration(timer_stop - timer_start).count(), "fusion", "time_taken"); + size_t idx = 0; + for (size_t i = 0; i < circ.ops.size(); ++i) { + if (circ.ops[i].type != optype_t::nop) { + if (i != idx) + circ.ops[idx] = circ.ops[i]; + ++idx; + } + } - size_t idx = 0; - for (size_t i = 0; i < circ.ops.size(); ++i) { - if (circ.ops[i].type != optype_t::nop) { - if (i != idx) - circ.ops[idx] = circ.ops[i]; - ++idx; + if (idx != circ.ops.size()) { + applied = true; + circ.ops.erase(circ.ops.begin() + idx, circ.ops.end()); + circ.set_params(); } - } - if (idx == circ.ops.size()) { - result.metadata.add(false, "fusion", "applied"); - } else { - circ.ops.erase(circ.ops.begin() + idx, circ.ops.end()); - result.metadata.add(true, "fusion", "applied"); - circ.set_params(); 
+#ifdef DEBUG + std::cout << fuser->name() << std::endl; + dump(circ); +#endif - if (verbose) - result.metadata.add(circ.ops, "fusion", "output_ops"); } + result.metadata.add(applied, "fusion", "applied"); + if (applied && verbose) + result.metadata.add(circ.ops, "fusion", "output_ops"); + + auto timer_stop = clock_t::now(); + result.metadata.add(std::chrono::duration(timer_stop - timer_start).count(), "fusion", "time_taken"); } void Fusion::optimize_circuit(Circuit& circ, - Noise::NoiseModel& noise, + const Noise::NoiseModel& noise, const opset_t &allowed_opset, - uint_t ops_start, - uint_t ops_end) const { - - // Determine fusion method - // TODO: Support Kraus fusion method - Method method = Method::unitary; - if (allow_superop && allowed_opset.contains(optype_t::superop) && - (circ.opset().contains(optype_t::kraus) - || circ.opset().contains(optype_t::superop) - || circ.opset().contains(optype_t::reset))) { - method = Method::superop; - } else if (allow_kraus && allowed_opset.contains(optype_t::kraus) && - (circ.opset().contains(optype_t::kraus) - || circ.opset().contains(optype_t::superop))) { - method = Method::kraus; - } + const uint_t ops_start, + const uint_t ops_end, + const std::shared_ptr& fuser, + const FusionMethod& method) const { uint_t fusion_start = ops_start; uint_t op_idx; for (op_idx = ops_start; op_idx < ops_end; ++op_idx) { - if (can_ignore(circ.ops[op_idx])) + if (method.can_ignore(circ.ops[op_idx])) continue; - if (!can_apply_fusion(circ.ops[op_idx], max_qubit, method) || op_idx == (ops_end - 1)) { - aggregate_operations(circ.ops, fusion_start, op_idx, max_qubit, method); + if (!method.can_apply(circ.ops[op_idx], max_qubit) || op_idx == (ops_end - 1)) { + fuser->aggregate_operations(circ.ops, fusion_start, op_idx, max_qubit, method); fusion_start = op_idx + 1; } } } -bool Fusion::can_ignore(const op_t& op) const { - switch (op.type) { - case optype_t::barrier: - return true; - case optype_t::gate: - return op.name == "id" || op.name == 
"u0"; - default: - return false; - } -} - -bool Fusion::can_apply_fusion(const op_t& op, uint_t max_fused_qubits, Method method) const { - if (op.conditional) - return false; - switch (op.type) { - case optype_t::matrix: - return op.mats.size() == 1 && op.qubits.size() <= max_fused_qubits; - case optype_t::kraus: - case optype_t::reset: - case optype_t::superop: { - return method != Method::unitary && op.qubits.size() <= max_fused_qubits; - } - case optype_t::gate: { - if (op.qubits.size() > max_fused_qubits) - return false; - return (method == Method::unitary) - ? QubitUnitary::StateOpSet.contains_gates(op.name) - : QubitSuperoperator::StateOpSet.contains_gates(op.name); - } - case optype_t::measure: - case optype_t::bfunc: - case optype_t::roerror: - case optype_t::snapshot: - case optype_t::barrier: - default: - return false; - } -} - -double Fusion::get_cost(const op_t& op) const { - if (can_ignore(op)) - return .0; - else - return cost_factor; +void CostBasedFusion::set_metadata(ExperimentResult &result) const { + result.metadata.add(cost_factor, "fusion", "cost_factor"); } +void CostBasedFusion::set_config(const json_t &config) { -op_t Fusion::generate_fusion_operation(const std::vector& fusioned_ops, - const reg_t &qubits, - Method method) const { - // Run simulation - RngEngine dummy_rng; - ExperimentResult dummy_result; - - if (method == Method::unitary) { - // Unitary simulation - QubitUnitary::State<> unitary_simulator; - unitary_simulator.initialize_qreg(qubits.size()); - unitary_simulator.apply_ops(fusioned_ops, dummy_result, dummy_rng); - return Operations::make_unitary(qubits, unitary_simulator.move_to_matrix(), - std::string("fusion")); - } + if (JSON::check_key("fusion_cost_factor", config)) + JSON::get_value(cost_factor, "fusion_cost_factor", config); - // For both Kraus and SuperOp method we simulate using superoperator - // simulator - QubitSuperoperator::State<> superop_simulator; - superop_simulator.initialize_qreg(qubits.size()); - 
superop_simulator.apply_ops(fusioned_ops, dummy_result, dummy_rng); - auto superop = superop_simulator.move_to_matrix(); + if (JSON::check_key("fusion_enable.cost_based", config)) + JSON::get_value(active, "fusion_enable.cost_based", config); - if (method == Method::superop) { - return Operations::make_superop(qubits, std::move(superop)); + for (int i = 0; i < 64; ++i) { + auto prop_name = "fusion_cost." + std::to_string(i + 1); + if (JSON::check_key(prop_name, config)) + JSON::get_value(costs[i], prop_name, config); } - - // If Kraus method we convert superop to canonical Kraus representation - size_t dim = 1 << qubits.size(); - return Operations::make_kraus(qubits, Utils::superop2kraus(superop, dim)); } -bool Fusion::aggregate_operations(oplist_t& ops, +bool CostBasedFusion::aggregate_operations(oplist_t& ops, const int fusion_start, const int fusion_end, - uint_t max_fused_qubits, - Method method) const { + const uint_t max_fused_qubits, + const FusionMethod& method) const { + if (!active) + return false; // costs[i]: estimated cost to execute from 0-th to i-th in original.ops std::vector costs; @@ -377,14 +928,14 @@ bool Fusion::aggregate_operations(oplist_t& ops, // set costs and fusion_to of fusion_start fusion_to.push_back(fusion_start); - costs.push_back(get_cost(ops[fusion_start])); + costs.push_back(method.can_ignore(ops[fusion_start])? .0 : cost_factor); bool applied = false; // calculate the minimal path to each operation in the circuit for (int i = fusion_start + 1; i < fusion_end; ++i) { // init with fusion from i-th to i-th fusion_to.push_back(i); - costs.push_back(costs[i - fusion_start - 1] + get_cost(ops[i])); + costs.push_back(costs[i - fusion_start - 1] + (method.can_ignore(ops[i])? 
.0 : cost_factor)); for (int num_fusion = 2; num_fusion <= static_cast (max_fused_qubits); ++num_fusion) { // calculate cost if {num_fusion}-qubit fusion is applied @@ -416,36 +967,13 @@ bool Fusion::aggregate_operations(oplist_t& ops, // generate a new circuit with the minimal path to the last operation in the circuit for (int i = fusion_end - 1; i >= fusion_start;) { - int to = fusion_to[i - fusion_start]; - if (to != i) { - std::vector fusioned_ops; - std::set fusioned_qubits; - for (int j = to; j <= i; ++j) { - fusioned_ops.push_back(ops[j]); - fusioned_qubits.insert(ops[j].qubits.cbegin(), ops[j].qubits.cend()); - ops[j].type = optype_t::nop; - } - if (!fusioned_ops.empty()) { - // We need to remap qubits in fusion subcircuits for simulation - // TODO: This could be done above during the fusion cost calculation - reg_t qubits(fusioned_qubits.begin(), fusioned_qubits.end()); - std::unordered_map qubit_mapping; - for (size_t j = 0; j < qubits.size(); j++) { - qubit_mapping[qubits[j]] = j; - } - // Remap qubits and determine method - bool non_unitary = false; - for (auto & op: fusioned_ops) { - non_unitary |= noise_opset_.contains(op.type); - for (size_t j = 0; j < op.qubits.size(); j++) { - op.qubits[j] = qubit_mapping[op.qubits[j]]; - } - } - Method required_method = (non_unitary) ? 
method : Method::unitary; - ops[i] = generate_fusion_operation(fusioned_ops, qubits, required_method); - } + std::vector fusing_op_idxs; + for (int j = to; j <= i; ++j) + fusing_op_idxs.push_back(j); + if (!fusing_op_idxs.empty()) + allocate_new_operation(ops, i, fusing_op_idxs, method, false); } i = to - 1; } @@ -456,7 +984,7 @@ bool Fusion::aggregate_operations(oplist_t& ops, // Gate-swap optimized helper functions //------------------------------------------------------------------------------ -bool Fusion::is_diagonal(const std::vector& ops, +bool CostBasedFusion::is_diagonal(const std::vector& ops, const uint_t from, const uint_t until) const { @@ -485,34 +1013,38 @@ bool Fusion::is_diagonal(const std::vector& ops, return true; } -double Fusion::estimate_cost(const std::vector& ops, +double CostBasedFusion::estimate_cost(const std::vector& ops, const uint_t from, const uint_t until) const { if (is_diagonal(ops, from, until)) - return cost_factor; + return 1.0; reg_t fusion_qubits; for (uint_t i = from; i <= until; ++i) add_fusion_qubits(fusion_qubits, ops[i]); + auto configured_cost = costs[fusion_qubits.size() - 1]; + if (configured_cost > 0) + return configured_cost; + if(is_avx2_supported()){ switch (fusion_qubits.size()) { case 1: // [[ falling through :) ]] case 2: - return cost_factor; + return 1.0; case 3: - return cost_factor * 1.1; + return 1.1; case 4: - return cost_factor * 3; + return 3; default: - return pow(cost_factor, (double) std::max(fusion_qubits.size() - 1, size_t(1))); + return pow(cost_factor, (double) std::max(fusion_qubits.size() - 2, size_t(1))); } } return pow(cost_factor, (double) std::max(fusion_qubits.size() - 1, size_t(1))); } -void Fusion::add_fusion_qubits(reg_t& fusion_qubits, const op_t& op) const { +void CostBasedFusion::add_fusion_qubits(reg_t& fusion_qubits, const op_t& op) const { for (const auto &qubit: op.qubits){ if (find(fusion_qubits.begin(), fusion_qubits.end(), qubit) == fusion_qubits.end()){ 
      fusion_qubits.push_back(qubit);
diff --git a/test/terra/backends/qasm_simulator/qasm_fusion.py b/test/terra/backends/qasm_simulator/qasm_fusion.py
index 838810cc5c..9412785588 100644
--- a/test/terra/backends/qasm_simulator/qasm_fusion.py
+++ b/test/terra/backends/qasm_simulator/qasm_fusion.py
@@ -14,9 +14,10 @@
 """
 # pylint: disable=no-member
 import copy
+import numpy as np
 
 from qiskit import QuantumRegister, ClassicalRegister, QuantumCircuit
-from qiskit.circuit.library import QuantumVolume, QFT
+from qiskit.circuit.library import QuantumVolume, QFT, RealAmplitudes
 from qiskit.compiler import assemble, transpile
 from qiskit.providers.aer import QasmSimulator
 from qiskit.providers.aer.noise import NoiseModel
@@ -463,3 +464,80 @@ def test_fusion_parallelization(self):
             result_serial.get_counts(circuit),
             delta=0.0,
             msg="parallelized fusion was failed")
+
+    def test_fusion_two_qubits(self):
+        """Test 2-qubit fusion"""
+        shots = 100
+        num_qubits = 8
+        reps = 3
+
+        circuit = RealAmplitudes(num_qubits=num_qubits, entanglement='linear', reps=reps)
+        param_binds = {}
+        for param in circuit.parameters:
+            param_binds[param] = np.random.random()
+
+        circuit = transpile(circuit.bind_parameters(param_binds),
+                            backend=self.SIMULATOR,
+                            optimization_level=0)
+        circuit.measure_all()
+
+        qobj = assemble([circuit],
+                        self.SIMULATOR,
+                        shots=shots,
+                        seed_simulator=1)
+
+        backend_options = self.fusion_options(enabled=True, threshold=1)
+        backend_options['fusion_verbose'] = True
+
+        backend_options['fusion_enable.2_qubits'] = False
+        result_disabled = self.SIMULATOR.run(qobj, **backend_options).result()
+        meta_disabled = self.fusion_metadata(result_disabled)
+
+        backend_options['fusion_enable.2_qubits'] = True
+        result_enabled = self.SIMULATOR.run(qobj, **backend_options).result()
+        meta_enabled = self.fusion_metadata(result_enabled)
+
+        self.assertTrue(getattr(result_disabled, 'success', False))
+        self.assertTrue(getattr(result_enabled, 'success', False))
+
+        self.assertTrue(
+            (len(meta_enabled['output_ops']) if 'output_ops' in meta_enabled else len(circuit.data)) <
+            (len(meta_disabled['output_ops']) if 'output_ops' in meta_disabled else len(circuit.data)))
+
+    def test_fusion_diagonal(self):
+        """Test diagonal fusion"""
+        shots = 100
+        num_qubits = 8
+
+        circuit = QuantumCircuit(num_qubits)
+        for i in range(num_qubits):
+            circuit.p(0.1, i)
+
+        for i in range(num_qubits - 1):
+            circuit.cp(0.1, i, i + 1)
+
+        circuit = transpile(circuit,
+                            backend=self.SIMULATOR,
+                            optimization_level=0)
+        circuit.measure_all()
+
+        qobj = assemble([circuit],
+                        self.SIMULATOR,
+                        shots=shots,
+                        seed_simulator=1)
+
+        backend_options = self.fusion_options(enabled=True, threshold=1)
+        backend_options['fusion_verbose'] = True
+
+        backend_options['fusion_enable.cost_based'] = False
+        result = self.SIMULATOR.run(qobj, **backend_options).result()
+        meta = self.fusion_metadata(result)
+
+        method = result.results[0].metadata.get('method')
+        if method not in ['statevector']:
+            return
+
+        for op in meta['output_ops']:
+            op_name = op['name']
+            if op_name == 'measure':
+                break
+            self.assertEqual(op_name, 'diagonal')
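Note for reviewers: `CostBasedFusion::aggregate_operations` above chooses fusion boundaries with a shortest-path style dynamic program — `costs[i]` is the cheapest way to execute ops `fusion_start..i`, and each `fusion_to[i]` records the start of the block that ends at `i`. The sketch below illustrates that idea in plain Python with a deliberately simplified, hypothetical cost model (`cost_factor ** (nqubits - 1)` per fused block); the real pass additionally special-cases diagonal blocks, AVX2-tuned per-size costs, and ignorable ops, so this is not the Aer implementation.

```python
def optimal_fusion(op_qubits, max_fused_qubits, cost_factor=1.8):
    """Pick fusion blocks minimizing total estimated cost.

    op_qubits: one list of qubit indices per operation.
    Returns (total_cost, fusion_to) where fusion_to[i] is the index of
    the first operation fused into the block that ends at operation i.
    Hypothetical cost model: an unfused op costs cost_factor; a fused
    block over q qubits costs cost_factor ** max(q - 1, 1).
    """
    costs = []      # costs[i]: minimal cost to execute ops 0..i
    fusion_to = []
    for i, qubits_i in enumerate(op_qubits):
        # default: operation i runs on its own
        best = (costs[i - 1] if i > 0 else 0.0) + cost_factor
        best_to = i
        # try fusing ops j..i while the union of their qubits still fits
        union = set(qubits_i)
        for j in range(i - 1, -1, -1):
            union |= set(op_qubits[j])
            if len(union) > max_fused_qubits:
                break
            fused_cost = cost_factor ** max(len(union) - 1, 1)
            cand = (costs[j - 1] if j > 0 else 0.0) + fused_cost
            if cand < best:
                best, best_to = cand, j
        costs.append(best)
        fusion_to.append(best_to)
    return costs[-1], fusion_to
```

For example, three single-qubit gates on the same qubit collapse into one block of cost `1.8` instead of `5.4`, while two gates on disjoint qubits with `max_fused_qubits=1` stay unfused.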