
Gate benchmarking CI #87

Merged: 39 commits merged into master from gate_bench_ci on Mar 17, 2021
Conversation

@antalszava (Contributor) commented Mar 12, 2021

Changes
Adds basic benchmarks that each simulate gate applications for qubit counts in [1, 3, 5, 10, 15, 18]. The "PauliX", "T", "Hadamard", "CNOT" gates are each run 10000 times by timing the following snippet:

import pennylane as qml

def apply_op():
    # Call the device's apply directly to minimize the Python overhead;
    # `gate` (str), `num_q` (int) and `dev` come from the benchmark loop
    pennylane_op = getattr(qml, gate)
    if pennylane_op.num_wires == 1:
        dev.apply([pennylane_op(wires=0)])
    elif num_q > 1 and pennylane_op.num_wires == 2:
        dev.apply([pennylane_op(wires=[0, 1])])
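
For context, a minimal sketch of how the snippet could be timed; the harness itself isn't shown here, so the timeit usage below is an assumption based on the discussion further down:

import timeit

# Hypothetical harness: total time for 10000 calls, then the per-call average
total_s = timeit.timeit(apply_op, number=10000)
avg_ns = total_s / 10000 * 1e9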

The output is a plot comparing:

  • lightning.qubit according to the modifications in the PR where the CI check is run
  • lightning.qubit in master
  • default.qubit

Output: a .png image (~40 KB) that is uploaded as an artifact after each run.
Time to complete: around 3-4 minutes.

The check is run for each commit pushed to an open pull request.

The resulting image files are available in the list of checks by clicking on Details.

@codecov (bot) commented Mar 12, 2021

Codecov Report

Merging #87 (7ca605b) into master (19a655c) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #87   +/-   ##
=======================================
  Coverage   98.03%   98.03%           
=======================================
  Files           3        3           
  Lines          51       51           
=======================================
  Hits           50       50           
  Misses          1        1           


@antalszava antalszava marked this pull request as ready for review March 12, 2021 20:53
@antalszava antalszava requested a review from trbromley March 12, 2021 20:54
@antalszava antalszava changed the title Gate bench ci Gate benchmarking CI Mar 12, 2021
@ThomasLoke (Contributor):
Looks good! Some thoughts on what could be added/changed:

  1. A baseline that just measures the cost of copying the numpy array for the state (see the sketch after this list). I assume both default.qubit and lightning.qubit would do this? This may be a significant part of the cost, especially if we're just applying single-qubit operations.
  2. Average runtime (instead of total) + error bars for standard deviation.
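
For illustration, such a baseline could be as simple as timing a copy of the full state vector (a sketch; the names below are hypothetical):

import timeit
import numpy as np

num_q = 18  # largest qubit count in the benchmarked regime
state = np.zeros(2**num_q, dtype=np.complex128)
state[0] = 1.0  # |00...0>

# Baseline: the cost of copying the state, with no gate applied
copy_s = timeit.timeit(lambda: state.copy(), number=10000)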

@antalszava (Contributor, Author):

Thanks @ThomasLoke!

A baseline that just measures the cost of copying the numpy array for the state.

Isn't a reference to the numpy array being passed when calling the bound C++ function from Python?


Sharing some further benchmarks:

Benchmarking C++ apply

Used the following source to call apply:

#include "pennylane_lightning/src/Apply.hpp"
#include <iostream>


using std::vector;
using Pennylane::CplxType;
using Pennylane::StateVector;

int main(){

    const int qubits = 23;
    const int len = exp2(qubits);
    vector<CplxType> vec(len);

    // Prepare |00....0>
    vec.at(0) = 1;
    StateVector state(vec.data(), len);

    apply(state, {"PauliX"},{{0}}, {{}}, qubits);
    return 0;
}

Created a flame graph for applying PauliX on 23 qubits:

[flame graph: PauliX, 23 wires, lightning.qubit]

This seems to indicate that calling generateBitPatterns is ~50% of the gate application.

On a separate local run, could confirm that the second use of generateBitPatterns took about as long as calling gate->applyKernel.

So it could be worth considering whether generateBitPatterns could be improved for its second usage.


Benchmarking Python apply

As a separate benchmark, profiled device.apply, which is timed by the CI benchmark suite:

lightning.qubit, 18 qubits:

[profile: PauliX on lightning.qubit, 18 qubits]

For 23 qubits, lightning_qubit_ops.apply contributed 99.56%.

default.qubit, 18 qubits:

[profile: PauliX on default.qubit, 18 qubits]

For 23 qubits, numeric.roll contributed 98.91%.

@ThomasLoke (Contributor):

Isn't a reference to the numpy array being passed when calling the bound C++ function from Python?

From my vague recollection, we were making a copy of the numpy array before applying operations and/or rotations to avoid the existing state being mutated? Or maybe the semantics have changed since I last looked...

This seems to indicate that calling generateBitPatterns is ~50% of the gate application.

I'm not surprised, but I suspect this varies quite a bit depending on the gate type and dimensionality. A PauliX gate amounts to nothing more than swapping some memory locations, and has to generate 2^(n-1) + 2^1 bit patterns in total. Something like a Hadamard would generate the same number of bit patterns, but does more actual compute. Dense matrices (e.g. QFT) would even further increase the amount of compute (and as the dimension increases, the number of bit patterns will drop as well), so I'd expect that in these cases the relative cost of generateBitPatterns would be much smaller.
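
As a concrete illustration of the "swapping memory locations" point, a NumPy sketch (not the lightning implementation; the function name is hypothetical):

import numpy as np

def apply_paulix(state, target, num_q):
    # Viewed as an n-dimensional tensor of shape (2, ..., 2), PauliX on
    # `target` just swaps the two slices along that axis: pure data movement
    psi = state.reshape([2] * num_q)
    return np.flip(psi, axis=target).reshape(-1)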

That said, there are probably some things we can do to cut down the runtime of generateBitPatterns. Parallelisation would be the obvious candidate, though some care will need to be exercised since it's essentially a recursive operation.

@antalszava (Contributor, Author):

From my vague recollection, we were making a copy of the numpy array before applying operations and/or rotations to avoid the existing state being mutated?

When applying the operations, the state is being passed by reference. Indeed, the state is being copied before applying the rotations. However, these benchmarks use qml.expval(qml.PauliZ(0)), which will not result in any rotations.
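
For reference, the benchmarked measurement looks like this (a sketch; qml.expval(qml.PauliZ(0)) is diagonal in the computational basis, so no rotation gates are appended):

import pennylane as qml

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev)
def circuit():
    qml.PauliX(wires=0)
    # PauliZ is measured in the computational basis: no pre-rotations needed
    return qml.expval(qml.PauliZ(0))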

@antalszava antalszava mentioned this pull request Mar 16, 2021
@trbromley (Contributor) left a comment:

Thanks @antalszava, looks great! One general question - these benchmarks are good for comparisons with existing sources, but could we also consider a less trivial circuit? For example, we could even do a random circuit like in the comparison tests or something like strongly entangling layers. And potentially also a calculation of the derivative (although that could come when we add the adjoint). I'm also curious about whether this clashes with/complements the current benchmarking suite being developed 🤔

From my vague recollection, we were making a copy of the numpy array before applying operations and/or rotations to avoid the existing state being mutated? Or maybe the semantics have changed since I've last looked...

Yes, a copy occurs both on state preparation and measurements not in the standard basis. For state prep, if a user does qml.QubitStateVector, we have a copy to prevent the user input state being mutated.
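
As a sketch of the state-prep case (hedged; this is not taken from the benchmark code):

import numpy as np
import pennylane as qml

dev = qml.device("lightning.qubit", wires=1)
user_state = np.array([0.0, 1.0], dtype=np.complex128)

@qml.qnode(dev)
def circuit():
    # The device copies this array so the user's input is not mutated
    qml.QubitStateVector(user_state, wires=[0])
    return qml.expval(qml.PauliZ(0))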

Average runtime (instead of total) + error bars for standard deviation.

I think it's already avg? (Or maybe it changed since the comment). Agree with std. dev. Also, maybe prefer lines passing through the dots.
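
For reference, std.-dev. bars plus lines through the points could look like this (a matplotlib sketch with placeholder data):

import numpy as np
import matplotlib.pyplot as plt

qubits = [1, 3, 5, 10, 15, 18]
mean_ns = np.array([2e2, 2.2e2, 2.6e2, 1.5e3, 2e4, 1.5e5])  # placeholder values
std_ns = 0.1 * mean_ns                                      # placeholder spread

plt.errorbar(qubits, mean_ns, yerr=std_ns, marker="o", linestyle="-")
plt.xlabel("Number of qubits")
plt.ylabel("Average runtime (ns)")
plt.savefig("gates.png")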

Resolved review threads on .github/workflows/benchmarks.yml and .github/workflows/benchmarks/plot_results.py.
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# Acknowledging the solution for plotting from the quantum-benchmarks repository
Contributor:
Also I'm curious, how much of this came directly from the external repo? It seems like simple matplotlib plotting.

Contributor (Author):

Indeed, the plotting was based on the external repo. I wanted to include some attribution since the plots would be constructed similarly and would look similar. Maybe it's enough to just leave a mention?

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Contributor:
I was also wondering if this script is more suited to attribution, given that we are following the general approach of the external repo.

Contributor (Author):
Thanks! Moved the licensing into this file.

bbox_to_anchor=(0.5, 0.97),
)

plt.savefig("gates.png")
Contributor:
Minor thing for this PR, but I wonder if we could have something like the QML repo where the CI checks in a PR link directly to the artifact? Think it's in this workflow.

Contributor (Author):

That would be awesome, but after a look around it seems that there's no straightforward solution 😞; in the qml repo we are using a CircleCI-specific action.

@ThomasLoke (Contributor):

I think it's already avg? (Or maybe it changed since the comment)

Ah, you're probably right. I assumed it was the total because the scale for time was 10^5-10^6, but I see now that it's labelled as ns, i.e. nanoseconds. In which case, that isn't so surprising, and it's fine then.

antalszava and others added 3 commits March 16, 2021 15:17
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
@antalszava (Contributor, Author):

Thank you @trbromley and @ThomasLoke for the comments!

One general question - these benchmarks are good for comparisons with existing sources, but could we also consider a less trivial circuit?

We definitely could, although this PR is just meant to add a couple of elementary benchmarks to give a quick impression of how performance is affected.
Benchmarking more advanced features could be added, though as you suggest it will be worth considering whether to run those here as CI checks or separately in a longer-running fashion. It would be worth keeping the runtime of the CI benchmarks low.


Although having error bars would be great, it seems that with timeit there's no straightforward way of doing that, because we do not gather the individual samples. There is timeit.repeat, which wraps timeit.timeit and runs it several times, but post-processing its results to obtain statistics is explicitly discouraged.

I'd be tempted to leave this as is, just because timeit seems to be recommended over using time.
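
For reference, a sketch of what timeit.repeat gives, using the apply_op snippet above:

import timeit

# Each entry is the total for 10000 calls of apply_op; the timeit docs
# recommend reporting min(times) rather than statistics across repetitions
times = timeit.repeat(apply_op, repeat=5, number=10000)
best = min(times)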

@trbromley (Contributor) left a comment:

Thanks @antalszava 💯

Resolved review thread on .github/workflows/benchmarks.yml.
# See the License for the specific language governing permissions and
# limitations under the License.

# Acknowledging the approach for plotting from the quantum-benchmarks repository
Contributor:
Actually, maybe if the code is taken exactly from that repo, we should include the licence here too (sorry!)

Contributor (Author):
No worries, thanks! Updated

Resolved review threads on LICENSE.
@antalszava antalszava merged commit fd41d8f into master Mar 17, 2021
@antalszava antalszava deleted the gate_bench_ci branch March 17, 2021 15:37