Add implementations for optimised gate kernels #85

Merged (6 commits) on Mar 12, 2021

Conversation

@ThomasLoke (Contributor) commented Mar 11, 2021

I'm still mulling over the most natural API, but this is one attempt at it. It comprises two commits:

  1. A refactoring commit that simply moves the core of the gate application (matrix multiplication) into the base AbstractGate class.
  2. Adds optimised kernels for most gates that override the default base class implementation. Some gates (e.g. fully populated 2-by-2 matrices for single-qubit gates) gain little (if any) benefit over the default kernel, so I left them as-is.

A word of caution: I've yet to test this extensively. In order to eliminate the function call overhead (especially for the single qubit gates), I think what needs to happen is to move the outer loop over applyKernel into said function (unfortunately, I don't think runtime polymorphism works with inline functions unless the concrete class can be statically determined). It may be a worthwhile exercise anyway for when we start vectorising, but again, I've not really tested much.
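For context, a minimal sketch of what the generic base-class kernel being overridden might look like. All names and signatures here are illustrative, not the repository's exact API:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using CplxType = std::complex<double>;

// Hypothetical sketch of the generic matrix-multiplication kernel: apply a
// dim x dim gate matrix to the state amplitudes selected by `indices`,
// using `scratch` as temporary storage for the input amplitudes.
void applyGenericKernel(CplxType* state,
                        const std::vector<std::size_t>& indices,
                        const std::vector<CplxType>& matrix,
                        std::vector<CplxType>& scratch) {
    const std::size_t dim = indices.size();
    for (std::size_t i = 0; i < dim; ++i)
        scratch[i] = state[indices[i]];              // snapshot the inputs
    for (std::size_t i = 0; i < dim; ++i) {
        CplxType acc{0, 0};
        for (std::size_t j = 0; j < dim; ++j)
            acc += matrix[i * dim + j] * scratch[j]; // row-times-vector
        state[indices[i]] = acc;                     // write the outputs
    }
}
```

The per-gate kernels in this PR replace this dense triple loop with each gate's known sparsity (e.g. a plain swap for PauliX).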

codecov bot commented Mar 11, 2021

Codecov Report

Merging #85 (f305304) into master (a89ebde) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #85   +/-   ##
=======================================
  Coverage   98.03%   98.03%           
=======================================
  Files           3        3           
  Lines          51       51           
=======================================
  Hits           50       50           
  Misses          1        1           


@antalszava (Contributor):
Awesome, thanks for this PR @ThomasLoke! 🎊 we'll have a look

@antalszava (Contributor) left a comment

Looks good to me! 🥇 💯 Thank you so much for the quick PR, helps a lot! 😊

A couple of suggestions/questions, but good to go from my side. I ran the PennyLane device test suite and it completed without failures.

Was wondering: could it make sense to add some special cases too: e.g., when applying PauliX, we skip the swap if the two vector elements are identical. 🤔 Such cases could potentially arise when there's a more complex algorithm and gates are applied on states which are not known in advance. Not in scope here necessarily, just a thought.
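The special case being suggested could be sketched roughly like this (hypothetical helper, not the PR's API; it assumes `indices` selects the paired amplitudes a PauliX would swap):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using CplxType = std::complex<double>;

// Sketch of the suggested special case: skip the PauliX swap entirely when
// the two amplitudes are already equal, since the swap would be a no-op.
void applyPauliXWithSkip(CplxType* state,
                         const std::vector<std::size_t>& indices) {
    if (state[indices[0]] != state[indices[1]])
        std::swap(state[indices[0]], state[indices[1]]);
}
```

Whether the extra comparison pays for itself depends on how often the branch is taken; the trade-off is discussed later in the thread.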

Benchmarked T for 23 qubits as an example:

master

import pennylane as qml

dev = qml.device('lightning.qubit', wires=23)

%timeit dev.apply_lightning(dev._state, [qml.T(wires=[0])])
49.5 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

#85

import pennylane as qml

dev = qml.device('lightning.qubit', wires=23)

%timeit dev.apply_lightning(dev._state, [qml.T(wires=[0])])
19.9 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

default.qubit

import pennylane as qml

dev = qml.device('default.qubit', wires=23)

%timeit dev.apply([qml.T(wires=[0])])
43.9 ms ± 3.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Side note for benchmarking: using dev.state instead of dev._state with lightning takes ~10 ms.

shiftedStatePtr[internalIndex] += matrix[baseIndex + j] * inputVector[j];
}
}
gate->applyKernel(shiftedStatePtr, internalIndices, inputVector);

💯

@@ -61,6 +85,10 @@ const vector<CplxType> Pennylane::XGate::matrix{
0, 1,
1, 0 };

void Pennylane::XGate::applyKernel(CplxType* state, vector<size_t>& indices, vector<CplxType>& v) {

Adding const, and making v unnamed if unused, would be a suggestion in the other cases too:

Suggested change
void Pennylane::XGate::applyKernel(CplxType* state, vector<size_t>& indices, vector<CplxType>& v) {
void Pennylane::XGate::applyKernel(CplxType* state, const vector<size_t>& indices, vector<CplxType>&) {

Contributor Author:

I'm okay with adding the const; I'm generally not a fan of removing variable names from method signatures, but I guess opinions may differ on this one. If nothing else, keeping the name makes it clear that it's overriding the same virtual method.

Contributor:

Got it! A reason to make v unnamed would be that, while reading, having the name v was somewhat confusing, as one would expect it to be used within the function (as it is in some of them).

Contributor Author:

True; although I think that's an indication we should rework the API (i.e. we shouldn't be passing in parameters we don't need if we don't have to). As mentioned in the PR description, it's probably beneficial, although it will unfortunately result in some measure of code duplication.

Contributor:

Or if branching off would result in a performance degradation, how about incorporating the internal loop in applyKernel or creating another member function that implements the internal loop and depends on applyKernel? That way inputVector would only be created if used.

Contributor Author:

That would be ideal. However, ultimately I think we want to inline applyKernel. And for that to happen I think we need to duplicate the outer loop for every subclass, because of the requirement to be able to statically infer which method is being called. Again, probably beneficial in the long run, but just a bit painful to look at.
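The trade-off being described can be illustrated with a sketch (all names hypothetical): with the outer loop in the caller, every index group costs a virtual call, whereas duplicating the loop inside each subclass gives the compiler a statically known, inlinable body:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using CplxType = std::complex<double>;

// With the outer loop kept outside, each group pays for runtime dispatch:
//   for (auto& group : indexGroups)
//       gate->applyKernel(state, group, scratch);  // virtual call per group
//
// Duplicating the loop per subclass removes the dispatch entirely, e.g. for
// an XGate (hypothetical free function standing in for the member):
void xGateApplyAll(CplxType* state,
                   const std::vector<std::vector<std::size_t>>& indexGroups) {
    for (const auto& group : indexGroups)
        std::swap(state[group[0]], state[group[1]]);  // statically known body
}
```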

/**
* Generic matrix-multiplication kernel
*/
virtual void applyKernel(CplxType* state, std::vector<size_t>& gateIndices, std::vector<CplxType>& v);
Contributor:

Worth making v more descriptive; it seems to stand for inputVector.

Comment on lines 106 to 108
v[0] = state[indices[0]];
state[indices[0]] = -IMAG * state[indices[1]];
state[indices[1]] = IMAG * v[0];
Contributor:

How come we need to update v[0] here (and not for PauliX for example)? Seems like it's used as a temp for the swap, but it would also have effect afterwards, correct?

Contributor Author:

v is really just scratch space. If we move the outer loop in, we can avoid allocating v altogether, except in the base implementation. The requirement to pass v is one reason why I'm not entirely happy with this interface.
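With the outer loop moved inside the kernel, the PauliY-style update could use a local temporary instead of the shared scratch vector; a sketch under that assumption (hypothetical names; IMAG denotes the imaginary unit i, matching the snippet above):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using CplxType = std::complex<double>;
const CplxType IMAG{0, 1};

// PauliY action on the amplitude pair: new0 = -i * old1, new1 = i * old0.
// A local temporary replaces the scratch vector v, so nothing is allocated.
void applyPauliYLocalTemp(CplxType* state,
                          const std::vector<std::size_t>& indices) {
    const CplxType tmp = state[indices[0]];  // stack temp instead of v[0]
    state[indices[0]] = -IMAG * state[indices[1]];
    state[indices[1]] = IMAG * tmp;
}
```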

Contributor:

Got it!

we can avoid allocating v altogether, except in the base implementation

This would be preferable, as just creating v results in some overhead, correct? Could we branch off at the beginning of the outer loop based on the type of operation?

Contributor Author:

The allocation for v is really insignificant in the grand scheme of things, so I don't think branching is necessary. The first rule of optimisation is to not over-optimise, and I don't think this ever becomes a bottleneck, so it's more a question of code clarity as to which API we adopt.

Comment on lines +272 to +278
, r1(c* std::pow(M_E, CplxType(0, (-phi - omega) / 2)))
, r2(-s * std::pow(M_E, CplxType(0, (phi - omega) / 2)))
, r3(s* std::pow(M_E, CplxType(0, (-phi + omega) / 2)))
, r4(c* std::pow(M_E, CplxType(0, (phi + omega) / 2)))
, matrix{
c * std::pow(M_E, CplxType(0, (-phi - omega) / 2)), -s * std::pow(M_E, CplxType(0, (phi - omega) / 2)),
s * std::pow(M_E, CplxType(0, (-phi + omega) / 2)), c * std::pow(M_E, CplxType(0, (phi + omega) / 2)) }
r1, r2,
r3, r4 }
Contributor:

Nice!
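As a standalone sketch of the precomputation above: std::exp(z) is equivalent to std::pow(M_E, z), and c = cos(theta/2), s = sin(theta/2) are assumed from the surrounding constructor (the function name and return type here are illustrative):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using CplxType = std::complex<double>;

// Build the 2x2 Rot(phi, theta, omega) matrix, computing each complex
// exponential once (r1..r4 mirror the precomputed members in the diff).
std::array<CplxType, 4> rotMatrix(double phi, double theta, double omega) {
    const double c = std::cos(theta / 2);
    const double s = std::sin(theta / 2);
    return {
        c * std::exp(CplxType(0, (-phi - omega) / 2)),  // r1
        -s * std::exp(CplxType(0, (phi - omega) / 2)),  // r2
        s * std::exp(CplxType(0, (-phi + omega) / 2)),  // r3
        c * std::exp(CplxType(0, (phi + omega) / 2))    // r4
    };
}
```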

@trbromley (Contributor) left a comment

Thanks @ThomasLoke! This is really exciting! How can we help with this? Perhaps one thing would be to work on adding tests for the kernels.

More generally, we're looking at adding an adjoint method for differentiation in #83. For that, we need to:

  • Apply the inverse operation
  • Apply the generator of an operation, i.e., apply H from e^(i H t).

I was wondering how/whether these would impact the API suggested here. In PennyLane, the name of an inverse gate just has .inv appended at the end (e.g., RX.inv for inverse of RX). Perhaps the first line of constructAndApplyOperation() could remove the .inv and then set a boolean accordingly, then constructGate() would accept that boolean to indicate whether to apply the inverse? We may then have to define applyKernelInverse for each gate.
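The suggested name handling could be sketched as follows (hypothetical helper; the actual constructAndApplyOperation() signature may differ):

```cpp
#include <cassert>
#include <string>

// Strip a trailing ".inv" from a PennyLane gate name (e.g. "RX.inv" -> "RX")
// and report whether the inverse was requested; the caller would pass the
// returned flag on to constructGate().
bool stripInverseSuffix(std::string& name) {
    const std::string suffix = ".inv";
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        name.erase(name.size() - suffix.size());
        return true;
    }
    return false;
}
```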

For generators of parameterized gates, I imagine we might have a applyGenerator() method?

Anyway, just thinking out loud - these ideas could be included in later iterations.

pennylane_lightning/src/Gates.cpp (thread resolved)
@ThomasLoke (Contributor, Author) left a comment

I can't get to these changes tonight, so it'll have to wait until tomorrow. Or just feel free to take this over and edit it as you see fit...

could it make sense to add some special cases too: e.g., when applying PauliX, we skip the swap if the two vector elements are identical. 🤔 Such cases could potentially arise when there's a more complex algorithm and gates are applied on states which are not known in advance.

Branching statements in the interior of tight loops may well be detrimental to performance due to branch misprediction, so it's probably not worth it unless it's saving a non-trivial amount of effort.

More generally, we're looking at adding an adjoint method for differentiation in #83.

I haven't taken a close look at the PR, but the API you propose does sound sensible to me :)


@ThomasLoke (Author) commented Mar 12, 2021

I've bitten the bullet and moved the outer loop into applyKernel(). I think the API looks cleaner, at the cost of some code duplication, which I can probably live with. I'd speculate that it should perform better due to reduced function-call overhead, and that the compiler has more to work with now that the outer loop no longer calls a runtime-polymorphic function, but it's hard to say what (if any) measurable difference it'll make without benchmarking it properly, which I'd appreciate some help with.

@trbromley (Contributor) left a comment

Approved! Very nice PR!

@trbromley trbromley merged commit 19a655c into PennyLaneAI:master Mar 12, 2021
@trbromley (Contributor):
Thanks @ThomasLoke!

@antalszava (Contributor) commented Mar 12, 2021

Thanks @ThomasLoke! For benchmarking, I created #87, which provides benchmarks for certain gates. It's not meant to be extensive, but could give quick feedback when working on PRs. Thoughts on what else would be important to add in a similar fashion are appreciated.

@trbromley trbromley mentioned this pull request Mar 16, 2021
@antalszava antalszava mentioned this pull request Mar 19, 2021