
Gate benchmarking CI #87

Merged: 39 commits merged into master from gate_bench_ci on Mar 17, 2021
Conversation

@antalszava (Contributor) commented Mar 12, 2021

Changes
Adds basic benchmarks that each simulate gate applications for qubit counts in [1, 3, 5, 10, 15, 18]. The "PauliX", "T", "Hadamard", "CNOT" gates are each run 10000 times by timing the following snippet:

import pennylane as qml

def apply_op():
    # Call the device's apply directly to minimize the Python overhead;
    # `gate` (str), `num_q` (int) and `dev` come from the benchmark loop
    pennylane_op = getattr(qml, gate)
    if pennylane_op.num_wires == 1:
        dev.apply([pennylane_op(wires=0)])
    elif num_q > 1 and pennylane_op.num_wires == 2:
        dev.apply([pennylane_op(wires=[0, 1])])
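
For context, a minimal sketch of how the snippet could be timed; the harness itself isn't shown here, so the timeit usage below is an assumption based on the discussion further down:

import timeit

# Hypothetical harness: total time for 10000 calls, then the per-call average
total_s = timeit.timeit(apply_op, number=10000)
avg_ns = total_s / 10000 * 1e9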

The output is a plot comparing:

  • lightning.qubit according to the modifications in the PR where the CI check is run
  • lightning.qubit in master
  • default.qubit

Output: a .png image (~40 KB) that is uploaded as an artifact after each run.
Time to complete: around 3-4 minutes.

The check is run for each commit pushed to an open pull request.

The resulting image files are available in the list of checks by clicking on Details.

@codecov (bot) commented Mar 12, 2021

Codecov Report

Merging #87 (7ca605b) into master (19a655c) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #87   +/-   ##
=======================================
  Coverage   98.03%   98.03%           
=======================================
  Files           3        3           
  Lines          51       51           
=======================================
  Hits           50       50           
  Misses          1        1           


@antalszava antalszava marked this pull request as ready for review March 12, 2021 20:53
@antalszava antalszava requested a review from trbromley March 12, 2021 20:54
@antalszava antalszava changed the title Gate bench ci Gate benchmarking CI Mar 12, 2021
@ThomasLoke (Contributor):
Looks good! Some thoughts on what could be added/changed:

  1. A baseline that just measures the cost of copying the numpy array for the state (see the sketch after this list). I assume both default.qubit and lightning.qubit would do this? This may be a significant part of the cost, especially if we're just applying single-qubit operations.
  2. Average runtime (instead of total) + error bars for standard deviation.
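
For illustration, such a baseline could be as simple as timing a copy of the full state vector (a sketch; the names below are hypothetical):

import timeit
import numpy as np

num_q = 18  # largest qubit count in the benchmarked regime
state = np.zeros(2**num_q, dtype=np.complex128)
state[0] = 1.0  # |00...0>

# Baseline: the cost of copying the state, with no gate applied
copy_s = timeit.timeit(lambda: state.copy(), number=10000)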

@antalszava (Contributor, Author):

Thanks @ThomasLoke!

A baseline that just measures the cost of copying the numpy array for the state.

Isn't a reference to the numpy array being passed when calling the bound C++ function from Python?


Sharing some further benchmarks:

Benchmarking C++ apply

Used the following source to call apply:

#include "pennylane_lightning/src/Apply.hpp"
#include <iostream>


using std::vector;
using Pennylane::CplxType;
using Pennylane::StateVector;

int main(){

    const int qubits = 23;
    const int len = exp2(qubits);
    vector<CplxType> vec(len);

    // Prepare |00....0>
    vec.at(0) = 1;
    StateVector state(vec.data(), len);

    apply(state, {"PauliX"},{{0}}, {{}}, qubits);
    return 0;
}

Created a flame graph for applying PauliX on 23 qubits:

[flame graph: PauliX, 23 wires, lightning.qubit]

This seems to indicate that calling generateBitPatterns is ~50% of the gate application.

On a separate local run, could confirm that the second use of generateBitPatterns took about as long as calling gate->applyKernel.

So it could be worth considering whether generateBitPatterns could be improved for its second usage.


Benchmarking Python apply

As a separate benchmark, profiled device.apply, which is timed by the CI benchmark suite:

lightning.qubit, 18 qubits:

[profile: PauliX on lightning.qubit, 18 qubits]

For 23 qubits, lightning_qubit_ops.apply contributed 99.56%.

default.qubit, 18 qubits:

[profile: PauliX on default.qubit, 18 qubits]

For 23 qubits, numeric.roll contributed 98.91%.

@ThomasLoke (Contributor):

Isn't a reference to the numpy array being passed when calling the bound C++ function from Python?

From my vague recollection, we were making a copy of the numpy array before applying operations and/or rotations to avoid the existing state being mutated? Or maybe the semantics have changed since I last looked...

This seems to indicate that calling generateBitPatterns is ~50% of the gate application.

I'm not surprised, but I suspect this varies quite a bit depending on the gate type and dimensionality. A PauliX gate amounts to nothing more than swapping some memory locations, and has to generate 2^(n-1) + 2^1 bit patterns in total. Something like a Hadamard would generate the same number of bit patterns, but does more actual compute. Dense matrices (e.g. QFT) would even further increase the amount of compute (and as the dimension increases, the number of bit patterns will drop as well), so I'd expect that in these cases the relative cost of generateBitPatterns would be much smaller.
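
As a concrete illustration of the "swapping memory locations" point, a NumPy sketch (not the lightning implementation; the function name is hypothetical):

import numpy as np

def apply_paulix(state, target, num_q):
    # Viewed as an n-dimensional tensor of shape (2, ..., 2), PauliX on
    # `target` just swaps the two slices along that axis: pure data movement
    psi = state.reshape([2] * num_q)
    return np.flip(psi, axis=target).reshape(-1)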

That said, there are probably some things we can do to cut down the runtime of generateBitPatterns. Parallelisation would be the obvious candidate, though some care will need to be exercised since it's essentially a recursive operation.

@antalszava (Contributor, Author):

From my vague recollection, we were making a copy of the numpy array before applying operations and/or rotations to avoid the existing state being mutated?

When applying the operations, the state is being passed by reference. Indeed, the state is being copied before applying the rotations. However, these benchmarks use qml.expval(qml.PauliZ(0)), which will not result in any rotations.
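
For reference, the benchmarked measurement looks like this (a sketch; qml.expval(qml.PauliZ(0)) is diagonal in the computational basis, so no rotation gates are appended):

import pennylane as qml

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev)
def circuit():
    qml.PauliX(wires=0)
    # PauliZ is measured in the computational basis: no pre-rotations needed
    return qml.expval(qml.PauliZ(0))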

@antalszava antalszava mentioned this pull request Mar 16, 2021
@trbromley (Contributor) left a comment:

Thanks @antalszava, looks great! One general question - these benchmarks are good for comparisons with existing sources, but could we also consider a less trivial circuit? For example, we could even do a random circuit like in the comparison tests or something like strongly entangling layers. And potentially also a calculation of the derivative (although that could come when we add the adjoint). I'm also curious about whether this clashes with/complements the current benchmarking suite being developed 🤔

From my vague recollection, we were making a copy of the numpy array before applying operations and/or rotations to avoid the existing state being mutated? Or maybe the semantics have changed since I've last looked...

Yes, a copy occurs both on state preparation and measurements not in the standard basis. For state prep, if a user does qml.QubitStateVector, we have a copy to prevent the user input state being mutated.
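
As a sketch of the state-prep case (hedged; this is not taken from the benchmark code):

import numpy as np
import pennylane as qml

dev = qml.device("lightning.qubit", wires=1)
user_state = np.array([0.0, 1.0], dtype=np.complex128)

@qml.qnode(dev)
def circuit():
    # The device copies this array so the user's input is not mutated
    qml.QubitStateVector(user_state, wires=[0])
    return qml.expval(qml.PauliZ(0))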

Average runtime (instead of total) + error bars for standard deviation.

I think it's already avg? (Or maybe it changed since the comment). Agree with std. dev. Also, maybe prefer lines passing through the dots.
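
For reference, std.-dev. bars plus lines through the points could look like this (a matplotlib sketch with placeholder data):

import numpy as np
import matplotlib.pyplot as plt

qubits = [1, 3, 5, 10, 15, 18]
mean_ns = np.array([2e2, 2.2e2, 2.6e2, 1.5e3, 2e4, 1.5e5])  # placeholder values
std_ns = 0.1 * mean_ns                                      # placeholder spread

plt.errorbar(qubits, mean_ns, yerr=std_ns, marker="o", linestyle="-")
plt.xlabel("Number of qubits")
plt.ylabel("Average runtime (ns)")
plt.savefig("gates.png")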

Resolved review threads on .github/workflows/benchmarks.yml and .github/workflows/benchmarks/plot_results.py.
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# Acknowledging the solution for plotting from the quantum-benchmarks repository
Contributor:
Also I'm curious, how much of this came directly from the external repo? It seems like simple matplotlib plotting.

Contributor (Author):

Indeed, the plotting was based on the external repo. I wanted to include some attribution since the plots would be constructed similarly and would look similar. Maybe it's enough to just leave a mention?

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Contributor:
I was also wondering if this script is more suited to attribution, given that we are following the general approach of the external repo.

Contributor (Author):
Thanks! Moved the licensing into this file.

bbox_to_anchor=(0.5, 0.97),
)

plt.savefig("gates.png")
Contributor:
Minor thing for this PR, but I wonder if we could have something like the QML repo where the CI checks in a PR link directly to the artifact? Think it's in this workflow.

Contributor (Author):

That would be awesome, but after a look around it seems that there's no straightforward solution 😞; in the qml repo we are using a CircleCI-specific action.

@ThomasLoke (Contributor):

I think it's already avg? (Or maybe it changed since the comment)

Ah, you're probably right. I assumed it was the total because the scale for time was 10^5-10^6, but I see now that it's labelled as ns, i.e. nanoseconds. In which case, that isn't so surprising, and it's fine then.

antalszava and others added 3 commits March 16, 2021 15:17
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
@antalszava (Contributor, Author):

Thank you @trbromley and @ThomasLoke for the comments!

One general question - these benchmarks are good for comparisons with existing sources, but could we also consider a less trivial circuit?

We definitely could, although this PR is just meant to add a couple of elementary benchmarks to give a quick impression of how performance is affected.
Benchmarking more advanced features could be added, though as you suggest it will be worth considering whether to run those here as CI checks or separately in a longer-running fashion. It would be worth keeping the runtime of the CI benchmarks low.


Although having error bars would be great, it seems that with timeit there's no straightforward way of doing that, because we do not gather the individual samples. There is timeit.repeat, which wraps timeit.timeit and runs it several times, but post-processing its results to obtain statistics is explicitly discouraged.

I'd be tempted to leave this as is, just because timeit seems to be recommended over using time.
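
For reference, a sketch of what timeit.repeat gives, using the apply_op snippet above:

import timeit

# Each entry is the total for 10000 calls of apply_op; the timeit docs
# recommend reporting min(times) rather than statistics across repetitions
times = timeit.repeat(apply_op, repeat=5, number=10000)
best = min(times)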

@trbromley (Contributor) left a comment:

Thanks @antalszava 💯

Resolved review thread on .github/workflows/benchmarks.yml.
# See the License for the specific language governing permissions and
# limitations under the License.

# Acknowledging the approach for plotting from the quantum-benchmarks repository
Contributor:
Actually, maybe if the code is taken exactly from that repo, we should include the licence here too (sorry!)

Contributor (Author):
No worries, thanks! Updated

Resolved review threads on LICENSE.
@antalszava antalszava merged commit fd41d8f into master Mar 17, 2021
@antalszava antalszava deleted the gate_bench_ci branch March 17, 2021 15:37