[Regression] Restore good performance when requesting many individual shots #6079

cvjjm · 2024-08-07T15:56:40Z

Feature details

Back in the day, PennyLane v0.29.1 had really good performance when requesting many individual samples from devices.

With v0.29.1, the sample code below takes only about twice as long to generate individual samples as it takes to compute shot-noisy expectation values over the same number of shots (0.087s vs. 0.137s with default.qubit and 0.071s vs. 0.118s with lightning.qubit).

With 0.35.1 (and various other more recent versions that I have tested), default.qubit takes about 100x longer to output individual shots than it takes to compute the expectation values (0.15s vs. 14.15s).

With lightning.qubit under v0.35.1 the difference is not quite as stark, but the second evaluation still takes about 16x longer than the first (0.076s vs. 1.27s).

So, even when using lightning.qubit under both versions, the time for expectation value computation stays nearly the same (0.071s vs. 0.076s), but the individual shot sampling time increased by more than a factor of 10, from 0.118s in v0.29.1 to 1.27s in v0.35.1!

It would be great if good performance could be restored so I can use reasonably recent versions of PennyLane when I need realistic simulations of individual shots.

Implementation

import time
import pennylane as qml

sampling_dev1 = qml.device(
    #"lightning.qubit",
    "default.qubit",
    wires=range(3),
    shots=1000,
)

sampling_dev2 = qml.device(
    #"lightning.qubit",
    "default.qubit",
    wires=range(3),
    shots=[1] * 1000,  # performance is bad with 0.35.1 if I request individual shots here
)


def ansatz():
    qml.PauliX(wires=[0])
    qml.SingleExcitation(-0.44, wires=[0, 1])
    qml.RY(1.57, wires=2)
    qml.RZ(1.57, wires=2)
    qml.CRZ(1.57, wires=[2, 1])
    qml.adjoint(qml.SingleExcitation)(-0.44, wires=[0, 1])
    qml.adjoint(qml.PauliX)(wires=[0])
    qml.RX(1.57, wires=2)

@qml.qnode(sampling_dev1)
def qnode1():
    ansatz()
    return [qml.expval(qml.PauliZ(wire)) for wire in range(3)]

t0 = time.time()
[qnode1() for _ in range(100)]
print(f"evaluating 100 expectation values over 1000 shots took {time.time() - t0}s")

@qml.qnode(sampling_dev2)
def qnode2():
    ansatz()
    return [qml.expval(qml.PauliZ(wire)) for wire in range(3)]

t0 = time.time()
[qnode2() for _ in range(100)]
print(f"evaluating 100 times 1000 individual shots took {time.time() - t0}s")

How important would you say this feature is?

2: Somewhat important. Needed this quarter.

Additional information

No response

josh146 · 2024-08-07T20:01:54Z

Thanks @cvjjm! This is on our radar, and I have good news and not-so-good news.

The good news is that we have found and fixed this performance regression when using shot vectors:

In lightning.qubit: Shot batching is made more efficient by executing all the shots in one go on Lightning Qubit pennylane-lightning#814 (two weeks ago, so will be in the Sept release)
In default.qubit: Add shots.bins() generator method #5476 (April, so will be in 0.36)

The not-so-good news is that there may be additional performance regressions we need to investigate in addition to the above, so I will get back to you with more information when I have it.
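
For readers following along, the idea suggested by the two fix titles above (execute all the shots in one go, then split them into bins) can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the actual PennyLane or Lightning implementation: the names shot_bins, sample_pauli_z and expvals_batched are hypothetical, and the "device" is a biased coin standing in for a PauliZ measurement.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def shot_bins(shot_vector: Sequence[int]) -> Iterator[Tuple[int, int]]:
    """Yield (lower, upper) index pairs, one per entry of the shot vector."""
    lower = 0
    for shots in shot_vector:
        upper = lower + shots
        yield lower, upper
        lower = upper


def sample_pauli_z(prob_plus: float, n: int, rng: random.Random) -> List[int]:
    """Hypothetical stand-in for a device: draw n PauliZ eigenvalues (+1 or -1)."""
    return [1 if rng.random() < prob_plus else -1 for _ in range(n)]


def expvals_batched(
    prob_plus: float, shot_vector: Sequence[int], rng: random.Random
) -> List[float]:
    """Draw all shots in a single call, then average each bin separately."""
    samples = sample_pauli_z(prob_plus, sum(shot_vector), rng)
    return [sum(samples[lo:hi]) / (hi - lo) for lo, hi in shot_bins(shot_vector)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Same request shape as in the report: 1000 individual shots.
    results = expvals_batched(0.5, [1] * 1000, rng)
    print(len(results), "single-shot results from one sampling call")
```

The point of the sketch is that the cost of preparing the state and sampling is paid once for the whole shot vector, so requesting shots=[1] * 1000 should cost roughly the same as shots=1000 plus some cheap slicing.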

cvjjm added the enhancement ✨ New feature or request label Aug 7, 2024
