New kernel dispatch mechanism #291

chaeyeunpark · 2022-05-07T21:19:32Z

Before submitting

Please complete the following checklist when submitting a PR:

All new features must include a unit test.
If you've fixed a bug or added code that should be tested, add a test to the
tests directory!
All new functions and code must be clearly commented and documented.
If you do make documentation changes, make sure that the docs build and
render correctly by running make docs.
Ensure that the test suite passes, by running make test.
Add a new entry to the .github/CHANGELOG.md file, summarizing the
change, and including a link back to the PR.
Ensure that code is properly formatted by running make format.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.

Context: Currently, we have two different gate implementations (kernels) and some more to be added. As it is possible that one kernel outperforms other kernels only for particular numbers of qubits, it is desirable to have a kernel dispatcher mechanism that takes this information into account. In addition, as with AVX kernels, it is necessary to have an array whose memory is managed by lightning with a proper alignment.

Description of the Change: Lightning now has the KernelMap class which returns the kernels to use for a given number of qubits. The statevector classes now manage their own kernel information accordingly. Moreover, custom _asarray is implemented.

Benefits: Not huge right now, but with AVX and OpenMP kernels, we should have some advantages.

Possible Drawbacks: Slightly slower dispatch time (1 or 2 more std::unoreder_map accesses), and constructing a statevector becomes slower as it needs to obtain the best kernels for each operation.

Related GitHub Issues:

Note that I have added StateVectorCPU class as a parent class of StateVectorManaged and StateVectorRaw, which are now renamed to StateVectorManagedPU and StateVectorRawCPU, respectively. This renaming resulted in lots of changed files.

github-actions · 2022-05-07T21:19:48Z

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

A one-to-two sentence description of the change. You may include a small working example for new features.
A link back to this PR.
Your name (or GitHub username) in the contributors section.

github-actions · 2022-05-07T21:23:29Z

Test Report (C++) on Ubuntu

          1 files ±    0           1 suites ±0 1s ⏱️ -1s
      884 tests +  22       884 ✔️ +  22 0 💤 ±0 0 ❌ ±0
229 146 runs +442 229 146 ✔️ +442 0 💤 ±0 0 ❌ ±0

Results for commit 5720f16. ± Comparison against base commit e182d63.

♻️ This comment has been updated with latest results.

codecov · 2022-05-09T14:36:24Z

Codecov Report

Merging #291 (5720f16) into master (e182d63) will increase coverage by 0.04%.
The diff coverage is 99.46%.

@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
+ Coverage   99.42%   99.47%   +0.04%     
==========================================
  Files          30       36       +6     
  Lines        3300     3606     +306     
==========================================
+ Hits         3281     3587     +306     
  Misses         19       19

Impacted Files	Coverage Δ
...ennylane_lightning/src/gates/OpToMemberFuncPtr.hpp	`100.00% <ø> (ø)`
...ng/src/gates/cpu_kernels/GateImplementationsLM.hpp	`100.00% <ø> (ø)`
...lightning/src/gates/cpu_kernels/PauliGenerator.hpp	`100.00% <ø> (ø)`
pennylane_lightning/src/util/ConstantUtil.hpp	`100.00% <ø> (ø)`
pennylane_lightning/src/util/LinearAlgebra.hpp	`97.45% <ø> (ø)`
pennylane_lightning/src/util/Util.hpp	`100.00% <ø> (ø)`
...nnylane_lightning/src/simulator/CPUMemoryModel.hpp	`96.15% <96.15%> (ø)`
pennylane_lightning/src/simulator/KernelMap.hpp	`98.98% <98.98%> (ø)`
pennylane_lightning/_version.py	`100.00% <100.00%> (ø)`
pennylane_lightning/lightning_qubit.py	`100.00% <100.00%> (ø)`
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e182d63...5720f16. Read the comment docs.

Co-authored-by: Lee James O'Riordan <mlxd@users.noreply.github.com>

…e-lightning into new_kernel_dispatch

chaeyeunpark · 2022-06-01T14:52:49Z

Hi @mlxd, thanks for the detailed (and prompt) reviews! As I also changed the name of workflows for Window wheels (added Python versions after windows-latest), we also need to update the names of the required workflows.

I will also put the benchmark results when it is done. Thanks!

pennylane_lightning/src/tests/TestKernels.hpp

pennylane_lightning/src/tests/Test_AdjDiff.cpp

pennylane_lightning/src/tests/Test_KernelMap.cpp

pennylane_lightning/src/tests/Test_Measures_Sparse.cpp

pennylane_lightning/src/util/BitUtil.hpp

pennylane_lightning/src/util/IntegerInterval.hpp

Co-authored-by: Amintor Dusko <87949283+AmintorDusko@users.noreply.github.com>

…e-lightning into new_kernel_dispatch

Co-authored-by: Amintor Dusko <87949283+AmintorDusko@users.noreply.github.com>

…e-lightning into new_kernel_dispatch

chaeyeunpark · 2022-06-02T15:47:32Z

While comparing benchmark results from this PR vs master, I found an effect from a recent compiler update.

What I observed with GCC9 was that the PI kernel implementation of IsingXX outperforms the LM kernel for some number of qubits. This is the set-up I used in this PR, i.e., we use the PI kernel for IsingXX for some number of qubits. However, I found that this assumption is no more valid with GCC10. With GCC10, the LM kernel always outperforms the PI kernel (for all number of qubits and all gates). Below is the benchmark results comparing the performance of the IsingXX gate for different kernels/compilers.

Consequently, this PR only will improve the performance with additional kernels. Fortunately, it does not decrease performance either in a benchmark (I will add a plot).

chaeyeunpark · 2022-06-02T19:06:37Z

I benchmarked master and PR with PL (thus with a Python layer). This PR:

RX	4	0.0036217915109991736
RX	6	0.0038707129000013084
RX	8	0.0041500326710011
RX	10	0.00426488146200063
RX	12	0.004521035879000919
RX	14	0.005227743032000944
RX	16	0.01215034226899843
RX	18	0.034738253583000187
RX	20	0.1713370519399905
RX	22	0.7365909335600008
RX	24	2.8554877892199872
IsingXX	4	0.002746432375999575
IsingXX	6	0.00279603455600045
IsingXX	8	0.0029767005180001433
IsingXX	10	0.002908858983000755
IsingXX	12	0.0032211705299996537
IsingXX	14	0.004522534466999787
IsingXX	16	0.009849675216000832
IsingXX	18	0.029261140585998872
IsingXX	20	0.16302959946000556
IsingXX	22	0.6995188719600083
IsingXX	24	2.700770096470005

and master:

RX	4	0.004058633579999878
RX	6	0.005088524271001006
RX	8	0.0041578931450003435
RX	10	0.0045033177889999935
RX	12	0.004255916452999373
RX	14	0.005389769843000977
RX	16	0.012636205329001313
RX	18	0.035716377094000565
RX	20	0.18188403280000784
RX	22	0.7846357532299953
RX	24	2.9888012942800013
IsingXX	4	0.0027232625420001567
IsingXX	6	0.0027902913140005693
IsingXX	8	0.0028135284099989805
IsingXX	10	0.002900960183000279
IsingXX	12	0.0032205224449990055
IsingXX	14	0.004480759730999125
IsingXX	16	0.010184240028000204
IsingXX	18	0.030108122703000846
IsingXX	20	0.16488404629999423
IsingXX	22	0.7350444105300085
IsingXX	24	2.790163356810008

They just match within a certain error bound.

mlxd

Awesome work @chaeyeunpark
I'd recommend adding an entry to the changelog as the last step. I will approve this in the meantime, so feel free to merge when this is done.

maliasadi

Great work @chaeyeunpark! Looking forward to seeing this new dispatch mechanism merge soon. 🚀

chaeyeunpark · 2022-06-06T13:55:14Z

Thanks, @maliasadi, @mlxd, @AmintorDusko for your careful reviews!

co9olguy · 2022-06-06T15:27:44Z

🎉

New branch from master

2fe2e0c

chaeyeunpark mentioned this pull request May 7, 2022

Better kernel dispatch and add custom _asarray #245

Closed

5 tasks

chaeyeunpark added 3 commits May 7, 2022 18:32

Add tests

13a9f29

Fix Python test

c81bc30

Fix test

133499e

chaeyeunpark marked this pull request as ready for review May 8, 2022 13:10

chaeyeunpark and others added 7 commits May 9, 2022 12:38

Update coverage

f1d89a1

Fix binding dependency

7ad725d

Fix GBenchmark

e5c62cf

Revert StateVector tests to origin

8667186

Add more tests

063cd34

Auto update version

4302124

Merge remote-tracking branch 'origin/master' into new_kernel_dispatch

f51e904

chaeyeunpark marked this pull request as draft May 10, 2022 01:16

chaeyeunpark added 4 commits May 9, 2022 21:16

Fix for tidy

5cbb9a0

Update KernelMap

f3cdbac

Fix for codefactor

4faf279

Merge remote-tracking branch 'origin/master' into new_kernel_dispatch

61726c3

chaeyeunpark marked this pull request as ready for review May 10, 2022 16:27

chaeyeunpark and others added 6 commits May 10, 2022 16:27

Auto update version

ebdf601

Trigger CI

09d52fa

Update Memory

1428195

Fix

39aebb9

Revert single file

bd6b704

Fix for tidy

063b379

chaeyeunpark requested review from mlxd and maliasadi May 10, 2022 19:30

chaeyeunpark and others added 4 commits June 1, 2022 03:09

Revert to posix_malloc in MacOS

b740869

Update pennylane_lightning/src/simulator/KernelMap.hpp

59a5613

Co-authored-by: Lee James O'Riordan <mlxd@users.noreply.github.com>

Update docstring

b745ba9

Merge branch 'new_kernel_dispatch' of github.com:PennyLaneAI/pennylan…

bb7c64d

…e-lightning into new_kernel_dispatch