Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New kernel dispatch mechanism #291

Merged
merged 59 commits into from
Jun 6, 2022
Merged

New kernel dispatch mechanism #291

merged 59 commits into from
Jun 6, 2022

Conversation

chaeyeunpark
Copy link
Contributor

@chaeyeunpark chaeyeunpark commented May 7, 2022

Before submitting

Please complete the following checklist when submitting a PR:

  • All new features must include a unit test.
    If you've fixed a bug or added code that should be tested, add a test to the
    tests directory!

  • All new functions and code must be clearly commented and documented.
    If you do make documentation changes, make sure that the docs build and
    render correctly by running make docs.

  • Ensure that the test suite passes, by running make test.

  • Add a new entry to the .github/CHANGELOG.md file, summarizing the
    change, and including a link back to the PR.

  • Ensure that code is properly formatted by running make format.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


Context: Currently, we have two different gate implementations (kernels) and some more to be added. As it is possible that one kernel outperforms other kernels only for particular numbers of qubits, it is desirable to have a kernel dispatcher mechanism that takes this information into account. In addition, as with AVX kernels, it is necessary to have an array whose memory is managed by lightning with a proper alignment.

Description of the Change: Lightning now has the KernelMap class which returns the kernels to use for a given number of qubits. The statevector classes now manage their own kernel information accordingly. Moreover, custom _asarray is implemented.

Benefits: Not huge right now, but with AVX and OpenMP kernels, we should have some advantages.

Possible Drawbacks: Slightly slower dispatch time (1 or 2 more std::unoreder_map accesses), and constructing a statevector becomes slower as it needs to obtain the best kernels for each operation.

Related GitHub Issues:

Note that I have added StateVectorCPU class as a parent class of StateVectorManaged and StateVectorRaw, which are now renamed to StateVectorManagedPU and StateVectorRawCPU, respectively. This renaming resulted in lots of changed files.

@github-actions
Copy link
Contributor

github-actions bot commented May 7, 2022

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@github-actions
Copy link
Contributor

github-actions bot commented May 7, 2022

Test Report (C++) on Ubuntu

           1 files  ±    0             1 suites  ±0   1s ⏱️ -1s
       884 tests +  22         884 ✔️ +  22  0 💤 ±0  0 ±0 
229 146 runs  +442  229 146 ✔️ +442  0 💤 ±0  0 ±0 

Results for commit 5720f16. ± Comparison against base commit e182d63.

♻️ This comment has been updated with latest results.

@chaeyeunpark chaeyeunpark marked this pull request as ready for review May 8, 2022 13:10
@codecov
Copy link

codecov bot commented May 9, 2022

Codecov Report

Merging #291 (5720f16) into master (e182d63) will increase coverage by 0.04%.
The diff coverage is 99.46%.

@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
+ Coverage   99.42%   99.47%   +0.04%     
==========================================
  Files          30       36       +6     
  Lines        3300     3606     +306     
==========================================
+ Hits         3281     3587     +306     
  Misses         19       19              
Impacted Files Coverage Δ
...ennylane_lightning/src/gates/OpToMemberFuncPtr.hpp 100.00% <ø> (ø)
...ng/src/gates/cpu_kernels/GateImplementationsLM.hpp 100.00% <ø> (ø)
...lightning/src/gates/cpu_kernels/PauliGenerator.hpp 100.00% <ø> (ø)
pennylane_lightning/src/util/ConstantUtil.hpp 100.00% <ø> (ø)
pennylane_lightning/src/util/LinearAlgebra.hpp 97.45% <ø> (ø)
pennylane_lightning/src/util/Util.hpp 100.00% <ø> (ø)
...nnylane_lightning/src/simulator/CPUMemoryModel.hpp 96.15% <96.15%> (ø)
pennylane_lightning/src/simulator/KernelMap.hpp 98.98% <98.98%> (ø)
pennylane_lightning/_version.py 100.00% <100.00%> (ø)
pennylane_lightning/lightning_qubit.py 100.00% <100.00%> (ø)
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e182d63...5720f16. Read the comment docs.

@chaeyeunpark chaeyeunpark marked this pull request as draft May 10, 2022 01:16
@chaeyeunpark chaeyeunpark marked this pull request as ready for review May 10, 2022 16:27
@chaeyeunpark chaeyeunpark requested review from mlxd and maliasadi May 10, 2022 19:30
@chaeyeunpark
Copy link
Contributor Author

chaeyeunpark commented Jun 1, 2022

Hi @mlxd, thanks for the detailed (and prompt) reviews! As I also changed the name of workflows for Window wheels (added Python versions after windows-latest), we also need to update the names of the required workflows.

I will also put the benchmark results when it is done. Thanks!

@chaeyeunpark
Copy link
Contributor Author

chaeyeunpark commented Jun 2, 2022

While comparing benchmark results from this PR vs master, I found an effect from a recent compiler update.

What I observed with GCC9 was that the PI kernel implementation of IsingXX outperforms the LM kernel for some number of qubits. This is the set-up I used in this PR, i.e., we use the PI kernel for IsingXX for some number of qubits. However, I found that this assumption is no more valid with GCC10. With GCC10, the LM kernel always outperforms the PI kernel (for all number of qubits and all gates). Below is the benchmark results comparing the performance of the IsingXX gate for different kernels/compilers.

compare-compilers

Consequently, this PR only will improve the performance with additional kernels. Fortunately, it does not decrease performance either in a benchmark (I will add a plot).

@chaeyeunpark
Copy link
Contributor Author

I benchmarked master and PR with PL (thus with a Python layer). This PR:

RX	4	0.0036217915109991736
RX	6	0.0038707129000013084
RX	8	0.0041500326710011
RX	10	0.00426488146200063
RX	12	0.004521035879000919
RX	14	0.005227743032000944
RX	16	0.01215034226899843
RX	18	0.034738253583000187
RX	20	0.1713370519399905
RX	22	0.7365909335600008
RX	24	2.8554877892199872
IsingXX	4	0.002746432375999575
IsingXX	6	0.00279603455600045
IsingXX	8	0.0029767005180001433
IsingXX	10	0.002908858983000755
IsingXX	12	0.0032211705299996537
IsingXX	14	0.004522534466999787
IsingXX	16	0.009849675216000832
IsingXX	18	0.029261140585998872
IsingXX	20	0.16302959946000556
IsingXX	22	0.6995188719600083
IsingXX	24	2.700770096470005

and master:

RX	4	0.004058633579999878
RX	6	0.005088524271001006
RX	8	0.0041578931450003435
RX	10	0.0045033177889999935
RX	12	0.004255916452999373
RX	14	0.005389769843000977
RX	16	0.012636205329001313
RX	18	0.035716377094000565
RX	20	0.18188403280000784
RX	22	0.7846357532299953
RX	24	2.9888012942800013
IsingXX	4	0.0027232625420001567
IsingXX	6	0.0027902913140005693
IsingXX	8	0.0028135284099989805
IsingXX	10	0.002900960183000279
IsingXX	12	0.0032205224449990055
IsingXX	14	0.004480759730999125
IsingXX	16	0.010184240028000204
IsingXX	18	0.030108122703000846
IsingXX	20	0.16488404629999423
IsingXX	22	0.7350444105300085
IsingXX	24	2.790163356810008

They just match within a certain error bound.

@chaeyeunpark chaeyeunpark requested a review from mlxd June 2, 2022 19:07
Copy link
Member

@mlxd mlxd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @chaeyeunpark
I'd recommend adding an entry to the changelog as the last step. I will approve this in the meantime, so feel free to merge when this is done.

Copy link
Member

@maliasadi maliasadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @chaeyeunpark! Looking forward to seeing this new dispatch mechanism merge soon. 🚀

@chaeyeunpark
Copy link
Contributor Author

Thanks, @maliasadi, @mlxd, @AmintorDusko for your careful reviews!

@chaeyeunpark chaeyeunpark merged commit 4f969e4 into master Jun 6, 2022
@chaeyeunpark chaeyeunpark deleted the new_kernel_dispatch branch June 6, 2022 14:26
@co9olguy
Copy link
Member

co9olguy commented Jun 6, 2022

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants