-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New kernel dispatch mechanism #291
Conversation
Hello. You may have forgotten to update the changelog!
|
Codecov Report
@@ Coverage Diff @@
## master #291 +/- ##
==========================================
+ Coverage 99.42% 99.47% +0.04%
==========================================
Files 30 36 +6
Lines 3300 3606 +306
==========================================
+ Hits 3281 3587 +306
Misses 19 19
Continue to review full report at Codecov.
|
Co-authored-by: Lee James O'Riordan <mlxd@users.noreply.github.com>
…e-lightning into new_kernel_dispatch
Hi @mlxd, thanks for the detailed (and prompt) reviews! As I also changed the name of workflows for Window wheels (added Python versions after I will also put the benchmark results when it is done. Thanks! |
Co-authored-by: Amintor Dusko <87949283+AmintorDusko@users.noreply.github.com>
…e-lightning into new_kernel_dispatch
Co-authored-by: Amintor Dusko <87949283+AmintorDusko@users.noreply.github.com>
…e-lightning into new_kernel_dispatch
While comparing benchmark results from this PR vs What I observed with GCC9 was that the PI kernel implementation of IsingXX outperforms the LM kernel for some number of qubits. This is the set-up I used in this PR, i.e., we use the PI kernel for IsingXX for some number of qubits. However, I found that this assumption is no more valid with GCC10. With GCC10, the LM kernel always outperforms the PI kernel (for all number of qubits and all gates). Below is the benchmark results comparing the performance of the IsingXX gate for different kernels/compilers. Consequently, this PR only will improve the performance with additional kernels. Fortunately, it does not decrease performance either in a benchmark (I will add a plot). |
I benchmarked
and
They just match within a certain error bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @chaeyeunpark
I'd recommend adding an entry to the changelog as the last step. I will approve this in the meantime, so feel free to merge when this is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @chaeyeunpark! Looking forward to seeing this new dispatch mechanism merge soon. 🚀
Thanks, @maliasadi, @mlxd, @AmintorDusko for your careful reviews! |
🎉 |
Before submitting
Please complete the following checklist when submitting a PR:
All new features must include a unit test.
If you've fixed a bug or added code that should be tested, add a test to the
tests
directory!All new functions and code must be clearly commented and documented.
If you do make documentation changes, make sure that the docs build and
render correctly by running
make docs
.Ensure that the test suite passes, by running
make test
.Add a new entry to the
.github/CHANGELOG.md
file, summarizing thechange, and including a link back to the PR.
Ensure that code is properly formatted by running
make format
.When all the above are checked, delete everything above the dashed
line and fill in the pull request template.
Context: Currently, we have two different gate implementations (kernels) and some more to be added. As it is possible that one kernel outperforms other kernels only for particular numbers of qubits, it is desirable to have a kernel dispatcher mechanism that takes this information into account. In addition, as with AVX kernels, it is necessary to have an array whose memory is managed by lightning with a proper alignment.
Description of the Change: Lightning now has the KernelMap class which returns the kernels to use for a given number of qubits. The statevector classes now manage their own kernel information accordingly. Moreover, custom
_asarray
is implemented.Benefits: Not huge right now, but with AVX and OpenMP kernels, we should have some advantages.
Possible Drawbacks: Slightly slower dispatch time (1 or 2 more std::unoreder_map accesses), and constructing a statevector becomes slower as it needs to obtain the best kernels for each operation.
Related GitHub Issues:
Note that I have added
StateVectorCPU
class as a parent class ofStateVectorManaged
andStateVectorRaw
, which are now renamed toStateVectorManagedPU
andStateVectorRawCPU
, respectively. This renaming resulted in lots of changed files.