Add Lightning GBenchmark Suite #249

maliasadi · 2022-03-08T00:37:57Z

Lightning GBenchmark Suite

This PR adds the PennyLane-Lightning benchmark suite powered by google-benchmark (GB). To use the GB scripts, run make gbenchmark, or run:

$ cmake pennylane_lightning/src/ -BBuildGBench -DBUILD_BENCHMARKS=ON -DENABLE_OPENMP=ON -DENABLE_BLAS=ON -DCMAKE_BUILD_TYPE=Release
$ cmake --build ./BuildGBench --target utils apply_operations apply_multirz

The main requirement for these scripts is google-benchmark. If CMake's find_package command cannot find an installed copy of GB, the build falls back to the FetchContent module to download and build the library.

Implementation details

See the README file at pennylane_lightning/src/benchmarks/README.md.

github-actions · 2022-03-08T00:38:16Z

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

A one-to-two sentence description of the change. You may include a small working example for new features.
A link back to this PR.
Your name (or GitHub username) in the contributors section.

codecov · 2022-03-08T00:41:35Z

Codecov Report

Merging #249 (0a566ee) into master (27bc5f5) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #249   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          366       366           
=========================================
  Hits           366       366

Impacted Files	Coverage Δ
pennylane_lightning/_version.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27bc5f5...0a566ee.


github-actions · 2022-03-09T06:28:22Z

Test Report (C++) on Ubuntu

      1 files ±0       1 suites ±0 0s ⏱️ ±0s
  555 tests ±0   555 ✔️ ±0 0 💤 ±0 0 ❌ ±0
2 289 runs ±0 2 289 ✔️ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 0a566ee. ± Comparison against base commit 27bc5f5.

♻️ This comment has been updated with latest results.


mlxd

Nice work @maliasadi
Nothing major to add here, except that we can wait for the 0.23 tag to be merged first.

A few quick comments too, but happy to go with whatever you think.

mlxd · 2022-03-14T13:50:09Z

pennylane_lightning/src/util/LinearAlgebra.hpp

@@ -369,7 +369,7 @@ inline auto matrixVecProd(const std::vector<std::complex<T>> mat,
 * @param n1 Index of the first column.
 * @param n2 Index of the last column.
 */
-template <class T, size_t BLOCKSIZE = 32> // NOLINT(readability-magic-numbers)
+template <class T, size_t BLOCKSIZE = 16> // NOLINT(readability-magic-numbers)


Does 16 offer better performance for this?

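Yes. In the output below, BLOCKSIZE=32 took 26 to 54% longer than BLOCKSIZE=16 for sizes up to 128, the two were within about 4% of each other for larger sizes, and the overall geometric mean shows 32 to be about 11.6% slower. This is the result of running the following command: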

$ python3 compare.py filters ./BuildGBench/benchmarks/utils "cf_transpose_cmplx<double, 16>" "cf_transpose_cmplx<double, 32>"

Running ./BuildGBench/benchmarks/utils
Run on (8 X 3877.22 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 12288 KiB (x1)
Load Average: 3.70, 2.74, 1.44
------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations
------------------------------------------------------------------------------
cf_transpose_cmplx<double, 16>/32         769 ns          769 ns       905299
cf_transpose_cmplx<double, 16>/64        3867 ns         3867 ns       181881
cf_transpose_cmplx<double, 16>/128      18741 ns        18741 ns        43778
cf_transpose_cmplx<double, 16>/256     223272 ns       223266 ns         3133
cf_transpose_cmplx<double, 16>/512    1028820 ns      1028753 ns          682
cf_transpose_cmplx<double, 16>/1024   5229414 ns      5229264 ns          129
cf_transpose_cmplx<double, 16>/2048  40673714 ns     40666706 ns           17
cf_transpose_cmplx<double, 16>/4096 165500143 ns    165467574 ns            4
cf_transpose_cmplx<double, 16>/8192 626944729 ns    626880717 ns            1

RUNNING: ./BuildGBench/benchmarks/utils --benchmark_filter=cf_transpose_cmplx<double, 32> --benchmark_out=/tmp/tmp_x_2ganq
2022-03-15T01:40:42-04:00
Running ./BuildGBench/benchmarks/utils
Run on (8 X 2341.99 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 12288 KiB (x1)
Load Average: 3.10, 2.66, 1.43
------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations
------------------------------------------------------------------------------
cf_transpose_cmplx<double, 32>/32        1067 ns         1067 ns       670864
cf_transpose_cmplx<double, 32>/64        5940 ns         5940 ns       106912
cf_transpose_cmplx<double, 32>/128      23607 ns        23607 ns        29433
cf_transpose_cmplx<double, 32>/256     226348 ns       226345 ns         2899
cf_transpose_cmplx<double, 32>/512     999435 ns       999460 ns          668
cf_transpose_cmplx<double, 32>/1024   5251783 ns      5250924 ns          128
cf_transpose_cmplx<double, 32>/2048  39253467 ns     39204690 ns           17
cf_transpose_cmplx<double, 32>/4096 169543968 ns    169549855 ns            4
cf_transpose_cmplx<double, 32>/8192 639742739 ns    639680299 ns            1

Comparing cf_transpose_cmplx<double, 16> to cf_transpose_cmplx<double, 32> (from ./BuildGBench/benchmarks/utils)
Benchmark                                                                   Time      CPU    Time Old    Time New     CPU Old     CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/32     +0.3870  +0.3870          769         1067          769         1067
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/64     +0.5363  +0.5363         3867         5940         3867         5940
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/128    +0.2596  +0.2597        18741        23607        18741        23607
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/256    +0.0138  +0.0138       223272       226348       223266       226345
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/512    -0.0286  -0.0285      1028820       999435      1028753       999460
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/1024   +0.0043  +0.0041      5229414      5251783      5229264      5250924
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/2048   -0.0349  -0.0360     40673714     39253467     40666706     39204690
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/4096   +0.0244  +0.0247    165500143    169543968    165467574    169549855
[cf_transpose_cmplx<double, 16> vs. cf_transpose_cmplx<double, 32>]/8192   +0.0204  +0.0204    626944729    639742739    626880717    639680299
OVERALL_GEOMEAN                                                            +0.1156  +0.1155            0            0            0            0
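The Time and CPU columns in the comparison are relative changes, so a positive value means the second filter (BLOCKSIZE=32) was slower. The small C++ sketch below reproduces the Time column and the OVERALL_GEOMEAN row from the wall-clock values above, to within rounding. It assumes compare.py computes (new - old) / |old| per benchmark and summarizes with the geometric mean of the new/old ratios minus one; the function and variable names are hypothetical and not part of the benchmark suite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-benchmark relative change, as assumed for the Time/CPU columns:
// (new - old) / |old|. Positive means the "new" run was slower.
double relativeChange(double oldTime, double newTime) {
    return (newTime - oldTime) / std::fabs(oldTime);
}

// Assumed OVERALL_GEOMEAN: geometric mean of newTime/oldTime over all
// benchmarks, minus 1. Summing logs avoids overflow in the product.
double geomeanChange(const std::vector<double> &oldTimes,
                     const std::vector<double> &newTimes) {
    assert(oldTimes.size() == newTimes.size() && !oldTimes.empty());
    double logSum = 0.0;
    for (std::size_t k = 0; k < oldTimes.size(); ++k) {
        logSum += std::log(newTimes[k] / oldTimes[k]);
    }
    return std::exp(logSum / static_cast<double>(oldTimes.size())) - 1.0;
}

// Wall-clock times (ns) copied from the runs above, sizes 32 to 8192.
const std::vector<double> kTimeBlock16 = {769,     3867,     18741,
                                          223272,  1028820,  5229414,
                                          40673714, 165500143, 626944729};
const std::vector<double> kTimeBlock32 = {1067,    5940,     23607,
                                          226348,  999435,   5251783,
                                          39253467, 169543968, 639742739};
```

With these values, geomeanChange(kTimeBlock16, kTimeBlock32) comes out at roughly 0.116, matching the +0.1156 reported, and the large relative changes at small sizes dominate that average.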

Interesting. Would the same gain be seen on different CPUs, do you think?

We should probably run this on multiple types of processors to see how this works (I can try a Ryzen tomorrow to see how that fares).

This is a good idea. FYI, with BLOCKSIZE=2^n the matrix is transposed one 2^n x 2^n submatrix at a time, and the performance gained from this blocking technique should come from how well those submatrices fit in the cache, that is, from the number of cache misses. On my machines, with the following cache info, transposing submatrices of 2^8 elements (16 x 16) is more cache-friendly than transposing submatrices of 2^10 elements (32 x 32):

CPU Caches: L1 Data 48 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 1280 KiB (x4) L3 Unified 12288 KiB (x1)
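To make the blocking idea concrete, here is a minimal, hypothetical C++ sketch of a cache-blocked out-of-place transpose. The names blockedTranspose and tileBytes and the row-major layout are assumptions for illustration; this is not the Lightning implementation in LinearAlgebra.hpp. For std::complex<double> (16 bytes per element), one 16 x 16 tile occupies 4 KiB and one 32 x 32 tile occupies 16 KiB, and each step of the transpose reads one tile and writes another.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Bytes occupied by one BLOCKSIZE x BLOCKSIZE tile of element type T.
template <class T, std::size_t BLOCKSIZE>
constexpr std::size_t tileBytes() {
    return BLOCKSIZE * BLOCKSIZE * sizeof(T);
}

// Hypothetical sketch: transpose a row-major rows x cols matrix one
// BLOCKSIZE x BLOCKSIZE tile at a time. Inside a tile, the reads from `mat`
// and the writes to `out` touch only BLOCKSIZE rows each, so for a suitable
// BLOCKSIZE both tiles can stay in cache while they are being processed.
template <class T, std::size_t BLOCKSIZE = 16>
std::vector<T> blockedTranspose(const std::vector<T> &mat, std::size_t rows,
                                std::size_t cols) {
    assert(mat.size() == rows * cols);
    std::vector<T> out(rows * cols);
    for (std::size_t ib = 0; ib < rows; ib += BLOCKSIZE) {
        const std::size_t iEnd = std::min(ib + BLOCKSIZE, rows);
        for (std::size_t jb = 0; jb < cols; jb += BLOCKSIZE) {
            const std::size_t jEnd = std::min(jb + BLOCKSIZE, cols);
            // Transpose tile [ib, iEnd) x [jb, jEnd); edge tiles are smaller
            // when rows or cols is not a multiple of BLOCKSIZE.
            for (std::size_t i = ib; i < iEnd; ++i) {
                for (std::size_t j = jb; j < jEnd; ++j) {
                    out[j * rows + i] = mat[i * cols + j];
                }
            }
        }
    }
    return out;
}
```

The result does not depend on BLOCKSIZE, only the memory access order does, which is why the choice between 16 and 32 is purely a performance question that has to be measured per CPU.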

pennylane_lightning/src/benchmarks/README.md

pennylane_lightning/src/benchmarks/Bench_ApplyOperations.cpp

pennylane_lightning/_version.py

chaeyeunpark

Hi @maliasadi, thanks for adding benchmarks for gate operations! As I was also adding benchmarks for generators and matrix operations (part of #245), I may add a subsequent PR after changing that code to use google-benchmark. Some suggestions are also listed below:

pennylane_lightning/src/benchmarks/Bench_ApplyOperations.cpp

pennylane_lightning/src/benchmarks/Bench_ApplyMultiRZ.cpp

pennylane_lightning/src/benchmarks/Bench_ApplyOperations.cpp

pennylane_lightning/src/benchmarks/CMakeLists.txt

trevor-vincent

Really awesome Ali. I think I can directly import this into a benchmark website for it. I'm guessing this might actually work with lightning-gpu too, right?

mlxd

A few more comments.

pennylane_lightning/src/benchmarks/Bench_ApplyMultiRZ.cpp

pennylane_lightning/src/benchmarks/Bench_ApplyOperations.cpp

pennylane_lightning/src/benchmarks/Bench_LinearAlgebra.cpp


mlxd

Nothing more to add from my side. Thanks @maliasadi

maliasadi · 2022-03-15T19:49:16Z

Thank you @chaeyeunpark for reviewing this PR. I added benchmarks for gate operations and will leave cleaning up ./src/examples to you, as that is mostly your playground and I don't want to unintentionally remove anything useful there. Feel free to update ./src/benchmarks afterwards. This PR is essentially the first version of the GBenchmark suite in Lightning.

maliasadi · 2022-03-15T19:57:21Z

Really awesome Ali. I think I can directly import this into a benchmark website for it. I'm guessing this might actually work with lightning-gpu too, right?

Thank you @trevor-vincent! Yes, it should work with LightningGPU too. I believe there will be more GB scripts benchmarking different methods, kernels and devices 🚀

chaeyeunpark

Looks great to me! Thanks for the nice work again @maliasadi. As you mentioned, I will make a subsequent PR on benchmarking all gates/generators/matrix operations. Hopefully, we can benchmark all different kernels with a single command.

chaeyeunpark · 2022-03-15T20:21:42Z

.github/workflows/tests.yml

@@ -55,6 +55,49 @@ jobs:
          check_name: Test Report (C++) on Ubuntu
          files: Build/tests/results/report.xml

+  cpptestswithblas:


maliasadi and others added 11 commits March 6, 2022 23:12

Add GBenchmark to CMake

3aa4e1f

Add benchmark dir

aaea1a0

Add Bench_LinearAlgebra

2b942f7

Update Makefile

cdbe9ca

Update

c35852a

Update Bench_LinearAlgebra

ae667fe

Update Bench*

fafea43

Add Bench_Gate

b6dc851

Update CMakeFiles

dc9fd11

Remove Bench_Gates*

5818cbc

Auto update version

b6465a3

maliasadi and others added 7 commits March 9, 2022 01:17

Merge branch 'master' of github.com:PennyLaneAI/pennylane-lightning into add_gb_utils

b1c0d8c

Update

742152b

Fix version conflict

f58a994

Auto update version

46e2fa7

Update

6affba3

Fix version conflict againgit pull origin add_gb_utils

9a00441

Update

2d46bbb

maliasadi and others added 9 commits March 10, 2022 13:15

Add Bench_ApplyOperations

5cdbb25

Merge branch 'master' of github.com:PennyLaneAI/pennylane-lightning into add_gb_utils

b72e7b5

Auto update version

1a46824

Add Bench_ApplyHCNOT

90b75ce

Update formatting

2e0ca9c

Update Bench*

9a632f1

Add Bench_ApplyMultiRZ

228c646

Add README

0b2780f

Update README

6e5c9f2

maliasadi requested review from mlxd and trevor-vincent March 11, 2022 18:57

mlxd reviewed Mar 14, 2022

chaeyeunpark suggested changes Mar 14, 2022

trevor-vincent approved these changes Mar 14, 2022

mlxd reviewed Mar 14, 2022

pennylane_lightning/src/benchmarks/Bench_ApplyMultiRZ.cpp (outdated, resolved)

pennylane_lightning/src/benchmarks/Bench_ApplyOperations.cpp (outdated, resolved)

pennylane_lightning/src/benchmarks/Bench_LinearAlgebra.cpp (resolved)

maliasadi and others added 11 commits March 14, 2022 19:22

Update Bench_*

d0f4f62

Apply suggestions from code review

18d6448

Co-authored-by: Chae-Yeun Park <chae-yeun@xanadu.ai>

Add Bench_Utils

0595cc2

Update Bench_Utils

4f66581

Add CI test with ENABLE_BLAS

397c035

Fix conflict

d3f4541

Auto update version

f59d0e8

Update tests.yml

019f03d

Update tests

f4408e4

Update README

a53c69a

Update tests.yml

a51f35b

mlxd approved these changes Mar 15, 2022

maliasadi requested a review from chaeyeunpark March 15, 2022 19:49

chaeyeunpark approved these changes Mar 15, 2022

maliasadi and others added 6 commits March 17, 2022 10:46

Fix version conflict

e946233

Update version

6e0c4dd

Update #249

cb6dd3a

Update

b73ea8c

Update Bench_ApplyOps

8d4dbea

Auto update version

0a566ee

maliasadi merged commit e563736 into master Mar 17, 2022

maliasadi deleted the add_gb_utils branch March 17, 2022 20:04
