Benchmark test case for q4_0 matrix multiplication #653

SebastianApel · 2023-03-31T14:52:06Z

This is a small program I used benchmark the performance of the q4_0 matrix multiplication core.

I've used it to compare several implementations of ggml_vec_dot_q4_0.

I will reference this benchmark in another pull request for an improved AVX2 ggml_vec_dot_q4_0 implementation.

tests/test-benchmark-q4_0-matmult.c

…_us()

tests/test-benchmark-q4_0-matmult.c

slaren · 2023-04-01T18:53:37Z

Looks good, but I am wondering if this should be split in two parts. On one hand we have a correctness test, which should be added to the list of tests to run in the CI with CTest, and on the other hand we have a benchmarking tool that is meant to be run manually, and should be somewhere else, maybe as one of the examples. What do you think?

…hreads

SebastianApel · 2023-04-01T19:21:14Z

Looks good, but I am wondering if this should be split in two parts. On one hand we have a correctness test, which should be added to the list of tests to run in the CI with CTest, and on the other hand we have a benchmarking tool that is meant to be run manually, and should be somewhere else, maybe as one of the examples. What do you think?

@slaren I've moved the benchmarking to the folder "examples/benchmark" and adjusted the Makefile. Please note that I've not changed the CMakelists.

I also agree conceptually with your suggestion around the correctness test, but I am not familiar with setting up CTests so it would be great if someone else could do that (maybe in a different PR).

ggerganov

It would be useful to have a mat mul benchmark tool, but I think we need more work to make this better:

Add options to benchmark all data types, not just Q4_0
Add lightweight correctness tests to CTest / CI
I think no need for separate "example" and "test". A single "test" in the "./tests" folder that can be executed with different arguments should be best

I think this benchmark will eventually become part of the ggml repo.
For example, in whisper.cpp we already have similar benchmarks:

https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L4686-L4774

So, if we add it here too, we will have duplication.

If we address the points above, we can merge the benchmark tool in this repo first and later move it to ggml.

Makefile

SebastianApel · 2023-04-02T19:24:08Z

@slaren & @ggerganov I need you help with the questions around "examples" vs "tests".

@slaren suggested

maybe as one of the examples.

@ggerganov suggested

I think no need for separate "example" and "test". A single "test" in the "./tests" folder that can be executed with different arguments should be best

Both options are fine with me - but I cannot fulfill both wishes. Who will decide and let me know which option should be implemented in order to get this PR merged?

SebastianApel · 2023-04-02T19:31:18Z

@ggerganov

It would be useful to have a mat mul benchmark tool, but I think we need more work to make this better:

Add options to benchmark all data types, not just Q4_0

Add lightweight correctness tests to CTest / CI

I can understand the desire to extend the benchmark tool even more and enable it to work on all data types, and also to implement lightweight correctness tests to CTest / CI.

My challenge is that I will only have very limited capacity over the next weeks, so I myself will not be able to work on your suggestions.

I see the following options:

O1: We leave this PR open until someone finds the time to work on the suggestions
O2: We merge the PR as is and open 1 or 2 "improvement" issue tickets for the suggestions
O3: We close the PR and when someone finds the time to work on the suggestions, we will open a new one

Do you see any additional options? If not, which option would you prefer?

slaren · 2023-04-02T19:44:14Z

@SebastianApel I am sorry for the confusion, I was mostly looking for input since I wasn't sure myself what was the best way to proceed. Absolutely follow @ggerganov suggestions, this is his project after all.

SebastianApel · 2023-04-02T20:18:38Z

@SebastianApel I am sorry for the confusion, I was mostly looking for input since I wasn't sure myself what was the best way to proceed. Absolutely follow @ggerganov suggestions, this is his project after all.

@slaren Thank you for the quick feedback. I can move it back, no problem. I'll wait for feedback from @ggerganov regarding #653 (comment) before doing so.

ggerganov · 2023-04-13T12:46:01Z

Adding this now, but I think we can do way better and later move this as a ggml test

jon-chuang · 2023-04-14T17:31:02Z

Quite curious, perhaps add the graph_print after to show the individual node perf timings? (Requires GGML_PERF defined)

SebastianApel · 2023-04-15T20:56:20Z

@jon-chuang Since this PR is merged - would you be open to creating a new PR with your idea? That would make it easier to discuss your suggestion and/or code changes.

SebastianApel added 2 commits March 31, 2023 16:39

Initial version of q4_0 matrix multiplication benchmark

ed5f4fe

Merge branch 'ggerganov:master' into master

2877517

SebastianApel changed the title ~~Benchmark test case for q4_0 matrix multiplication benchmark~~ Benchmark test case for q4_0 matrix multiplication Mar 31, 2023

Bugfix: Added dependency to ggml.o to benchmark

3b7dcc0

howard0su reviewed Mar 31, 2023

View reviewed changes